I have this table in SQL:
CREATE TABLE books_returned (
name VARCHAR(50),
book_id VARCHAR(50),
year_book_returned INT
);
INSERT INTO books_returned (name, book_id, year_book_returned) VALUES
('john', 'julius ceasar', 2010),
('john', 'julius caesar', 2010),
('john', 'hamlet', 2010),
('john', 'hamlet', 2010),
('john', 'othello', 2009),
('john', 'othello', 2009),
('kevin', 'macbeth', 2015),
('kevin', 'tempest', 2020),
('david', 'romeojuliet', 2010),
('david', 'romeojuliet', 2010),
('david', 'romeojuliet', 2010),
('david', 'king lear', 2005);
name book_id year_book_returned
john julius ceasar 2010
john julius caesar 2010
john hamlet 2010
john hamlet 2010
john othello 2009
john othello 2009
kevin macbeth 2015
kevin tempest 2020
david romeojuliet 2010
david romeojuliet 2010
david romeojuliet 2010
david king lear 2005
Problem: For each name, I want to keep all columns and all rows for the book with the largest year.
Since there are ties, I do not care which set of duplicates are picked. Both of the following options are OK for me:
# output 1: acceptable
name book_id year_book_returned
john hamlet 2010
john hamlet 2010
kevin tempest 2020
david romeojuliet 2010
david romeojuliet 2010
david romeojuliet 2010
#output 2: acceptable
name book_id year_book_returned
john julius ceasar 2010
john julius caesar 2010
kevin tempest 2020
david romeojuliet 2010
david romeojuliet 2010
david romeojuliet 2010
I tried this query that first finds the max year in a subquery and joins it back to the original table:
SELECT br.*
FROM books_returned br
JOIN (
SELECT name, MAX(year_book_returned) as max_year
FROM books_returned
GROUP BY name
) as subquery
ON br.name = subquery.name AND br.year_book_returned = subquery.max_year;
But this is incorrect: this shows all books for John :
name book_id year_book_returned
john julius ceasar 2010
john julius caesar 2010
john hamlet 2010
john hamlet 2010
kevin tempest 2020
david romeojuliet 2010
david romeojuliet 2010
david romeojuliet 2010
Can someone please show me how to do this correctly? I was thinking of using the partition row_number order by random()
statements to create a rank variable and then select the lowest rank?
Thanks!