I have these two tables:
CREATE TABLE my_table1 (
name VARCHAR(50),
var1 DATE,
var2 INT
);
INSERT INTO my_table1 (name, var1, var2) VALUES
('john', '2010-01-01', 94),
('john', '2010-01-04', 106),
('john', '2015-01-01', 99),
('alex', '2010-01-01', 96),
('alex', '2018-01-01', 96),
('sara', '2005-01-01', 94),
('sara', '2006-01-01', 90),
('tim', '1999-01-01', 101);
CREATE TABLE my_table2 (
name VARCHAR(50),
var3 DATE,
var4 CHAR(1)
);
INSERT INTO my_table2 (name, var3, var4) VALUES
('john', '2001-01-01', 'a'),
('john', '2002-01-01', 'b'),
('alex', '2021-01-01', 'c'),
('alex', '2022-01-01', 'd'),
('sara', '1999-01-01', 'e'),
('sara', '2023-01-01', 'f');
I am trying to answer this problem:
problem 1: for each name in my_table2, find the most recent row (based on date). join this row to my_table1. however, after the join – ensure that the date from my_table2 is greater than the date from my_table1 (if not, then remove)
For problem 1, I tried to solve the problem like this:
# problem 1
SELECT t1.*, t2.*
FROM my_table1 t1
JOIN (
SELECT name, var3, var4
FROM (
SELECT name, var3, var4,
ROW_NUMBER() OVER (PARTITION BY name ORDER BY var3 DESC) as rn
FROM my_table2
) tmp
WHERE rn = 1
) t2
ON t1.name = t2.name
WHERE t1.var1 < t2.var3;
The code partly produces the correct output, but I can’t figure out how to include a NA row for John and Tim here
Can someone please show me how to do this correctly?
Thanks!