I have two pandas DataFrames. df1 records each student's mock exam scores and the mock exam dates:
ID  Mock_Date   Student_ID  Mock_score
1   14/3/2020   792         213
2   9/5/2020    792         437
3   17/8/2020   792         435
4   4/1/2022    14598       112312
5   29/12/2022  14350       4325
6   3/10/2019   621         523
7   12/8/2020   621         876
8   5/5/2022    621         4324
9   6/9/2022    621         5432
10  6/3/2022    455         34
df2 records each student's actual exam scores and the exam dates:
Student_ID  Date       Score
324         14/2/2019  543
792         14/2/2019  9785
792         3/11/2019  7690
621         3/11/2019  324
12          16/3/2020  34234
792         16/3/2020  4235
14598       16/3/2020  975
792         9/5/2020   427
792         17/8/2020  876
621         17/8/2020  986
I want to merge df1 into df2 using the following logic: for each row in df2 (an actual exam score for a particular student), take the row from df1 for the same student whose mock exam date is the closest one before the actual exam date. If no such row exists, fill in NaN. The desired output looks like:
Student_ID  Date       Score  Mock_Date  Mock_score
324         14/2/2019  543    NaN        NaN
792         14/2/2019  9785   NaN        NaN
792         3/11/2019  7690   NaN        NaN
621         3/11/2019  324    3/10/2019  523         # last occurrence before 3/11 is 3/10
12          16/3/2020  34234  NaN        NaN
792         16/3/2020  4235   14/3/2020  213         # last occurrence before 16/3 is 14/3
14598       16/3/2020  975    NaN        NaN
792         9/5/2020   427    14/3/2020  213         # last occurrence before 9/5 is 14/3
792         17/8/2020  876    9/5/2020   437         # last occurrence before 17/8 is 9/5
621         17/8/2020  986    12/8/2020  876
I don't even know where to start. Thanks in advance.
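One possible starting point, offered as a sketch rather than a definitive solution: pandas provides `pd.merge_asof`, which matches each left row to the nearest key on the right in a chosen direction, optionally within groups. The example below rebuilds the two frames from the data above. It assumes the dates are in day/month/year format, and it assumes "just before" means strictly before the exam date, so a mock taken on the exam day itself is not used (this is what the 9/5/2020 and 17/8/2020 rows of the desired output imply). The names `left`, `right`, and `out` are made up for illustration.

```python
import pandas as pd

# Rebuild the two frames from the question's data.
df1 = pd.DataFrame({
    "ID": list(range(1, 11)),
    "Mock_Date": ["14/3/2020", "9/5/2020", "17/8/2020", "4/1/2022", "29/12/2022",
                  "3/10/2019", "12/8/2020", "5/5/2022", "6/9/2022", "6/3/2022"],
    "Student_ID": [792, 792, 792, 14598, 14350, 621, 621, 621, 621, 455],
    "Mock_score": [213, 437, 435, 112312, 4325, 523, 876, 4324, 5432, 34],
})
df2 = pd.DataFrame({
    "Student_ID": [324, 792, 792, 621, 12, 792, 14598, 792, 792, 621],
    "Date": ["14/2/2019", "14/2/2019", "3/11/2019", "3/11/2019", "16/3/2020",
             "16/3/2020", "16/3/2020", "9/5/2020", "17/8/2020", "17/8/2020"],
    "Score": [543, 9785, 7690, 324, 34234, 4235, 975, 427, 876, 986],
})

# Assumption: dates are day/month/year. Parse them so they compare as dates.
df1["Mock_Date"] = pd.to_datetime(df1["Mock_Date"], format="%d/%m/%Y")
df2["Date"] = pd.to_datetime(df2["Date"], format="%d/%m/%Y")

# merge_asof requires both frames to be sorted by their date key.
# A stable sort keeps rows with equal dates in their original order.
left = df2.sort_values("Date", kind="stable")
right = df1.sort_values("Mock_Date", kind="stable")[
    ["Student_ID", "Mock_Date", "Mock_score"]
]

out = pd.merge_asof(
    left,
    right,
    left_on="Date",
    right_on="Mock_Date",
    by="Student_ID",            # only match mocks of the same student
    direction="backward",       # look for the latest mock at or before the exam
    allow_exact_matches=False,  # assumption: exclude mocks on the exam date itself
)

print(out)
```

Rows with no earlier mock for that student come back with `NaT` in `Mock_Date` and `NaN` in `Mock_score`, so `Mock_score` becomes a float column. The result is ordered by `Date`; if df2 is not already in date order and the original row order matters, it would need to be restored separately.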