Consider the following pandas:
df1:
user_id create_at
0 1 2023-01-01
1 1 2023-01-02
2 2 2023-01-01
3 2 2023-01-03
4 2 2023-01-05
df2:
user_id withdrawal_request_date
0 1 2023-01-03
1 2 2023-01-02
2 3 2023-01-01
I want to do a merge_asof
BUT when I do, I lose the user with user_id=3
. He is not in df1 so I lose him as you see below:
pd.merge_asof(df1.sort_values('create_at'),
df2.sort_values('withdrawal_request_date'),
by = 'user_id',
left_on = 'create_at',
right_on = 'withdrawal_request_date',
direction = 'forward')
user_id create_at withdrawal_request_date
0 1 2023-01-01 2023-01-03
1 2 2023-01-01 2023-01-02
2 1 2023-01-02 2023-01-03
3 2 2023-01-03 NaT
4 2 2023-01-05 NaT
However, I would like to keep him in the final dataframe.
NOTE that my dataset is huge with thousands of users each doing a lot of activity so doing something like outer joins
will not work here.
Any ideas?
DATA
df1 = pd.DataFrame({
'user_id': [1, 1, 2, 2, 2],
'create_at': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-01', '2023-01-03', '2023-01-05'])
})
df2 = pd.DataFrame({
'user_id': [1, 2, 3],
'withdrawal_request_date': pd.to_datetime(['2023-01-03', '2023-01-02', '2023-01-01'])
})