I’m having trouble with specifying a lambda function. I would like to have something like the lambda below, but not quite. The code should compare a rejected_time with any paid_out_time within the group and return True, if a rejected_time occurs within 5 minutes after any paid_out_time.
f = lambda x: ((x['rejected_time'].dropna() - x['paid_out_time'].dropna()).between(pd.Timedelta(0), pd.Timedelta(minutes=5)))
Using x['paid_out_time'].min()
generates about 100k True values, but dropping .min()
results in dramatic reduction. I can’t figure out how to use all paid_out_times in the comparison against a row-wise rejected_time and see if the rejected time occurs 0-5minutes after the paid_out_time.
I’ve been testing this code:
cols = ['paid_out_time', 'rejected_time']
df[cols] = df[cols].apply(pd.to_datetime, errors='coerce')
f = lambda x: ((x['rejected_time'].dropna() - x['paid_out_time'].dropna().min()).between(pd.Timedelta(0), pd.Timedelta(minutes=5)))
df['paid_out_auto_rejection'] = df.groupby('personal_id', group_keys=False).apply(f).astype(int)
Here’s some test data:
personal_id | application_id | rejected_time | paid_out_time |
---|---|---|---|
26A | 1ab | 2022-09-12 09:20:40.592 | NaT |
26A | 1ab | 2022-08-23 07:40:03.447463 | NaT |
26A | 1ab | 2022-08-02 23:16:59.545392 | NaT |
26A | 1ab | 2022-08-02 23:16:59.545392 | NaT |
26A | 1ab | 2022-09-12 09:20:40.592000 | 2022-08-02 23:16:59.545392 |
26A | 1ab | 2022-09-02 18:33:42.226000 | NaT |
26A | 8f0 | 2022-09-12 09:20:40.592000 | NaT |
26A | 8f0 | 2022-09-12 09:20:40.592000 | NaT |
26A | 8f0 | NaT | 2022-09-12 09:20:40.592 |
26A | 8f0 | 2022-09-12 09:21:08.604000 | NaT |
26A | 8f0 | 2022-09-22 08:27:45.693060 | NaT |