I’m currently applying the `for` loop below to a pandas DataFrame with fields `prob`, `id`, and `time`. How can I vectorize this?
```python
import numpy as np
import pandas as pd

data = []
for cutoff in np.sort(df['prob'].unique()):
    # get subset of records where prob >= cutoff
    sub = df[df['prob'] >= cutoff]
    # from that subset, for each ID get the record with minimum prob
    subsub = sub.loc[sub.groupby('id')['prob'].idxmin()]
    # for those records, compute various statistics on the time field
    times = subsub['time'].values
    data.append([cutoff, np.quantile(times, 0.5), np.quantile(times, 0.9), (times <= 7).sum() / len(times)])
metrics = pd.DataFrame(data)
```
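One possible direction, offered as a sketch rather than a definitive answer: for a given `id`, the row the loop picks at cutoff `c` is the row with the smallest `prob` that is still `>= c`. So each of an id's distinct `prob` values is picked on a contiguous range of cutoffs, namely those above the id's next-smaller `prob` and up to and including its own. That range can be found with `np.searchsorted`, every row expanded once into the cutoffs it covers, and the statistics computed with a single `groupby`. The function name `cutoff_metrics`, the `threshold` parameter, and the output column names are hypothetical; the sketch assumes there are no NaNs in `prob` or `id`.

```python
import numpy as np
import pandas as pd


def cutoff_metrics(df: pd.DataFrame, threshold: float = 7) -> pd.DataFrame:
    """Hypothetical vectorized version of the loop (assumes no NaN prob/id)."""
    # Same sorted unique cutoffs as the loop.
    cutoffs = np.sort(df['prob'].unique())

    # One row per (id, prob); keep='first' mirrors idxmin, which returns
    # the first row (in DataFrame order) holding the minimum.
    rows = (df.drop_duplicates(['id', 'prob'], keep='first')
              .sort_values(['id', 'prob']))

    # A row with prob p is its id's minimum for every cutoff c with
    # prev < c <= p, where prev is the id's next-smaller prob (or -inf).
    p = rows['prob'].to_numpy()
    prev = rows.groupby('id')['prob'].shift().fillna(-np.inf).to_numpy()
    lo = np.searchsorted(cutoffs, prev, side='right')  # first cutoff > prev
    hi = np.searchsorted(cutoffs, p, side='right')     # one past cutoff == p
    counts = hi - lo

    # Expand each row into one (cutoff index, time) pair per covered cutoff.
    row_idx = np.repeat(np.arange(len(rows)), counts)
    starts = np.repeat(np.cumsum(counts) - counts, counts)
    cutoff_idx = lo[row_idx] + np.arange(counts.sum()) - starts
    times = pd.Series(rows['time'].to_numpy()[row_idx])

    # Every cutoff is covered by at least one row, so groups 0..m-1 align
    # with `cutoffs`; groupby quantile interpolates linearly like np.quantile.
    g = times.groupby(cutoff_idx)
    return pd.DataFrame({
        'cutoff': cutoffs,
        'median': g.quantile(0.5).to_numpy(),
        'p90': g.quantile(0.9).to_numpy(),
        'frac_le_threshold': (times <= threshold).groupby(cutoff_idx).mean().to_numpy(),
    })
```

Calling `cutoff_metrics(df)` should give the same numbers as `metrics`, with named columns instead of `0`–`3`. Note that it still materializes one entry per (cutoff, id) pair the loop would select, so memory can approach `n_ids × n_cutoffs` in the worst case; the saving comes from avoiding a boolean filter and a `groupby` per cutoff.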