I want to a groupby with a condition and then feed back the result to the original dataframe. In this case feature ‘COl_COND’ could either be 1 or 0, and the feature to be summarized is ‘AMMOUNT’.
Below this is done by performing two groupby’s, storing the result in pandas series and then merge back to the original dataframe.
Can this be done without doing a merge?
If memory usage is an issue, what method would be the most rational to apply?
df = pd.DataFrame({'ID':[1,1,2,2,3,3,3,4,5,5,6],
'COL_COND':[1,0,1,0,1,0,1,0,1,0,0],
'AMOUNT':[5, 80,100, 50, 100, 100, 20, 1, 51, 11, 12]})
series1 = df[df.COL_COND==1].groupby('ID')['AMOUNT'].sum().rename('sum_amount_1')
series0 = df[df.COL_COND==0].groupby('ID')['AMOUNT'].sum().rename('sum_amount_0')
df = df.merge(series1.to_frame().reset_index(), on='ID', how='left')
.merge(series0.to_frame().reset_index(), on='ID', how='left')
print(df)
ID COL_COND AMOUNT sum_amount_1 sum_amount_0
0 1 1 5 5.0000000 80
1 1 0 80 5.0000000 80
2 2 1 100 100.0000000 50
3 2 0 50 100.0000000 50
4 3 1 100 120.0000000 100
5 3 0 100 120.0000000 100
6 3 1 20 120.0000000 100
7 4 0 1 NaN 1
8 5 1 51 51.0000000 11
9 5 0 11 51.0000000 11
10 6 0 12 NaN 12