I have a dataframe with multiple accounts across the last few years and am trying to get the rolling average of a column, per account, which is easy enough. However I also need to have the min_periods be variable. I have a column for the min_periods in the dataframe (minuploads) and the window is always 10. I just cannot find how to go about doing this. End of the day, I am trying to take a rolling average of the last 10 values, and each account falls into a different sort of category which requires a different minimum number of previous values which I have set as minuploads. If there are not enough previous values, I dont want the rolling average to calculate and will use a modeled value for those rows.
Here is a sample of the dataframe
accountID | diffs | minuploads |
---|---|---|
1001 | 7 | 8 |
1001 | 7 | 8 |
1001 | 1 | 8 |
1001 | 6 | 8 |
1002 | 2 | 10 |
1002 | 3 | 10 |
1002 | 3 | 10 |
1002 | 2 | 10 |
Initially I didnt use a variable min_periods and used this code for the rolling average:
df['ma'] = df.groupby('accountID')['diffs'].transform(lambda x: x.rolling(10,5).mean())
Which worked great, but then decided I need to do a variable min_period. I tried this but I get a key error
df['ma'] = df.groupby('accountID')['diffs'].transform(lambda x: x.rolling(10,x['minuploads']).mean())
Also tried this to make sure the groupby had both columns, but still get a KeyError: ‘minuploads’
df['ma'] = df.groupby('accountID')[['diffs','minuploads']].transform(lambda x: x.rolling(10,x['minuploads']).mean())
Nolan G is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.