I have some existing Pandas code which calculates the mean of some timeseries data on a month period.
df
.groupby(
pandas.Grouper(
key='transaction_date',
freq='M',
)
)
.aggregate(
{
'transaction_date': 'first',
'price': 'mean'
}
)
The resulting data still maintains a large amount of variance. I would like to reduce this variance by applying a rolling mean operation over a period of 6 months.
I have not found a good solution to this.
- I tried replacing
groupby
withrolling
and using apandas.Grouper
. However therolling
API does not work withGrouper
objects. - I tried another code (below), however this failed with an exception.
raise ValueError("window must be an integer 0 or greater")
ValueError: window must be an integer 0 or greater
df_slow = (
df
.copy()
.set_index('transaction_date')
.rolling(
window=timedelta(days=1),
)
.aggregate(
{
'price': 'mean'
}
)
.rename(columns={'price': 'mean'})
)
df_slow['transaction_date'] = (
df
.copy()
.set_index('transaction_date')
.index
.to_series()
.rolling(
window=timedelta(days=1)
)
.min()
)
- Finally I tried this (below).
df_slow = (
df
.copy()
.groupby(
pandas.Grouper(
key='transaction_date',
freq='D',
)
)
.aggregate({'price': 'mean'})
)
df_slow['price_rolling'] = (
df ['price']
.rolling(
window=180, # 180 days
)
.aggregate('mean')
)
This works, but I would have thought there is a more straght forward solution to this problem.
Can anyone help me understand what the sensible approach is here?