I want to calculate the year-over-year (YoY) % change for monthly data, taking into account that the last (current) period is partial.
This has already been asked in "Resampling and calculating year over year with partial data", but I can't follow the answer given there.
My dataframe is as follows:
import pandas as pd
import numpy as np
np.random.seed(555)
# Create a sample dataframe
df = pd.DataFrame({
'order_date': pd.date_range(start='2022-01-01', end='2024-07-10'),
'customers': np.random.randint(0, 100, size=(922, ))
})
df_monthly = df.resample('ME', on='order_date').sum().reset_index()
print(df_monthly.tail())
    order_date  customers
26      202403       1358
27      202404       1581
28      202405       1584
29      202406       1456
30      202407        389
Next, I calculate the YoY % change for every month:
yoy_change = df_monthly['customers'].pct_change(12).mul(100)
print(yoy_change.tail())
26 -6.215470
27 -1.801242
28 22.885958
29 7.772021
30 -78.460687
However, resample sums only the partial month of July 2024 (through the 10th), and pct_change then compares that total with the full month of July 2023. The result is a strongly negative figure that doesn't reflect reality, because a partial month is being compared with a full one.
The number of customers in July 2023 up to the 10th was 513, so the YoY % change for July 2024 should be about -24% (389 / 513 - 1 ≈ -0.242), not -78%.