I have an excel file with raw data, which lists transaction data by month for facilities. There’s the month/year, the facilityID, and then spend in that transaction. There can be multiple transactions for a facility in a month. I’ve managed to group the transactions by date and facilityID, with the total spend as the values. It looks something like this.
I’m trying to calculate the aggregate year-over-year change in TotalSpend, for say 2024-05-01 (May 2024). Some of the facilities may be new so they wouldn’t have May 2023 data, or they might have dropped out or not yet reported the data so they wouldn’t have May 2024 data. In both cases I want to exclude those facilities then from the calculation. I would also like to have a Quarterly year-over-year change, but I’m assuming I can just do the same thing I do for the May, but for April, May, and June when the data is available.
This is what I’ve tried so far, but I’m getting an error. I don’t necessarily need to add it into the column in the df, just
May 2024 5%
would suffice for my purposes
Code:
import pandas as pd
import datetime
def open_file(path):
df_raw = pd.read_excel(*Path*, "Data")
df = df_raw.groupby(['Date','FacilityID'])['TotalSpend'].sum()
df['Date'] = pd.to_datetime(df['Date'])
df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month
df['YearPrior'] = df['Year'] - 1
filtered_df= df[df['FacilityID'].isin(df.groupby(['Year','Month'])['FacilityID'].apply(set).loc[df['YearPrior']].explode())]
pivot_df= filtered_df.pivot(index='FacilityID', columns = 'Date', values = 'Value')
pivot_df['YOY Change'] = pivot_df[('2024-05-01', '2024-05-01')] - pivot_df[('2023-05-01', '2023-05-01')]