I have a MultiIndex df, indexed by month (Date) and FacilityID, with a TotalSpend value for each facility in each month. I'm trying to aggregate TotalSpend across all facilities for a quarter, but only for facilities that have data in all 3 months of that quarter.
In my example data, I tried taking the April, May, and June subsets of the df and doing an inner join on them, but when I do that I get an error saying the object is not a df, even though I expected df.loc[[date]] to give me one. Basically, I would like to check which FacilityIDs show up in all 3 months of the quarter and keep only the values for those facilities.
Code:
import pandas as pd
import datetime

def open_file(path, quarter_number, months):
    df_raw = pd.DataFrame({'Date': ["2024-04-01", "2024-05-01", "2024-06-01", "2024-06-01", "2024-05-01", "2023-04-01", "2023-05-01", "2023-06-01", "2024-05-01", "2024-06-01", "2023-05-01", "2023-06-01", "2023-04-01", "2024-05-01", "2024-06-01"],
                           'FacilityID': [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 4, 4],
                           'TotalSpend': [100, 110, 120, 50, 70, 90, 100, 110, 150, 140, 120, 60, 90, 190, 150]
                           }).set_index('Date')
    df = df_raw.groupby(['Date', 'FacilityID'])['TotalSpend'].sum()
    # print(df)
    cur_dates = []
    prev_dates = []
    for month in months:
        cur_date = datetime.date(2024, month, 1)
        prev_date = datetime.date(cur_date.year - 1, month, 1)
        cur_dates.append(cur_date.strftime('%Y-%m-%d'))
        prev_dates.append(prev_date.strftime('%Y-%m-%d'))
    # this is where I'm having issues
    cur_data = df.loc[[cur_dates[1]]].join(df.loc[[cur_dates[1]]], on='FacilityID', join="inner")
    prev_data = df.loc[prev_dates[0]:prev_dates[-1]]
    # print(cur_data)
    # print(prev_data)

if __name__ == "__main__":
    change = open_file("path", 2, [4, 5, 6])
    print(change)
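To make the goal concrete, here is a minimal sketch of the result I'm after, written without a join. The helper name quarter_total and the variable monthly are made up for illustration and are not part of my real code. It assumes that groupby(...)['TotalSpend'].sum() returns a Series rather than a DataFrame (which might be why the join complains), and that each (Date, FacilityID) pair appears once after grouping, so counting rows per facility counts months.

```python
import pandas as pd

# Same sample data as above, grouped to one row per (Date, FacilityID).
df_raw = pd.DataFrame({
    'Date': ["2024-04-01", "2024-05-01", "2024-06-01", "2024-06-01", "2024-05-01",
             "2023-04-01", "2023-05-01", "2023-06-01", "2024-05-01", "2024-06-01",
             "2023-05-01", "2023-06-01", "2023-04-01", "2024-05-01", "2024-06-01"],
    'FacilityID': [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 4, 4],
    'TotalSpend': [100, 110, 120, 50, 70, 90, 100, 110, 150, 140, 120, 60, 90, 190, 150],
})
monthly = df_raw.groupby(['Date', 'FacilityID'])['TotalSpend'].sum()


def quarter_total(monthly, dates):
    """Hypothetical helper: total spend over `dates` for facilities present in every date."""
    # Keep only rows whose Date level is one of the quarter's months.
    in_quarter = monthly[monthly.index.get_level_values('Date').isin(dates)]
    # Each (Date, FacilityID) pair is unique here, so size() counts months per facility.
    months_per_facility = in_quarter.groupby(level='FacilityID').size()
    complete = months_per_facility[months_per_facility == len(dates)].index
    # Drop facilities that are missing at least one month, then sum what is left.
    kept = in_quarter[in_quarter.index.get_level_values('FacilityID').isin(complete)]
    return kept.sum()


cur_total = quarter_total(monthly, ["2024-04-01", "2024-05-01", "2024-06-01"])
prev_total = quarter_total(monthly, ["2023-04-01", "2023-05-01", "2023-06-01"])
print(cur_total, prev_total)
```

With the sample data, only FacilityID 1 has rows in all three months of either year, so that is the only facility that should contribute to the quarterly totals.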