My issue is simple,
I have an excel file containing multiple sheets, in every sheet there is a dataframe the dataframes on the different sheets have different columns but they all have one common column. The common column is called ‘Supervisor’ and has about 17 unique values.
What I want is for every Supervisor name, filter all rows in each sheet (every sheet has one dataframe or table with multiple columns and Supervisor is one of them) and keep only rows that have that Supervisor name. Then save the result in a new Excel file.
Repeating this process for all Supervisor names.
I have the code below (I applied it for one supervisor) which is half working since it is filtering. However in the result I find only one sheet while what I want is if there is 12 sheets in the original excel file this results in 12 filtered sheets in the new Excel file.
Here is the code :
import pandas as pd
excel_file_path = "filename.xlsx"
filter_value = "some_name"
dataframes = pd.read_excel('filename.xlsx', engine="openpyxl", sheet_name=None)
for sheet_name, dataframe in dataframes.items():
try:
filtered_df = dataframe[dataframe["Supervisor"] == filter_value]
filtered_sheet_name = f"{sheet_name}_filtered_{id(filtered_df)}"
filtered_df.to_excel(excel_file_path, sheet_name=filtered_sheet_name, index=False)
except KeyError:
print(f"Column Supervisor not found in sheet {sheet_name}. Skipping to next sheet.")
I tried the code above and expected to have a new Excel file with multiple filtered sheets.
4