I'm manipulating data from an 87 MB Excel workbook using openpyxl and pandas, but each operation takes 10-30 minutes.
The operations include deleting rows/columns, computing and printing the mean, and multiplying the same cell across different sheets and writing the result to another sheet.
Example (took 20 minutes):
import pandas as pd

book = 'file path'
sheet1 = '1'  # Replace with your first sheet name
sheet2 = '2'  # Replace with your second sheet name
# Read the sheets into DataFrames
df1 = pd.read_excel(book, sheet_name=sheet1)
df2 = pd.read_excel(book, sheet_name=sheet2)
# Check if dimensions match
if df1.shape != df2.shape:
    raise ValueError("Sheets have different shapes")
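For reference, the multiplication and mean steps I'm after look roughly like the sketch below. It assumes every cell in both sheets is numeric; `multiply_sheets` is just an illustrative helper name, and the two small DataFrames are made-up stand-ins for the real sheets.

```python
import pandas as pd


def multiply_sheets(df1: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
    """Multiply matching cells of two equally shaped, all-numeric DataFrames."""
    if df1.shape != df2.shape:
        raise ValueError("Sheets have different shapes")
    # Multiply by position rather than by label, so differing headers do not matter.
    product = df1.to_numpy() * df2.to_numpy()
    return pd.DataFrame(product, index=df1.index, columns=df1.columns)


# Small in-memory stand-ins for the two sheets (made-up values).
a = pd.DataFrame({"x": [1.0, 2.0], "y": [3.0, 4.0]})
b = pd.DataFrame({"x": [10.0, 20.0], "y": [0.5, 0.25]})

result = multiply_sheets(a, b)   # cell-by-cell product
column_means = result.mean()     # mean of each column of the product
```

On data already in memory this part is quick, so the time seems to go into reading the workbook rather than into the arithmetic.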
I need to be able to read and write at the same time.
Neither my end result nor the operations themselves need to be in an .xlsx file.
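Since .xlsx is not a requirement, one option I could try is to parse the workbook a single time and cache each sheet as CSV, then run every later operation against the cached files. This is only a sketch under assumptions: `cache_sheets` and `load_sheet` are hypothetical helper names, the cache directory is arbitrary, and sheet names are assumed to be valid file names.

```python
from pathlib import Path

import pandas as pd


def cache_sheets(sheets: dict, cache_dir) -> list:
    """Write each DataFrame in `sheets` to <cache_dir>/<sheet name>.csv."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, df in sheets.items():
        path = cache_dir / f"{name}.csv"
        df.to_csv(path, index=False)  # the row index is not stored
        paths.append(path)
    return paths


def load_sheet(cache_dir, name) -> pd.DataFrame:
    """Read one cached sheet back into a DataFrame."""
    return pd.read_csv(Path(cache_dir) / f"{name}.csv")
```

The `sheets` argument would come from `pd.read_excel(book, sheet_name=None)`, which returns a dict mapping every sheet name to a DataFrame in one call, so the slow Excel parse would be paid once instead of once per operation.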