I have an Excel file that looks like this:
In cell A1, the user can select an option and the selection B1:E3 is used as a lookup table to fill cells A2:A3. I want to read the lookup table to python using pandas.read_excel()
. However, when I use
df = pd.read_excel("workbook.xlsx", sheet_name="Sheet1", header=0, usecols="B:E", numrows=2)
one column name is registered twice and the second one gets augmented with a “.1” suffix. Since pandas 2.x
, setting mangle_dup_cols=False
is not recognized anymore. Unfortunately, I cannot change the format of the Excel sheet and would prefer not to downgrade pandas
.
As a workaround, I currently read the table to a DataFrame
with header=None
and then assign row index 0 as the column names and drop the first row. This is not a very elegant solution and I was hoping there is a better way of doing this.
0
I cannot find a direct replacement in the docs, but I think a simpler workaround would be to not user header=0
and instead use a regex to clean up the column names:
df = pd.read_csv()
df.columns = df.columns.str.replace(r"(.+).d+$", r"1", regex=True)