I am bringing in the data from an Excel spreadsheet.
I want to use the values in df.row(8)
as the column header names.
In pandas it was just:
c = [ 'A', 'B', 'C', 'D', 'E', 'F' ]
df.columns = c
Using rename does not seem very practical here. Is there an easier way?
You can rename the columns in polars by assigning to df.columns, and you can use pl.DataFrame.row() to get a row by index:
import polars as pl

df = pl.DataFrame({
    "a": list("cdf"),
    "b": list("abc")
})
df.columns = df.row(2)  # use the values of row index 2 as the column names
shape: (3, 2)
┌─────┬─────┐
│ f ┆ c │
│ --- ┆ --- │
│ str ┆ str │
╞═════╪═════╡
│ c ┆ a │
│ d ┆ b │
│ f ┆ c │
└─────┴─────┘
Alternatively, if there are no meaningful rows before row 8, pl.read_excel() should by default use the values from that row as the column names (because the drop_empty_rows and has_header parameters are True by default), although the row itself will then not be present in the data.
df = pl.DataFrame({
"a":[None]*3 + list("cdf"),
"b":[None]*3 + list("abc")
})
# shape: (6, 2)
# ┌──────┬──────┐
# │ a ┆ b │
# │ --- ┆ --- │
# │ str ┆ str │
# ╞══════╪══════╡
# │ null ┆ null │
# │ null ┆ null │
# │ null ┆ null │
# │ c ┆ a │
# │ d ┆ b │
# │ f ┆ c │
# └──────┴──────┘
df.write_excel("test.xlsx", include_header=False)
pl.read_excel("test.xlsx")
shape: (2, 2)
┌─────┬─────┐
│ c ┆ a │
│ --- ┆ --- │
│ str ┆ str │
╞═════╪═════╡
│ d ┆ b │
│ f ┆ c │
└─────┴─────┘
Update: if you want to rename only some columns and skip None values, you could use pl.DataFrame.to_dicts() and pl.DataFrame.rename():
df = pl.DataFrame({
"a":["new_a","c","d"],
"b":[None,"g","h"]
})
# shape: (3, 2)
# ┌───────┬──────┐
# │ a ┆ b │
# │ --- ┆ --- │
# │ str ┆ str │
# ╞═══════╪══════╡
# │ new_a ┆ null │
# │ c ┆ g │
# │ d ┆ h │
# └───────┴──────┘
N = 0
d = df.slice(N, 1).to_dicts()[0] # take row index N and convert it to dict
d = {k: v for k, v in d.items() if v} # drop None (and any other falsy, e.g. empty-string) values
df.rename(d)
shape: (3, 2)
┌───────┬──────┐
│ new_a ┆ b │
│ --- ┆ --- │
│ str ┆ str │
╞═══════╪══════╡
│ new_a ┆ null │
│ c ┆ g │
│ d ┆ h │
└───────┴──────┘