I have a CSV file from a measurement device, that produces a bunch of values (Temperature, Rain and Wind) and gives some metadata for the device:
Station, Hillside
ID, 12345
elevation, 54321
units, °C, mm, kph
time, temp, prec, wind
2024-08-01 00:00, 18, 0, 5
2024-08-01 01:00, 18, 0, 2
2024-08-01 02:00, 17.5, 3, 14
When I try to read this with df=pd.read_csv('12345.csv', header=[0,1,2,3,4], index_col=0, parse_dates=True)
I get an Header rows must have an equal number of columns
error. If I read it in with header=[3,4]
, where I have all columns, it works, but especially the Station
name would be rather important to have.
I can probably read that in separately and then add it to the df’s multiindex, but I wonder if I’m missing some obvious shortcut, since a lot of devices that produce csv files appear to add the metadata on top, without filling out all columns. Especially since I can have multiindex rows, that stretch over many columns.
Edit: Essentially, I’m looking for a dataframe like this:
Station Hillside |m i
ID 12345 |u n
elevation 54321 |l d
units °C mm kph |t e
time temp prec wind |i x
__________________________________________
2024-08-01 00:00:00 18.0 0 5 |d
2024-08-01 01:00:00 18.0 0 2 |a
2024-08-01 02:00:00 17.5 3 14 |t
... |a
2
Use names
instead of header
df = pd.read_csv('12345.csv', names=[0,1,2,3,4], index_col=0, parse_dates=True)
Output
1 2 3 4
0
Station Hillside NaN NaN NaN
ID 12345 NaN NaN NaN
elevation 54321 NaN NaN NaN
units °C mm kph NaN
time temp prec wind NaN
2024-08-01 00:00 18 0 5 NaN
2024-08-01 01:00 18 0 2 NaN
2024-08-01 02:00 17.5 3 14 NaN
3