I have historical data from 2012 to 2023. I am trying to calculate the average for every hour in every day over these years to build a ‘reference year’ or ‘baseline’. My DataFrame (final_df) looks like this:
Timestamp bruttoPower [kW] nettoPower [kW]
0 2012-01-01 00:00:00 1501.500000 375.375000
1 2012-01-01 01:00:00 1488.833333 372.208333
2 2012-01-01 02:00:00 1626.833333 406.708333
3 2012-01-01 03:00:00 1350.333333 337.583333
4 2012-01-01 04:00:00 1424.000000 356.000000
... ... ... ...
105187 2023-12-31 19:00:00 3301.758391 763.500000
105188 2023-12-31 20:00:00 2322.399977 592.500000
105189 2023-12-31 21:00:00 3393.225006 861.000000
105190 2023-12-31 22:00:00 3784.991643 952.000000
105191 2023-12-31 23:00:00 3661.591654 950.500000
105192 rows × 5 columns
I had already asked a similar question: Calculate mean values of the past x years for every month
But now I am trying to get the average of each hour of every day across the years (2012-2023).
With the help of the answer I got on my last post and ChatGPT, I tried this:
import calendar
import pandas as pd
baseline_df = (
final_df.groupby(final_df.pop("Timestamp").dt.date).mean().reset_index().rename(columns={"Timestamp":"Date"}))
baseline_df["Date"] = baseline_df["Date"].apply(lambda x: x.strftime("%Y-%m-%d"))
display(baseline_df)
and
import calendar
import pandas as pd
final_df['Date'] = final_df['Timestamp'].dt.date
final_df['Hour'] = final_df['Timestamp'].dt.hour
baseline_df = (final_df.groupby(['Date', 'Hour']).mean().reset_index())
baseline_df["Hour"] = baseline_df["Hour"].apply(lambda x: f"{x:02}")
display(baseline_df)
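but it is not working as I would like: both attempts still group by the full date, including the year, so nothing is averaged across the years. I am trying to get something like this: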
Timestamp bruttoPower [kW] nettoPower [kW]
01-01 00:00:00 1501.500000 375.375000
01-01 01:00:00 1488.833333 372.208333
01-01 02:00:00 1626.833333 406.708333
01-01 03:00:00 1350.333333 1309.823333
... ... ...
31-12 22:00:00 3393.225006 861.000000
31-12 23:00:00 3784.991643 952.000000
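One possible way to get this shape is to group on a key that drops the year, for example a month-day-hour string built with `dt.strftime`. The following is only a minimal sketch under some assumptions: the helper name `hourly_baseline` is made up for illustration, `Timestamp` is assumed to already be a datetime64 column, and the label uses `MM-DD` order (rather than the `DD-MM` shown in the last rows above) so that the result sorts chronologically.

```python
import pandas as pd


def hourly_baseline(df: pd.DataFrame, ts_col: str = "Timestamp") -> pd.DataFrame:
    """Average every numeric column per (month, day, hour), ignoring the year.

    `hourly_baseline` is a hypothetical helper name, not a pandas function.
    """
    # Key without the year, e.g. "01-01 00:00:00"; zero padding keeps
    # the string sort order identical to chronological order.
    key = df[ts_col].dt.strftime("%m-%d %H:%M:%S")
    return (
        df.drop(columns=ts_col)      # leave the original frame untouched
          .groupby(key)              # the key Series keeps the name ts_col
          .mean(numeric_only=True)   # skip any non-numeric columns
          .reset_index()
    )


# Tiny illustrative frame: the same two hours in two different years.
demo = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2012-01-01 00:00", "2012-01-01 01:00",
        "2013-01-01 00:00", "2013-01-01 01:00",
    ]),
    "bruttoPower [kW]": [10.0, 30.0, 20.0, 50.0],
})
baseline_demo = hourly_baseline(demo)
print(baseline_demo)
```

With the real data this would be called as `baseline_df = hourly_baseline(final_df)`. Note that 29 February appears in the result too, but it is averaged only over the leap years in the range (2012, 2016, 2020), so the baseline has 8784 rows rather than 8760.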