Hi, I have a Polars DataFrame like this:
┌─────────────────────┬───────────┬───────────┬───────────┬───────────┬──────┐
│ ts                  ┆ 646150    ┆ 646151    ┆ 646154    ┆ 646153    ┆ week │
│ ---                 ┆ ---       ┆ ---       ┆ ---       ┆ ---       ┆ ---  │
│ datetime[μs]        ┆ f64       ┆ f64       ┆ f64       ┆ f64       ┆ i8   │
╞═════════════════════╪═══════════╪═══════════╪═══════════╪═══════════╪══════╡
│ 2024-02-01 00:00:00 ┆ 24.490348 ┆ 65.088941 ┆ 53.545259 ┆ 13.499832 ┆ 5    │
│ 2024-02-01 01:00:00 ┆ 15.054187 ┆ 63.095247 ┆ 60.786479 ┆ 29.538156 ┆ 5    │
│ 2024-02-01 02:00:00 ┆ 24.54212  ┆ 63.880298 ┆ 57.535928 ┆ 24.840966 ┆ 5    │
│ 2024-02-01 03:00:00 ┆ 24.85621  ┆ 69.778516 ┆ 67.57284  ┆ 24.672476 ┆ 5    │
│ 2024-02-01 04:00:00 ┆ 21.21628  ┆ 61.137849 ┆ 55.231299 ┆ 16.648383 ┆ 5    │
│ …                   ┆ …         ┆ …         ┆ …         ┆ …         ┆ …    │
│ 2024-02-29 19:00:00 ┆ 23.17318  ┆ 62.590752 ┆ 72.026908 ┆ 24.614523 ┆ 9    │
│ 2024-02-29 20:00:00 ┆ 23.86416  ┆ 64.87102  ┆ 61.023656 ┆ 20.095353 ┆ 9    │
│ 2024-02-29 21:00:00 ┆ 18.553397 ┆ 67.530137 ┆ 63.477737 ┆ 17.313834 ┆ 9    │
│ 2024-02-29 22:00:00 ┆ 22.339175 ┆ 67.456563 ┆ 62.552035 ┆ 20.880844 ┆ 9    │
│ 2024-02-29 23:00:00 ┆ 15.5809   ┆ 66.774367 ┆ 57.066264 ┆ 29.529057 ┆ 9    │
└─────────────────────┴───────────┴───────────┴───────────┴───────────┴──────┘
It is generated with the following code:
from datetime import datetime, timedelta

import numpy as np
import polars as pl

def generate_test_data():
    # Function to generate hourly timestamps for a month
    def generate_hourly_timestamps(start_date, end_date):
        current = start_date
        while current <= end_date:
            yield current
            current += timedelta(hours=1)

    # Define the date range
    start_date = datetime(2024, 2, 1)
    end_date = datetime(2024, 2, 29, 23, 0, 0)  # February 29th 23:00 for a leap year

    # Generate the data
    timestamps = list(generate_hourly_timestamps(start_date, end_date))
    num_hours = len(timestamps)
    data = {
        "ts": timestamps,
        "646150": np.random.uniform(15, 25, num_hours),  # Random temperature data between 15 and 25
        "646151": np.random.uniform(60, 70, num_hours),  # Random humidity data between 60 and 70
        "646154": np.random.uniform(50, 75, num_hours),  # Random sensor data between 50 and 75
        "646153": np.random.uniform(10, 30, num_hours),  # Random sensor data between 10 and 30
    }
    df = pl.DataFrame(data)
    df = df.with_columns(pl.col("ts").cast(pl.Datetime))
    return df

df = generate_test_data()
# Add a week column (ISO week number)
df = df.with_columns(pl.col("ts").dt.week().alias("week"))
print(df)
I would like to group by week (or by other time intervals) and compute the average, minimum, and maximum of each column.
I could do something like this:
# Group by week and calculate min, max, and avg
# (group_by is the current name; older Polars releases spelled it groupby)
aggregated_df = df.group_by("week").agg([
    pl.col("646150").min().alias("646150_min"),
    pl.col("646150").max().alias("646150_max"),
    pl.col("646150").mean().alias("646150_avg"),
    pl.col("646151").min().alias("646151_min"),
    pl.col("646151").max().alias("646151_max"),
    pl.col("646151").mean().alias("646151_avg"),
    pl.col("646154").min().alias("646154_min"),
    pl.col("646154").max().alias("646154_max"),
    pl.col("646154").mean().alias("646154_avg"),
    pl.col("646153").min().alias("646153_min"),
    pl.col("646153").max().alias("646153_max"),
    pl.col("646153").mean().alias("646153_avg"),
])
print(aggregated_df)
However, I would like this to be dynamic, without specifying the column names.
I would like to generate a DataFrame like the one below, where each column value is a list, a tuple, or some other multi-value format that holds the min, max, and avg values.
┌─────────────────────┬──────────────────┬──────────────────┐
│ week                ┆ 646150           ┆ 646151           │
│ ---                 ┆ ---              ┆ ---              │
│ i8                  ┆ []               ┆ []               │
╞═════════════════════╪══════════════════╪══════════════════╡
│ 5                   ┆ [24.1,26.3,25.0] ┆ [22.1,23.3,22.5] │
Is this possible in Polars?
Thanks