
Question

Hi, I have a Polars DataFrame like this:

┌─────────────────────┬───────────┬───────────┬───────────┬───────────┬──────┐
│ ts                  ┆ 646150    ┆ 646151    ┆ 646154    ┆ 646153    ┆ week │
│ ---                 ┆ ---       ┆ ---       ┆ ---       ┆ ---       ┆ ---  │
│ datetime[μs]        ┆ f64       ┆ f64       ┆ f64       ┆ f64       ┆ i8   │
╞═════════════════════╪═══════════╪═══════════╪═══════════╪═══════════╪══════╡
│ 2024-02-01 00:00:00 ┆ 24.490348 ┆ 65.088941 ┆ 53.545259 ┆ 13.499832 ┆ 5    │
│ 2024-02-01 01:00:00 ┆ 15.054187 ┆ 63.095247 ┆ 60.786479 ┆ 29.538156 ┆ 5    │
│ 2024-02-01 02:00:00 ┆ 24.54212  ┆ 63.880298 ┆ 57.535928 ┆ 24.840966 ┆ 5    │
│ 2024-02-01 03:00:00 ┆ 24.85621  ┆ 69.778516 ┆ 67.57284  ┆ 24.672476 ┆ 5    │
│ 2024-02-01 04:00:00 ┆ 21.21628  ┆ 61.137849 ┆ 55.231299 ┆ 16.648383 ┆ 5    │
│ …                   ┆ …         ┆ …         ┆ …         ┆ …         ┆ …    │
│ 2024-02-29 19:00:00 ┆ 23.17318  ┆ 62.590752 ┆ 72.026908 ┆ 24.614523 ┆ 9    │
│ 2024-02-29 20:00:00 ┆ 23.86416  ┆ 64.87102  ┆ 61.023656 ┆ 20.095353 ┆ 9    │
│ 2024-02-29 21:00:00 ┆ 18.553397 ┆ 67.530137 ┆ 63.477737 ┆ 17.313834 ┆ 9    │
│ 2024-02-29 22:00:00 ┆ 22.339175 ┆ 67.456563 ┆ 62.552035 ┆ 20.880844 ┆ 9    │
│ 2024-02-29 23:00:00 ┆ 15.5809   ┆ 66.774367 ┆ 57.066264 ┆ 29.529057 ┆ 9    │
└─────────────────────┴───────────┴───────────┴───────────┴───────────┴──────┘

It is generated with the following code:

from datetime import datetime, timedelta

import numpy as np
import polars as pl


def generate_test_data():
    # Yield hourly timestamps from start_date to end_date (inclusive)
    def generate_hourly_timestamps(start_date, end_date):
        current = start_date
        while current <= end_date:
            yield current
            current += timedelta(hours=1)

    # Define the date range
    start_date = datetime(2024, 2, 1)
    end_date = datetime(2024, 2, 29, 23, 0, 0)  # February 29th 23:00 (2024 is a leap year)

    # Generate the data
    timestamps = list(generate_hourly_timestamps(start_date, end_date))
    num_hours = len(timestamps)
    data = {
        "ts": timestamps,
        "646150": np.random.uniform(15, 25, num_hours),  # Random temperature data between 15 and 25
        "646151": np.random.uniform(60, 70, num_hours),  # Random humidity data between 60 and 70
        "646154": np.random.uniform(50, 75, num_hours),  # Random sensor data between 50 and 75
        "646153": np.random.uniform(10, 30, num_hours),  # Random sensor data between 10 and 30
    }

    df = pl.DataFrame(data)
    df = df.with_columns(pl.col("ts").cast(pl.Datetime))
    return df


df = generate_test_data()

# Add an ISO week-number column
df = df.with_columns(pl.col("ts").dt.week().alias("week"))

print(df)

I would like to group by week (or other time intervals) and compute the avg, min, and max of each column.

I could do something like this:

# Group by week and calculate min, max, and avg
aggregated_df = df.group_by("week").agg([
    pl.col("646150").min().alias("646150_min"),
    pl.col("646150").max().alias("646150_max"),
    pl.col("646150").mean().alias("646150_avg"),
    pl.col("646151").min().alias("646151_min"),
    pl.col("646151").max().alias("646151_max"),
    pl.col("646151").mean().alias("646151_avg"),
    pl.col("646154").min().alias("646154_min"),
    pl.col("646154").max().alias("646154_max"),
    pl.col("646154").mean().alias("646154_avg"),
    pl.col("646153").min().alias("646153_min"),
    pl.col("646153").max().alias("646153_max"),
    pl.col("646153").mean().alias("646153_avg")
])

print(aggregated_df)

But I would like it to be dynamic, without specifying the column names.

I would like to generate a DataFrame like the one below, where each column value is a list, a tuple, or some other multi-value format that holds the min, max, and avg values.

┌─────────────────────┬──────────────────┬──────────────────┐
│ week                ┆ 646150           ┆ 646151           │
│ ---                 ┆ ---              ┆ ---              │
│ i8                  ┆ []               ┆ []               │
╞═════════════════════╪══════════════════╪══════════════════╡
│ 5                   ┆ [24.1,26.3,25.0] ┆ [22.1,23.3,22.5] │
└─────────────────────┴──────────────────┴──────────────────┘

Is this possible in Polars?

Thanks


Perform aggregation using min,max,avg on all columns