Using Polars .rolling and .agg, how do I get the original column back without joining the result back to the original DataFrame and without using .over?
Example:
import polars as pl
dates = [
    "2020-01-01 13:45:48",
    "2020-01-01 16:42:13",
    "2020-01-01 16:45:09",
    "2020-01-02 18:12:48",
    "2020-01-03 19:45:32",
    "2020-01-08 23:16:43",
]
df = pl.DataFrame({"dt": dates, "a": [3, 7, 5, 9, 2, 1]}).with_columns(
    pl.col("dt").str.strptime(pl.Datetime).set_sorted()
)
This gives me a small Polars DataFrame:
| | dt | a |
|---|---|---|
| 0 | 2020-01-01 13:45:48 | 3 |
| 1 | 2020-01-01 16:42:13 | 7 |
| 2 | 2020-01-01 16:45:09 | 5 |
| 3 | 2020-01-02 18:12:48 | 9 |
| 4 | 2020-01-03 19:45:32 | 2 |
| 5 | 2020-01-08 23:16:43 | 1 |
When I apply rolling aggregations, I get the new columns back, but the original column a comes back as a list of the values in each window rather than as the original values:
out = df.rolling(index_column="dt", period="2d").agg(
    [
        pl.sum("a").alias("sum_a"),
        pl.min("a").alias("min_a"),
        pl.max("a").alias("max_a"),
        pl.col("a"),
    ]
)
which gives:
| | dt | sum_a | min_a | max_a | a |
|---|---|---|---|---|---|
| 0 | 2020-01-01 13:45:48 | 3 | 3 | 3 | [3] |
| 1 | 2020-01-01 16:42:13 | 10 | 3 | 7 | [3 7] |
| 2 | 2020-01-01 16:45:09 | 15 | 3 | 7 | [3 7 5] |
| 3 | 2020-01-02 18:12:48 | 24 | 3 | 9 | [3 7 5 9] |
| 4 | 2020-01-03 19:45:32 | 11 | 2 | 9 | [9 2] |
| 5 | 2020-01-08 23:16:43 | 1 | 1 | 1 | [1] |
How can I get the original a column? I don't want to join, and I don't want to use .over, because I need the group_by of the rolling later on and .over does not work with .rolling.