I am trying to slice a Series of weights based on the length of a rolling window. A typical use case is applying custom weights to a rolling average. (I know a rolling_mean is already implemented.)
Could someone explain why I get the error below, and what the proper way to achieve this is? Thanks!
Let’s say my rolling window has a length of 5. The windows for the first 4 rows have lengths 1, 2, 3 and 4, so for those rows I only want the first few weights. From the fifth row on, I expect the weights to always be [.1, .2, .3, .4, .5].
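To make the expected result concrete, here is a minimal plain-Python sketch (no polars involved) of the weights I would like to see for each row. The names `WEIGHTS`, `WINDOW` and `expected_weights` are only illustrative and do not come from any library.

```python
WEIGHTS = [0.1, 0.2, 0.3, 0.4, 0.5]
WINDOW = 5  # rolling window length, matching period='5i'


def expected_weights(row_number: int) -> list[float]:
    """Return the weights wanted for the 1-based row `row_number`.

    The window holds at most WINDOW rows, so early rows only
    get the first few weights.
    """
    window_len = min(row_number, WINDOW)
    return WEIGHTS[:window_len]


# Rows 1-4 get a growing prefix; rows 5 and later get all five weights.
for row in range(1, 7):
    print(row, expected_weights(row))
```
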
The following code works:
import polars as pl

pl.DataFrame({
    't_idx': [1, 2, 3, 4, 5],
}).rolling('t_idx', period='5i').agg(
    # negative window length, used as the slice offset
    -pl.col('t_idx').len().cast(pl.Int64).alias('start'),
    # window length, used as the slice length
    pl.col('t_idx').count().alias('end'),
    pl.lit(pl.Series([.1, .2, .3, .4, .5])).slice(-pl.col('t_idx').len().cast(pl.Int64), pl.col('t_idx').count()).alias('new')
)
shape: (5, 4)
┌───────┬───────┬─────┬───────────────────┐
│ t_idx ┆ start ┆ end ┆ new │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ u32 ┆ list[f64] │
╞═══════╪═══════╪═════╪═══════════════════╡
│ 1 ┆ -1 ┆ 1 ┆ [0.1] │
│ 2 ┆ -2 ┆ 2 ┆ [0.1, 0.2] │
│ 3 ┆ -3 ┆ 3 ┆ [0.1, 0.2, 0.3] │
│ 4 ┆ -4 ┆ 4 ┆ [0.1, 0.2, … 0.4] │
│ 5 ┆ -5 ┆ 5 ┆ [0.1, 0.2, … 0.5] │
└───────┴───────┴─────┴───────────────────┘
However, once I extend the number of rows to 6, I get an error that I don’t fully understand.
pl.DataFrame({
    't_idx': [1, 2, 3, 4, 5, 6],
}).rolling('t_idx', period='5i').agg(
    -pl.col('t_idx').len().cast(pl.Int64).alias('start'),
    pl.col('t_idx').count().alias('end'),
    pl.lit(pl.Series([.1, .2, .3, .4, .5])).slice(-pl.col('t_idx').len().cast(pl.Int64), pl.col('t_idx').count()).alias('new')
)
thread 'polars-4' panicked at crates/polars-core/src/frame/group_by/aggregations/agg_list.rs:109:58:
range end index 6 out of range for slice of length 5
---------------------------------------------------------------------------
PanicException Traceback (most recent call last)
Cell In[828], line 3
1 pl.DataFrame({
2 't_idx': [1 ,2 ,3, 4, 5, 6],
----> 3 }).rolling('t_idx', period='5i').agg(
4 -pl.col('t_idx').len().cast(pl.Int64).alias('start'),
5 pl.col('t_idx').count().alias('end'),
6 pl.lit(pl.Series([.1, .2, .3, .4, .5])).slice(-pl.col('t_idx').len().cast(pl.Int64), pl.col('t_idx').count()).alias('new')
7 )
File ~/virtual_environments/vve_3_11_6/lib/python3.11/site-packages/polars/dataframe/group_by.py:896, in RollingGroupBy.agg(self, *aggs, **named_aggs)
868 def agg(
869 self,
870 *aggs: IntoExpr | Iterable[IntoExpr],
871 **named_aggs: IntoExpr,
872 ) -> DataFrame:
873 """
874 Compute aggregations for each group of a group by operation.
875
(...)
884 The resulting columns will be renamed to the keyword used.
885 """
886 return (
887 self.df.lazy()
888 .rolling(
889 index_column=self.time_column,
890 period=self.period,
891 offset=self.offset,
892 closed=self.closed,
893 group_by=self.group_by,
894 )
895 .agg(*aggs, **named_aggs)
--> 896 .collect(no_optimization=True)
897 )
File ~/virtual_environments/vve_3_11_6/lib/python3.11/site-packages/polars/lazyframe/frame.py:1967, in LazyFrame.collect(self, type_coercion, predicate_pushdown, projection_pushdown, simplify_expression, slice_pushdown, comm_subplan_elim, comm_subexpr_elim, cluster_with_columns, no_optimization, streaming, background, _eager, **_kwargs)
1964 # Only for testing purposes atm.
1965 callback = _kwargs.get("post_opt_callback")
-> 1967 return wrap_df(ldf.collect(callback))
PanicException: range end index 6 out of range for slice of length 5
I also tried slicing without a rolling window, and it works fine even when the requested length (6) is greater than the length of the Series (5):
pl.select(
pl.lit(pl.Series([.1, .2, .3, .4, .5])).slice(-5, 6)
)
shape: (5, 1)
┌─────┐
│ │
│ --- │
│ f64 │
╞═════╡
│ 0.1 │
│ 0.2 │
│ 0.3 │
│ 0.4 │
│ 0.5 │
└─────┘
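For reference, this is how I read the result above, written as a plain-Python sketch: a negative offset counts from the end, and a length that runs past the end is simply cut off. `slice_like` is a hypothetical helper of mine that only mirrors the output shown here; it is not polars code, and its handling of offsets beyond the Series length is an assumption.

```python
def slice_like(values, offset, length):
    """Emulate slice(offset, length) on a plain list (illustrative only)."""
    n = len(values)
    if offset < 0:
        # Negative offsets count back from the end (assumed to stop at 0).
        start = max(n + offset, 0)
    else:
        start = min(offset, n)
    # Python list slicing already truncates an end index past the list.
    return values[start:start + length]


weights = [0.1, 0.2, 0.3, 0.4, 0.5]
print(slice_like(weights, -5, 6))  # all five weights, as in the table above
```
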