Question

I am trying to slice a Series of weights based on the length of the rolling window. One use case is applying custom weights in a rolling average. (I understand that rolling_mean is already implemented.)

Could someone explain why I get the error below, and what the proper way is to achieve what I described above? Thanks!

Let's say my rolling window has a length of 5. The windows for the first 4 rows have lengths 1, 2, 3 and 4, so for those rows I only want the first few weights. From the fifth row on, I expect the weights to always be [.1, .2, .3, .4, .5].
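To make the intended result concrete, here is a minimal plain-Python reference for the behaviour described above, which any Polars solution could be checked against. The helper names weights_for_row and weighted_rolling_mean are hypothetical. The sketch assumes the rolling index is consecutive integers (so a '5i' window is the same as "the last 5 rows"), that the first weight applies to the oldest value in the window, and, as an assumption not stated in the question, that each weighted sum is normalised by the sum of the weights actually used.

```python
# Hypothetical reference helpers (plain Python, not a Polars API).
WEIGHTS = (0.1, 0.2, 0.3, 0.4, 0.5)
WINDOW = 5


def weights_for_row(row: int, weights=WEIGHTS, window: int = WINDOW) -> list[float]:
    """Return the first n weights, where n is the window length at 0-based `row`."""
    n = min(row + 1, window)  # warm-up rows have shorter windows
    return list(weights[:n])


def weighted_rolling_mean(values, weights=WEIGHTS, window: int = WINDOW) -> list[float]:
    """Weighted mean over the last `window` values, using the truncated weights."""
    out = []
    for i in range(len(values)):
        w = weights_for_row(i, weights, window)
        # The last len(w) values ending at row i, oldest first (assumed alignment).
        win = values[i + 1 - len(w): i + 1]
        # Assumption: normalise by the sum of the weights that were used.
        out.append(sum(x * wi for x, wi in zip(win, w)) / sum(w))
    return out
```

For example, weights_for_row(0) gives [0.1] and weights_for_row(5) gives the full five weights, matching the rows described above.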

The following code works:

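import polars as pl

pl.DataFrame({
    't_idx': [1, 2, 3, 4, 5],
}).rolling('t_idx', period='5i').agg(
    # negative window length (cast from UInt32 so it can be negated)
    -pl.col('t_idx').len().cast(pl.Int64).alias('start'),
    # number of rows in the window
    pl.col('t_idx').count().alias('end'),
    # slice the weights with offset -len and length count
    pl.lit(pl.Series([.1, .2, .3, .4, .5])).slice(-pl.col('t_idx').len().cast(pl.Int64), pl.col('t_idx').count()).alias('new')
)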

shape: (5, 4)
┌───────┬───────┬─────┬───────────────────┐
│ t_idx ┆ start ┆ end ┆ new               │
│ ---   ┆ ---   ┆ --- ┆ ---               │
│ i64   ┆ i64   ┆ u32 ┆ list[f64]         │
╞═══════╪═══════╪═════╪═══════════════════╡
│ 1     ┆ -1    ┆ 1   ┆ [0.1]             │
│ 2     ┆ -2    ┆ 2   ┆ [0.1, 0.2]        │
│ 3     ┆ -3    ┆ 3   ┆ [0.1, 0.2, 0.3]   │
│ 4     ┆ -4    ┆ 4   ┆ [0.1, 0.2, … 0.4] │
│ 5     ┆ -5    ┆ 5   ┆ [0.1, 0.2, … 0.5] │
└───────┴───────┴─────┴───────────────────┘

However, once I extend the DataFrame to 6 rows, I get a panic that I don't fully understand:

pl.DataFrame({
    't_idx': [1, 2, 3, 4, 5, 6],
}).rolling('t_idx', period='5i').agg(
    -pl.col('t_idx').len().cast(pl.Int64).alias('start'),
    pl.col('t_idx').count().alias('end'),
    pl.lit(pl.Series([.1, .2, .3, .4, .5])).slice(-pl.col('t_idx').len().cast(pl.Int64), pl.col('t_idx').count()).alias('new')
)

thread 'polars-4' panicked at crates/polars-core/src/frame/group_by/aggregations/agg_list.rs:109:58:
range end index 6 out of range for slice of length 5
---------------------------------------------------------------------------
PanicException                            Traceback (most recent call last)
Cell In[828], line 3
      1 pl.DataFrame({
      2     't_idx': [1 ,2 ,3, 4, 5, 6],
----> 3 }).rolling('t_idx', period='5i').agg(
      4     -pl.col('t_idx').len().cast(pl.Int64).alias('start'),
      5     pl.col('t_idx').count().alias('end'),
      6     pl.lit(pl.Series([.1, .2, .3, .4, .5])).slice(-pl.col('t_idx').len().cast(pl.Int64), pl.col('t_idx').count()).alias('new')
      7 )

File ~/virtual_environments/vve_3_11_6/lib/python3.11/site-packages/polars/dataframe/group_by.py:896, in RollingGroupBy.agg(self, *aggs, **named_aggs)
    868 def agg(
    869     self,
    870     *aggs: IntoExpr | Iterable[IntoExpr],
    871     **named_aggs: IntoExpr,
    872 ) -> DataFrame:
    873     """
    874     Compute aggregations for each group of a group by operation.
    875 
   (...)
    884         The resulting columns will be renamed to the keyword used.
    885     """
    886     return (
    887         self.df.lazy()
    888         .rolling(
    889             index_column=self.time_column,
    890             period=self.period,
    891             offset=self.offset,
    892             closed=self.closed,
    893             group_by=self.group_by,
    894         )
    895         .agg(*aggs, **named_aggs)
--> 896         .collect(no_optimization=True)
    897     )

File ~/virtual_environments/vve_3_11_6/lib/python3.11/site-packages/polars/lazyframe/frame.py:1967, in LazyFrame.collect(self, type_coercion, predicate_pushdown, projection_pushdown, simplify_expression, slice_pushdown, comm_subplan_elim, comm_subexpr_elim, cluster_with_columns, no_optimization, streaming, background, _eager, **_kwargs)
   1964 # Only for testing purposes atm.
   1965 callback = _kwargs.get("post_opt_callback")
-> 1967 return wrap_df(ldf.collect(callback))

PanicException: range end index 6 out of range for slice of length 5
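For reference, with period='5i' and the default closed='right', each window contains the rows whose t_idx lies in the half-open interval (t - 5, t]. The plain-Python sketch below lists the 0-based start position and exclusive stop position of every window for the six-row frame. window_bounds is a hypothetical helper, and it assumes t_idx is sorted, as rolling requires.

```python
# Hypothetical helper: 0-based (start, stop) positions of each rolling window,
# for an integer index with period=`period` and closed="right", i.e. (t - period, t].
def window_bounds(t_idx, period=5):
    bounds = []
    for t in t_idx:
        # Positions whose index value falls inside (t - period, t]; t itself is always included.
        members = [i for i, v in enumerate(t_idx) if t - period < v <= t]
        # Sorted index => members are contiguous, so first/last give the window edges.
        bounds.append((members[0], members[-1] + 1))  # stop is exclusive
    return bounds
```

window_bounds([1, 2, 3, 4, 5, 6]) returns [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (1, 6)]. For the sixth row the window stops at index 6, the same number that appears in the panic message; whether Polars is applying these window bounds to the length-5 literal is not confirmed here.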

I also tried slicing outside a rolling window, and it works fine even though the length (6) is greater than the length of the Series:

pl.select(
    pl.lit(pl.Series([.1, .2, .3, .4, .5])).slice(-5, 6)
)

shape: (5, 1)
┌─────┐
│     │
│ --- │
│ f64 │
╞═════╡
│ 0.1 │
│ 0.2 │
│ 0.3 │
│ 0.4 │
│ 0.5 │
└─────┘
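The result above matches ordinary clamping: a negative offset counts from the end, and a length that runs past the end is simply cut off. A minimal pure-Python sketch of that behaviour follows; clamped_slice is a hypothetical helper, not part of Polars, and only approximates the cases discussed here.

```python
# Hypothetical helper approximating slice(offset, length) with clamping.
def clamped_slice(values, offset, length):
    n = len(values)
    start = offset + n if offset < 0 else offset  # negative offsets count from the end
    stop = start + length
    # Clamp both ends into [0, n]; a length past the end stops at the end.
    start = max(0, min(start, n))
    stop = max(0, min(stop, n))
    return list(values[start:stop])
```

By the same rule, slice(-1, 1) on these five weights would return [0.5], whereas the first row of the rolling output above shows [0.1], which suggests the rolling context is not applying the slice the way it is applied here.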

Using Slice in Polars Rolling with Group_by