I am working with a Polars DataFrame and need to compute a cumulative sum of data values that resets whenever the absolute value of the cumulative sum reaches or exceeds a specific threshold. Here is an example of the DataFrame structure and the expected output:
┌──────┬────────────┬─────────┐
│ flag ┆ data_value ┆ cum_sum │
│ ---  ┆ ---        ┆ ---     │
│ i64  ┆ i64        ┆ i64     │
╞══════╪════════════╪═════════╡
│ -1   ┆ 10         ┆ -10     │
│ 1    ┆ 15         ┆ 5       │
│ 1    ┆ 20         ┆ 25      │
│ -1   ┆ 30         ┆ -5      │
│ 1    ┆ 10         ┆ 5       │
│ 1    ┆ 30         ┆ 35      │ <-- Reset because abs(35) >= 30
│ -1   ┆ 10         ┆ -10     │ <-- New sum starts after reset
│ 1    ┆ 7          ┆ -3      │
└──────┴────────────┴─────────┘
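In this table, `flag` and `data_value` are used to calculate `cum_sum`: each row adds `flag * data_value` to the running total. The cumulative sum should reset once the absolute value of `cum_sum` reaches or exceeds 30 (the threshold in this example). The row that crosses the threshold keeps its full sum, and the next row starts a new sum from its own value.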
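To pin down the rule independently of Polars, here is a plain-Python reference loop. It is a sketch that assumes each row contributes `flag * data_value` and that the row reaching the threshold keeps its full sum while the next row restarts from its own value, which is how the example table behaves; `reset_cumsum` is a hypothetical helper name. On the example data the final row works out to -10 + 7 = -3.

```python
def reset_cumsum(values, threshold=30):
    """Running sum that restarts after |sum| reaches the threshold.

    The row that reaches the threshold keeps its full sum; the
    following row starts a new sum from its own value.
    """
    out = []
    acc = 0
    for v in values:
        acc += v
        out.append(acc)
        if abs(acc) >= threshold:
            acc = 0  # the next row starts a fresh sum
    return out


flags = [-1, 1, 1, -1, 1, 1, -1, 1]
data_values = [10, 15, 20, 30, 10, 30, 10, 7]
signed = [f * d for f, d in zip(flags, data_values)]
print(reset_cumsum(signed))
# [-10, 5, 25, -5, 5, 35, -10, -3]
```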
I am looking for a vectorized approach, or a built-in Polars function, that handles this conditional reset efficiently without looping through the rows, if that is possible at all.