When I’m aggregating data in Polars (e.g. with groupby, rolling, or group_by_dynamic, and then calling .agg(...)), how can I get the element at a specific nth position? (In Polars lingo: how can I select an element within an expression within an aggregation?)
Here’s a real-world example: peak and trough detection. When detecting a peak, you’re looking for the value to move up and then back down over a rolling window. Because it needs to go both up and down, the value at the center of the rolling window is the one that needs to be checked.
Working code example:
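import polars as pl
from polars import col

def _detect_peak(x):
    # map_groups passes a list of Series; the first one is the EMA column.
    ema = x[0]
    middle_loc = ema.len() // 2
    # A peak means the middle value of the window is the window maximum.
    return ema[middle_loc] == ema.max()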
# Detect Peak, size is ~7mo before peak and ~7mo after peak, to clearly determine trend.
peak_df = ema_df.group_by_dynamic(col.Date, every=minimum_bar_size, period="14mo").agg(
    Last_Date=col.Date.last(),
    SPX=col.SPX.last(),
    EMA_6mo=col.EMA_6mo.last(),
    Test=col.Date.len()//2,
    Peak_Date=pl.map_groups(col.Date, lambda x: x[0][x[0].len()//2]),
    Peak=pl.map_groups(col.EMA_6mo, lambda x: _detect_peak(x)),
).drop("Date").rename({"Last_Date":"Date"}).unique(subset="Date").sort(by="Date") # Cut off last rows.
Note: Because this uses a date type instead of a row count, group_by_dynamic is used instead of rolling. minimum_bar_size is set to a single row’s worth of data, so this code for all intents and purposes works just like rolling.
This works. I solved the problem by writing two map_groups functions, which is slow.
I have two primary questions:
- Is there a way to return multiple columns with a single map_groups call so I don’t have to call map_groups twice? Calling it once should in theory double the speed of the code.
- How can I access an exact index instead of being limited to the first() and last() functions? I would think Test2=col.Date.list[30], would work, but instead it gives the error:
SchemaError: invalid series dtype: expected List, got date
Maybe there is a way to convert a date type back into an expression? If I do Test2=[col.Date.list[30]], I get:
PanicException: called Result::unwrap() on an Err value: InvalidOperation(ErrString("list_builder operation not supported for dtype object"))
From the Polars documentation found here: https://docs.pola.rs/api/python/stable/reference/expressions/api/polars.Expr.list.eval.html#polars.Expr.list.eval
Expression to run. Note that you can select an element with pl.first(), or pl.col()
So it should be supported? If I paste in their example code Test2=col.Date.list.eval(pl.element().rank()), I get:
SchemaError: invalid series dtype: expected List, got date
Maybe it’s a bug in Polars?
Regarding the question in the title.
In general, pl.Expr.get can be used to select values within an expression by index.
However, in a pl.DataFrame.rolling context you’ll face the issue of varying window sizes, such that a specific index might be out of bounds. Currently, one can work around this issue by combining pl.Expr.first / pl.Expr.last with a suitable pl.Expr.shift operation.
import polars as pl

df = pl.DataFrame({"x": [2, 3, 5, 7, 11]}).with_row_index()

(
    df
    .rolling(index_column="index", period="3i")
    .agg(
        pl.col("x").alias("x"),
        pl.col("x").first().name.suffix("_first"),
        pl.col("x").shift(-1).first().name.suffix("_second"),
        pl.col("x").last().name.suffix("_last"),
    )
)
shape: (5, 5)
┌───────┬────────────┬─────────┬──────────┬────────┐
│ index ┆ x          ┆ x_first ┆ x_second ┆ x_last │
│ ---   ┆ ---        ┆ ---     ┆ ---      ┆ ---    │
│ u32   ┆ list[i64]  ┆ i64     ┆ i64      ┆ i64    │
╞═══════╪════════════╪═════════╪══════════╪════════╡
│ 0     ┆ [2]        ┆ 2       ┆ null     ┆ 2      │
│ 1     ┆ [2, 3]     ┆ 2       ┆ 3        ┆ 3      │
│ 2     ┆ [2, 3, 5]  ┆ 2       ┆ 3        ┆ 5      │
│ 3     ┆ [3, 5, 7]  ┆ 3       ┆ 5        ┆ 7      │
│ 4     ┆ [5, 7, 11] ┆ 5       ┆ 7        ┆ 11     │
└───────┴────────────┴─────────┴──────────┴────────┘
Regarding your question on multiple return values.
You can return multiple elements from a call to pl.Expr.map_elements / pl.Expr.map_groups by returning a dictionary of the values.
df.with_columns(
    pl.col("x").map_elements(
        lambda x: {"a": x+1, "b": x+2},
        return_dtype=pl.Struct([
            pl.Field("a", pl.Int64),
            pl.Field("b", pl.Int64),
        ]),
    ).struct.field("*")
)
shape: (5, 4)
┌───────┬─────┬─────┬─────┐
│ index ┆ x   ┆ a   ┆ b   │
│ ---   ┆ --- ┆ --- ┆ --- │
│ u32   ┆ i64 ┆ i64 ┆ i64 │
╞═══════╪═════╪═════╪═════╡
│ 0     ┆ 2   ┆ 3   ┆ 4   │
│ 1     ┆ 3   ┆ 4   ┆ 5   │
│ 2     ┆ 5   ┆ 6   ┆ 7   │
│ 3     ┆ 7   ┆ 8   ┆ 9   │
│ 4     ┆ 11  ┆ 12  ┆ 13  │
└───────┴─────┴─────┴─────┘