In Polars (1.0.0) we can use len() after group_by_dynamic() in lazy mode but not in eager mode:
from datetime import datetime
import polars as pl
df = pl.DataFrame(
    {
        "time": pl.datetime_range(
            start=datetime(2024, 6, 30),
            end=datetime(2024, 7, 16),
            interval="1d",
            eager=True,
        ),
    }
).with_columns(n=pl.int_range(pl.len()))
# len() in lazy mode: OK
x = df.lazy().group_by_dynamic("time", every="1w").len(name="count").collect()
print(x)
# shape: (4, 2)
# ┌─────────────────────┬───────┐
# │ time ┆ count │
# │ --- ┆ --- │
# │ datetime[μs] ┆ u32 │
# ╞═════════════════════╪═══════╡
# │ 2024-06-24 00:00:00 ┆ 1 │
# │ 2024-07-01 00:00:00 ┆ 7 │
# │ 2024-07-08 00:00:00 ┆ 7 │
# │ 2024-07-15 00:00:00 ┆ 2 │
# └─────────────────────┴───────┘
# len() in eager mode: ERROR
x = df.group_by_dynamic("time", every="1w").len(name="count")
print(x)
# Traceback (most recent call last):
# File "/Users/anto/src/poste/sda-poste-logistics/script/polars_bug_groupby_dynamic_len.py", line 26, in <module>
# x = df.group_by_dynamic("time", every="1w").len(name="count")
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# AttributeError: 'DynamicGroupBy' object has no attribute 'len'
Why is this happening? It looks very odd: the same call works on a LazyFrame but fails on a DataFrame.
For an eager DataFrame I have to resort to:
# agg(pl.len()) in eager mode: OK
x = df.group_by_dynamic("time", every="1w").agg(counts=pl.len())
print(x)