I have a large DataFrame, converted to a LazyFrame, in which one column was binned with Expr.cut. Now I would like to iterate over the resulting categories, but I have not found an efficient way to do it.
When I use group_by on the LazyFrame, the result is not iterable (unless I collect the frame before grouping). It raises:
TypeError: 'LazyGroupBy' object is not iterable
I really don’t want to filter the whole LazyFrame once for every category. How can I achieve this?
The code I would like to work (specifically the for loop at the end):
import polars as pl
import numpy as np
data = pl.LazyFrame(dict(
    a=np.linspace(1, 5, 100),
    b=np.linspace(20, 30, 100),
))
N_bins = 10
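# collect() returns a DataFrame, so extract the scalar with .item();
# the break points are generated in ascending order (min first).
cutPoints = np.linspace(
    data.select(pl.min("a")).collect().item(),
    data.select(pl.max("a")).collect().item(),
    N_bins,
)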
data = data.with_columns(
    pl.col("a").cut(cutPoints)
)
groups = data.group_by("a")
for name, group in groups:  # raises TypeError: 'LazyGroupBy' object is not iterable
    print(name)
    print(group)