After updating Polars from 26 to 29, I found that code I wrote earlier (using `.list.sum()`) now gives an odd result. The following demonstrates the issue:
```python
import polars as pl

x = pl.DataFrame(
    {
        'A': [1., None, 3.],
        'B': [4., 5., 6.],
        'C': [7., 8., None],
    }
)

x.with_columns(
    pl.when(pl.sum_horizontal('B', 'C') > 12)
    .then(pl.sum_horizontal(pl.col('A', 'B')))
    .otherwise(None)
    .alias('A+B when B+C>12 (expected)'),

    pl.when(pl.sum_horizontal('B', 'C') > 12)
    .then(pl.concat_list(pl.col('A', 'B')).list.sum())
    .otherwise(None)
    .alias('A+B when B+C>12 (odd)'),

    pl.concat_list(pl.exclude('C')).list.sum().alias('A+B'),
)
```
Running the above, I got the following:
```
┌──────┬─────┬──────┬────────────────────────────┬───────────────────────┬─────┐
│ A    ┆ B   ┆ C    ┆ A+B when B+C>12 (expected) ┆ A+B when B+C>12 (odd) ┆ A+B │
│ ---  ┆ --- ┆ ---  ┆ ---                        ┆ ---                   ┆ --- │
│ f64  ┆ f64 ┆ f64  ┆ f64                        ┆ list[f64]             ┆ f64 │
╞══════╪═════╪══════╪════════════════════════════╪═══════════════════════╪═════╡
│ 1.0  ┆ 4.0 ┆ 7.0  ┆ null                       ┆ null                  ┆ 5.0 │
│ null ┆ 5.0 ┆ 8.0  ┆ 5.0                        ┆ [5.0]                 ┆ 5.0 │
│ 3.0  ┆ 6.0 ┆ null ┆ null                       ┆ null                  ┆ 9.0 │
└──────┴─────┴──────┴────────────────────────────┴───────────────────────┴─────┘
```
The last two columns perform the same kind of calculation with `.list.sum()`, yet they return different types. The `when/then` column returns `list[f64]` (`[5.0]`), whereas the older Polars version gave me `f64` (`5.0`) there, which is what I expected. The plain column returns `f64` (`5.0`), also as expected.
Shouldn't aggregation functions like `.list.sum()` always return a single value rather than a list?