This code computes the per-group mean of a column in Pandas:
pd.DataFrame({'id': ['A', 'A', 'B', 'B', 'B', 'B'], 'a': [1, 2, 3, 4, float('inf'), float('inf')]}).groupby('id').mean()
The result is:
a
id
A 1.5
B NaN
But the same code in Polars:
pl.DataFrame({'id': ['A', 'A', 'B', 'B', 'B', 'B'], 'a': [1, 2, 3, 4, float('inf'), float('inf')]}).groupby('id').mean()
gives:
┌─────┬─────┐
│ id ┆ a │
│ --- ┆ --- │
│ str ┆ f64 │
╞═════╪═════╡
│ B ┆ inf │
│ A ┆ 1.5 │
└─────┴─────┘
In the first example the mean for ID “B” is NaN, but in the second example the mean for the same ID is inf.
Why do the two libraries compute the mean differently, and on what principle?
I tried to reproduce the Polars result in Pandas, but got stuck because the two dataframes keep giving different results.
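For reference, the arithmetic for group “B” can be sketched in plain Python, without either library. Under IEEE 754 rules, adding inf to a finite number gives inf, while inf - inf gives NaN. The sketch below contrasts a naive sum-then-divide with a compensated (Kahan) summation. Both helper names (`naive_mean`, `kahan_mean`) are hypothetical, and whether Pandas or Polars actually sums this way internally is an assumption I have not verified.

```python
def naive_mean(values):
    """Plain left-to-right IEEE 754 sum, then divide by the count."""
    total = 0.0
    for x in values:
        total += x
    return total / len(values)


def kahan_mean(values):
    """Mean via Kahan (compensated) summation.

    Illustration only: this is NOT copied from the Pandas or Polars source.
    """
    total = 0.0
    comp = 0.0  # running compensation for lost low-order bits
    for x in values:
        y = x - comp
        t = total + y
        comp = (t - total) - y  # turns into inf - inf = nan once inf enters
        total = t
    return total / len(values)


# The values of column 'a' for id "B" in the question
group_b = [3.0, 4.0, float("inf"), float("inf")]

print(naive_mean(group_b))  # inf
print(kahan_mean(group_b))  # nan
```

On finite input such as group “A” (`[1.0, 2.0]`) both functions return 1.5, so the two approaches only diverge once an infinite value is involved.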