I’d like to apply trim_mean from scipy.stats to a polars column.
I tried the following, but I’m unsure why the second option fails with IndexError: tuple index out of range:
import polars as pl
from scipy.stats import trim_mean

df = pl.DataFrame({
    "x": [1, 2, 3, 4, 6, 8, 5, 9, 12, 15, 4, 6]
})

# compute regular mean
df.select(
    pl.col("x").mean().alias("mean")
)

# trim mean (this call raises the IndexError)
df.select(
    trim_mean(pl.col("x"), 0.05).alias("trim_mean")
)
What data type is a polars column in this case? Or is there some other method to compute this in polars?
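trim_mean expects an array as the first argument; it does not work with polars expressions. pl.col("x") is an expression (pl.Expr) that only describes a computation and holds no data, whereas df["X"] is a pl.Series that holds the values. You can pass df["X"] instead: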
import polars as pl
from scipy.stats import trim_mean
df = pl.DataFrame({"X": [1, 2, 3, 4, 6, 8, 5, 9, 12, 15, 4, 6]})
print(trim_mean(df["X"], 0.05))
Output:
6.25
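As for why the failure shows up as an IndexError: a likely explanation, assuming trim_mean converts its input with np.asarray and then reads the length from shape[axis] (this is an assumption about SciPy internals and may differ between versions), is that an object NumPy cannot unpack into elements becomes a zero-dimensional array with shape (). Indexing that empty tuple produces exactly this message. The sketch below uses a hypothetical stand-in class, NotASequence, rather than a real polars expression:

```python
import numpy as np


class NotASequence:
    """Hypothetical stand-in for an object NumPy cannot unpack into elements."""


# NumPy wraps the unknown object in a 0-dimensional object array
arr = np.asarray(NotASequence())
print(arr.shape)  # ()

try:
    arr.shape[0]  # asking for the length along axis 0 of a 0-d array
except IndexError as exc:
    message = str(exc)
    print(message)  # tuple index out of range
```

A Series, by contrast, converts to a regular 1D array, so the length lookup succeeds.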
Based on the code here for trim_mean of 1D arrays, I implemented a custom function that calculates this in pure polars code:
import polars as pl
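def polars_trim_mean(expr, proportiontocut):
    # number of values to cut from each end
    m = (proportiontocut * pl.len()).floor()
    # sort first so the slice drops the smallest and largest values
    return expr.sort().slice(m, pl.len() - 2 * m).mean()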
df = pl.DataFrame({"X": [1, 2, 3, 4, 6, 8, 5, 9, 12, 15, 4, 6]})
print(df.select(trim_mean=polars_trim_mean(pl.col("X"), 0.05)))
Output:
shape: (1, 1)
┌───────────┐
│ trim_mean │
│ ---       │
│ f64       │
╞═══════════╡
│ 6.25      │
└───────────┘
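Note that with 12 values, a proportion of 0.05 rounds down to zero values cut from each end, so 6.25 is also the plain mean. To check an implementation against a proportion that actually trims something, a minimal pure-Python sketch of the trimmed mean may help. The name trimmed_mean is hypothetical and not part of polars or SciPy; it assumes the usual definition of ordering the values, dropping floor(proportiontocut * n) from each end, and averaging the rest:

```python
import math


def trimmed_mean(values, proportiontocut):
    """Reference trimmed mean: sort, drop floor(p * n) values per end, average."""
    ordered = sorted(values)
    n = len(ordered)
    m = math.floor(proportiontocut * n)  # values cut from each end
    kept = ordered[m:n - m]
    if not kept:
        raise ValueError("proportiontocut too large: nothing left to average")
    return sum(kept) / len(kept)


data = [1, 2, 3, 4, 6, 8, 5, 9, 12, 15, 4, 6]
print(trimmed_mean(data, 0.05))  # 6.25: floor(0.05 * 12) == 0, nothing is trimmed
print(trimmed_mean(data, 0.1))   # 5.9: drops the smallest (1) and largest (15)
```

Because the input here is unsorted, the 0.1 case only gives the right answer if the values are ordered before slicing, which makes it a useful test for any port of the function.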