I would like to fill a value in a column based on another columns’ name, in the Polars library from python (I obtained the following DF by exploding my variables’ column names):
Input:
df = pl.from_repr("""
┌────────┬─────────┬────────┬─────┬──────────┐
│ Name ┆ Average ┆ Median ┆ Q1 ┆ Variable │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 ┆ i64 ┆ str │
╞════════╪═════════╪════════╪═════╪══════════╡
│ Apple ┆ 2 ┆ 3 ┆ 4 ┆ Average │
│ Apple ┆ 2 ┆ 3 ┆ 4 ┆ Median │
│ Apple ┆ 2 ┆ 3 ┆ 4 ┆ Q1 │
│ Banana ┆ 1 ┆ 5 ┆ 10 ┆ Average │
│ Banana ┆ 1 ┆ 5 ┆ 10 ┆ Median │
│ Banana ┆ 1 ┆ 5 ┆ 10 ┆ Q1 │
└────────┴─────────┴────────┴─────┴──────────┘
""")
Expected output:
shape: (6, 6)
┌────────┬─────────┬────────┬─────┬──────────┬───────┐
│ Name ┆ Average ┆ Median ┆ Q1 ┆ Variable ┆ value │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 ┆ i64 ┆ str ┆ i64 │
╞════════╪═════════╪════════╪═════╪══════════╪═══════╡
│ Apple ┆ 2 ┆ 3 ┆ 4 ┆ Average ┆ 2 │
│ Apple ┆ 2 ┆ 3 ┆ 4 ┆ Median ┆ 3 │
│ Apple ┆ 2 ┆ 3 ┆ 4 ┆ Q1 ┆ 4 │
│ Banana ┆ 1 ┆ 5 ┆ 10 ┆ Average ┆ 1 │
│ Banana ┆ 1 ┆ 5 ┆ 10 ┆ Median ┆ 5 │
│ Banana ┆ 1 ┆ 5 ┆ 10 ┆ Q1 ┆ 10 │
└────────┴─────────┴────────┴─────┴──────────┴───────┘
I have tried:
df = df.with_columns(value = pl.col(f"{pl.col.variable}"))
But that does not work because polars perceives the argument as a function (?). Does anyone know how to do this?
Note: I have also tried to transpose the dataframe, which, not only was that computationally expensive, also did not work! Because it would transpose the DF into a 5-rows-long DF. What I need is a (Name * Number of Variables)-rows-long DF.
That is, for example, I have 3 different names (say, Apple, Banana, and Dragonfruit), and I have 3 variables (Average, Median, Q1), then my DF should be 9-rows-long!
0
You can use
when/then()
to check whether the value of the columnVariable
is the same as the column name.coalesce()
to choose first non-empty result.
cols = [x for x in df.schema if x not in ("Name", "Variable")]
df.with_columns(
value = pl.coalesce(
pl.when(pl.col.Variable == col).then(pl.col(col))
for col in cols
)
)
┌────────┬─────────┬────────┬─────┬──────────┬───────┐
│ Name ┆ Average ┆ Median ┆ Q1 ┆ Variable ┆ value │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 ┆ i64 ┆ str ┆ i64 │
╞════════╪═════════╪════════╪═════╪══════════╪═══════╡
│ Apple ┆ 2 ┆ 3 ┆ 4 ┆ Average ┆ 2 │
│ Apple ┆ 2 ┆ 3 ┆ 4 ┆ Median ┆ 3 │
│ Apple ┆ 2 ┆ 3 ┆ 4 ┆ Q1 ┆ 4 │
│ Banana ┆ 3 ┆ 5 ┆ 10 ┆ Average ┆ 3 │
│ Banana ┆ 3 ┆ 5 ┆ 10 ┆ Median ┆ 5 │
│ Banana ┆ 3 ┆ 5 ┆ 10 ┆ Q1 ┆ 10 │
└────────┴─────────┴────────┴─────┴──────────┴───────┘
You can leverage this functionality from polars.when()
and polars.then()
with a mapping.
For instance,
variable_columns = ["Q1", "Average", "Median"]
expressions = [
pl.when(pl.col("Variable") == col_name).then(pl.col(col_name))
for col_name in variable_columns
]
combined_expr = pl.fold(
acc=pl.lit(None),
function=lambda acc, x: pl.coalesce(acc, x),
exprs=expressions
)
df = df.with_columns(combined_expr.alias("Value"))
3