I have a pl.DataFrame with a column comprising lists like this:
import polars as pl
df = pl.DataFrame(
{
"symbol": ["A", "A", "B", "B"],
"roc": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]],
}
)
shape: (4, 2)
┌────────┬────────────┐
│ symbol ┆ roc │
│ --- ┆ --- │
│ str ┆ list[f64] │
╞════════╪════════════╡
│ A ┆ [0.1, 0.2] │
│ A ┆ [0.3, 0.4] │
│ B ┆ [0.5, 0.6] │
│ B ┆ [0.7, 0.8] │
└────────┴────────────┘
Further, I have a regular Python list weights = [0.3, 0.7].
What’s an efficient way to multiply pl.col("roc") with weights so that the first and second elements of each list in the column are multiplied by the first and second elements of weights, respectively?
The expected output is like this:
shape: (4, 3)
┌────────┬────────────┬──────────────┐
│ symbol ┆ roc ┆ roc_wgt │
│ --- ┆ --- ┆ --- │
│ str ┆ list[f64] ┆ list[f64] │
╞════════╪════════════╪══════════════╡
│ A ┆ [0.1, 0.2] ┆ [0.03, 0.14] │ = [0.1 * 0.3, 0.2 * 0.7]
│ A ┆ [0.3, 0.4] ┆ [0.09, 0.28] │ = [0.3 * 0.3, 0.4 * 0.7]
│ B ┆ [0.5, 0.6] ┆ [0.15, 0.42] │ = [0.5 * 0.3, 0.6 * 0.7]
│ B ┆ [0.7, 0.8] ┆ [0.21, 0.56] │ = [0.7 * 0.3, 0.8 * 0.7]
└────────┴────────────┴──────────────┘
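For reference, the element-wise product being asked for can be sketched in plain Python, without Polars. This is only a reference computation for checking results, not an efficient solution; the helper name weight_rows is hypothetical and the data is copied from the question.

```python
import math

# Sample data copied from the question.
roc = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
weights = [0.3, 0.7]


def weight_rows(rows, weights):
    """Multiply each inner list element-wise by weights (hypothetical helper)."""
    out = []
    for row in rows:
        # The question assumes every list has the same length as weights.
        if len(row) != len(weights):
            raise ValueError("each list must have the same length as weights")
        out.append([value * w for value, w in zip(row, weights)])
    return out


expected = weight_rows(roc, weights)

# Compare with a tolerance: the products are computed in floating point,
# so they need not be bit-for-bit equal to the rounded values in the table.
assert math.isclose(expected[0][0], 0.1 * 0.3)
assert math.isclose(expected[3][1], 0.56)
```

Any of the Polars approaches should agree with these values up to floating-point rounding.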
There is a pending PR to allow pl.col.roc * pl.lit(weights)
- https://github.com/pola-rs/polars/pull/17823
There is also an Array fixed-width type:
dtype = pl.Array(float, 2)
(df.with_columns(wgt = pl.lit(weights, dtype))
.with_columns(roc_wgt = pl.col.roc.cast(dtype) * pl.col.wgt)
)
shape: (4, 4)
┌────────┬────────────┬───────────────┬───────────────┐
│ symbol ┆ roc ┆ wgt ┆ roc_wgt │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ list[f64] ┆ array[f64, 2] ┆ array[f64, 2] │
╞════════╪════════════╪═══════════════╪═══════════════╡
│ A ┆ [0.1, 0.2] ┆ [0.3, 0.7] ┆ [0.03, 0.14] │
│ A ┆ [0.3, 0.4] ┆ [0.3, 0.7] ┆ [0.09, 0.28] │
│ B ┆ [0.5, 0.6] ┆ [0.3, 0.7] ┆ [0.15, 0.42] │
│ B ┆ [0.7, 0.8] ┆ [0.3, 0.7] ┆ [0.21, 0.56] │
└────────┴────────────┴───────────────┴───────────────┘
Adding the weights as a column is required because the pl.lit() case currently panics:
- https://github.com/pola-rs/polars/issues/18831
One option is to run explode first:
(df.with_row_index().with_columns(l=weights).explode(['roc', 'l'])
.with_columns(roc_wgt=pl.col('roc')*pl.col('l'))
.group_by('index', maintain_order=True)
.agg(pl.col('symbol').first(),
pl.col('roc'),
pl.col('roc_wgt'))
)
Output:
┌───────┬────────┬────────────┬──────────────┐
│ index ┆ symbol ┆ roc ┆ roc_wgt │
│ --- ┆ --- ┆ --- ┆ --- │
│ u32 ┆ str ┆ list[f64] ┆ list[f64] │
╞═══════╪════════╪════════════╪══════════════╡
│ 0 ┆ A ┆ [0.1, 0.2] ┆ [0.03, 0.14] │
│ 1 ┆ A ┆ [0.3, 0.4] ┆ [0.09, 0.28] │
│ 2 ┆ B ┆ [0.5, 0.6] ┆ [0.15, 0.42] │
│ 3 ┆ B ┆ [0.7, 0.8] ┆ [0.21, 0.56] │
└───────┴────────┴────────────┴──────────────┘
Alternatively, with map_elements (this calls a Python function for every row, so it is usually slower than native expressions):
import numpy as np
df.with_columns(roc_wgt=pl.col('roc')
.map_elements(lambda x: x*np.array(weights),
return_dtype=pl.List(pl.Float64)))
Output:
┌────────┬────────────┬──────────────┐
│ symbol ┆ roc ┆ roc_wgt │
│ --- ┆ --- ┆ --- │
│ str ┆ list[f64] ┆ list[f64] │
╞════════╪════════════╪══════════════╡
│ A ┆ [0.1, 0.2] ┆ [0.03, 0.14] │
│ A ┆ [0.3, 0.4] ┆ [0.09, 0.28] │
│ B ┆ [0.5, 0.6] ┆ [0.15, 0.42] │
│ B ┆ [0.7, 0.8] ┆ [0.21, 0.56] │
└────────┴────────────┴──────────────┘
You can also iterate over your weights list, take the corresponding element
from each list in the column, multiply the two, and concatenate the results
into a new list.
import polars as pl
df = pl.DataFrame(
{
"symbol": ["A", "A", "B", "B"],
"roc": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]],
}
)
weights = [.3, .7]
print(
df.with_columns(
roc_wgt=pl.concat_list(
pl.col('roc').list.get(i) * wgt for i, wgt in enumerate(weights)
)
)
)
# shape: (4, 3)
# ┌────────┬────────────┬──────────────┐
# │ symbol ┆ roc ┆ roc_wgt │
# │ --- ┆ --- ┆ --- │
# │ str ┆ list[f64] ┆ list[f64] │
# ╞════════╪════════════╪══════════════╡
# │ A ┆ [0.1, 0.2] ┆ [0.03, 0.14] │
# │ A ┆ [0.3, 0.4] ┆ [0.09, 0.28] │
# │ B ┆ [0.5, 0.6] ┆ [0.15, 0.42] │
# │ B ┆ [0.7, 0.8] ┆ [0.21, 0.56] │
# └────────┴────────────┴──────────────┘
Unfortunately, Polars doesn’t support such operations on lists yet, but it does support them on structs:
(
df.with_columns(
roc_wgt = pl.col.roc.list.to_struct() * pl.lit(weights).list.to_struct()
).with_columns(
roc_wgt = pl.concat_list(pl.col.roc_wgt.struct.field("*"))
)
)
shape: (4, 3)
┌────────┬────────────┬──────────────┐
│ symbol ┆ roc ┆ roc_wgt │
│ --- ┆ --- ┆ --- │
│ str ┆ list[f64] ┆ list[f64] │
╞════════╪════════════╪══════════════╡
│ A ┆ [0.1, 0.2] ┆ [0.03, 0.14] │
│ A ┆ [0.3, 0.4] ┆ [0.09, 0.28] │
│ B ┆ [0.5, 0.6] ┆ [0.15, 0.42] │
│ B ┆ [0.7, 0.8] ┆ [0.21, 0.56] │
└────────┴────────────┴──────────────┘