I have a polars dataframe with two columns: one contains lists of strings and the other contains plain strings. I want to apply the following expression to both columns. However, for some reason the isinstance(x, list) check doesn't work as expected.
import polars as pl

def process_column(column_name: str, alias_name: str) -> pl.Expr:
    return (
        # Join list elements into one string; leave plain strings unchanged
        pl.col(column_name).map_elements(lambda x: " ".join(x) if isinstance(x, list) else x)
        .str.to_lowercase()
        .str.split(by="-")
        .list.join(" ")
        #.str.contains("hello")
        .alias(alias_name)
    )
Here is a sample dataframe:
df = pl.DataFrame({
    "lists": [["hello", "World"], ["polars", "IS", "fast"]],
    "strings": ["foo-hello", "bOO"]
})
This line works just fine:
df2 = df.with_columns(process_column("strings", "processed_string"))
But this one raises the following error:
df2 = df.with_columns(process_column("lists", "processed_lists"))
The error:
polars.exceptions.SchemaError: invalid series dtype: expected `String`, got `list[str]`
I also tried calling map_elements with return_dtype=pl.String. It doesn't raise an error, but the output is wrong.