Consider the following example:
from enum import Enum
import polars as pl
from typing import NamedTuple
Months = Enum(
    "MONTHS",
    [
        "Jan",
        "Feb",
        "Mar",
        "Apr",
        "May",
        "Jun",
        "Jul",
        "Aug",
        "Sep",
        "Oct",
        "Nov",
        "Dec",
    ],
)
plMonths = pl.Enum([m.name for m in Months])
plYearMonth = pl.Struct({"year": int, "month": plMonths})
class YearMonth(NamedTuple):
    year: int
    month: str
dates = pl.Series(
    "year_months",
    [YearMonth(2024, Months.Jan.name), YearMonth(2024, Months.Feb.name)],
    dtype=plYearMonth,
)
values = pl.Series("value", [0, 1])
hellos = pl.Series("hello", ["a"]*2 + ["b"]*2)
df = pl.DataFrame({
    "hello": pl.Series(["a", "a", "b", "b"]),
    "date": pl.concat([dates, dates]),
    "value": pl.concat([values, values.reverse()]),
})
df
shape: (4, 3)
┌───────┬──────────────┬───────┐
│ hello ┆ date ┆ value │
│ --- ┆ --- ┆ --- │
│ str ┆ struct[2] ┆ i64 │
╞═══════╪══════════════╪═══════╡
│ a ┆ {2024,"Jan"} ┆ 0 │
│ a ┆ {2024,"Feb"} ┆ 1 │
│ b ┆ {2024,"Jan"} ┆ 1 │
│ b ┆ {2024,"Feb"} ┆ 0 │
└───────┴──────────────┴───────┘
I then pivot df:
pivoted = df.pivot(index="hello", columns="date", values="value")
pivoted
shape: (2, 3)
┌───────┬──────────────┬──────────────┐
│ hello ┆ {2024,"Jan"} ┆ {2024,"Feb"} │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═══════╪══════════════╪══════════════╡
│ a ┆ 0 ┆ 1 │
│ b ┆ 1 ┆ 0 │
└───────┴──────────────┴──────────────┘
The headers have now become strings, quite understandably, as can be seen by the following unpivot:
unpivoted = pivoted.melt(id_vars="hello", variable_name="date", value_name="value")
unpivoted
shape: (4, 3)
┌───────┬──────────────┬───────┐
│ hello ┆ date ┆ value │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ i64 │
╞═══════╪══════════════╪═══════╡
│ a ┆ {2024,"Jan"} ┆ 0 │
│ b ┆ {2024,"Jan"} ┆ 1 │
│ a ┆ {2024,"Feb"} ┆ 1 │
│ b ┆ {2024,"Feb"} ┆ 0 │
└───────┴──────────────┴───────┘
But recall that df's original date column had the struct data type plYearMonth. Is there any way at all to do the unpivot so that unpivoted's date data is interpreted again as plYearMonth without performing a reparsing operation?
My best guess is that this is not possible and that it's better to keep a dictionary that maps between the string and struct representations. Is that correct?