The user guide suggests that enums have a physical, integer representation: https://docs.pola.rs/user-guide/concepts/data-types/categoricals/#enum-type
So does the answer to this StackOverflow question: /a/77770318/3486684
Is it possible to get the integers associated with an enum value?
For example, is there a nicer way to get the integer representation in this example than looking each value up in a list or series?
import numpy as np
import polars as pl
seed = 556
print(f"seed={seed}")
np.random.seed(seed)
# print out all rows of our small table
pl.Config.set_tbl_rows(-1)
enum_vals = [
    "".join([chr(c_code) for c_code in np.random.randint(97, 123, 3)])
    for n in range(10)
]
enum_dtype = pl.Enum(pl.Series(enum_vals))
pl.DataFrame(
    [
        pl.Series(
            "enum_vals",
            [enum_vals[x] for x in np.random.randint(0, len(enum_vals), 5)],
            dtype=enum_dtype,
        )
    ]
).with_columns(
    enum_repr=pl.col("enum_vals").map_elements(
        lambda x: enum_vals.index(x), return_dtype=pl.Int64()
    )
)
# seed=556
#
# shape: (5, 2)
# ┌───────────┬───────────┐
# │ enum_vals ┆ enum_repr │
# │ ---       ┆ ---       │
# │ enum      ┆ i64       │
# ╞═══════════╪═══════════╡
# │ loo       ┆ 8         │
# │ sby       ┆ 5         │
# │ cqm       ┆ 3         │
# │ cbn       ┆ 2         │
# │ vtk       ┆ 9         │
# └───────────┴───────────┘
The answer linked in the question (/a/77770318/3486684) suggests that we might be able to cast an enum value into a UInt32, and this is indeed the case:
import numpy as np
import polars as pl
seed = 556
print(f"seed={seed}")
np.random.seed(seed)
# print out all rows of our small table
pl.Config.set_tbl_rows(-1)
enum_vals = [
    "".join([chr(c_code) for c_code in np.random.randint(97, 123, 3)])
    for n in range(10)
]
enum_dtype = pl.Enum(pl.Series(enum_vals))
pl.DataFrame(
    [
        pl.Series(
            "enum_vals",
            [enum_vals[x] for x in np.random.randint(0, len(enum_vals), 5)],
            dtype=enum_dtype,
        )
    ]
).with_columns(enum_repr=pl.col("enum_vals").cast(pl.UInt32()))
# seed=556
#
# shape: (5, 2)
# ┌───────────┬───────────┐
# │ enum_vals ┆ enum_repr │
# │ ---       ┆ ---       │
# │ enum      ┆ u32       │
# ╞═══════════╪═══════════╡
# │ loo       ┆ 8         │
# │ sby       ┆ 5         │
# │ cqm       ┆ 3         │
# │ cbn       ┆ 2         │
# │ vtk       ┆ 9         │
# └───────────┴───────────┘
Surprisingly, casting directly to UInt64 (just as an example) does not work, so casting seems to depend on knowing a hidden implementation detail (which type is used for the physical representation of an enum value) and is therefore not a robust solution.