With Python Polars, how can I compare two frames (like with ==) and get a per-cell comparison result, but with True/False returned for comparisons involving null? By default, df1 == df2 gives null in any cell where either df1 or df2 contains a null.
For example:
import polars as pl

df1 = pl.DataFrame(
    {
        "a": [1, 2, 3, None, 5],
        "b": [5, 4, 3, 2, None],
    }
)
df2 = pl.DataFrame(
    {
        "a": [1, 2, 3, 1, 5],
        "b": [5, 4, 30, 2, None],
    }
)
print(f"df1: {df1}")
print(f"df2: {df2}")
print(f"df1 == df2: {df1 == df2}")
This prints:
df1: shape: (5, 2)
┌──────┬──────┐
│ a ┆ b │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞══════╪══════╡
│ 1 ┆ 5 │
│ 2 ┆ 4 │
│ 3 ┆ 3 │
│ null ┆ 2 │
│ 5 ┆ null │
└──────┴──────┘
df2: shape: (5, 2)
┌─────┬──────┐
│ a ┆ b │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪══════╡
│ 1 ┆ 5 │
│ 2 ┆ 4 │
│ 3 ┆ 30 │
│ 1 ┆ 2 │
│ 5 ┆ null │
└─────┴──────┘
df1 == df2: shape: (5, 2)
┌──────┬───────┐
│ a ┆ b │
│ --- ┆ --- │
│ bool ┆ bool │
╞══════╪═══════╡
│ true ┆ true │
│ true ┆ true │
│ true ┆ false │
│ null ┆ true │
│ true ┆ null │
└──────┴───────┘
However, I’m trying to determine how to get the following result:
df1 compared to df2: shape: (5, 2)
┌──────┬───────┐
│ a ┆ b │
│ --- ┆ --- │
│ bool ┆ bool │
╞══════╪═══════╡
│ true ┆ true │
│ true ┆ true │
│ true ┆ false │
│ false ┆ true  │ <- false because the cell is null in one frame and a value in the other
│ true ┆ true  │ <- bottom-right cell is true because df1 and df2 have the same value (null)
└──────┴───────┘