I have a global temp view in Spark; here's the piece of the schema with the problematic column:
|-- lvl1Col: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- lvl2colElement: string (nullable = true)
|-- lvl2colElement: string (nullable = true)
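For reference, here's a minimal sketch of a view with the same shape (simplified, made-up data; my real view is a global temp view with more columns). The filter(..., e -> false) call is just a trick to get an empty array of the right struct type, and needs Spark 2.4+:
CREATE OR REPLACE TEMPORARY VIEW global_view AS
SELECT 1 AS x,
       array(named_struct('lvl2colElement', '0')) AS lvl1Col
UNION ALL
SELECT 2 AS x,
       -- empty array with element type struct<lvl2colElement: string>
       filter(array(named_struct('lvl2colElement', '0')), e -> false) AS lvl1Col;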
I have a SQL expression like this:
SELECT
  x,
  sum(case when lvl1Col.lvl2colElement = 0 then 1 else 0 end) AS conditionalColumn
FROM
  global_view
GROUP BY
  x
When an array comes in with nulls and I print it out, it looks like this:
lvl1Col
[]
However, these subcolumns are still in the schema. The issue is conditionalColumn, which gets evaluated as null when the array is empty. I tried:
1. Wrapping the aggregate in coalesce, like this:
coalesce(sum(case when lvl1Col.lvl2colElement = 0 then 1 else 0 end), 0) AS conditionalColumn
I get null.
2. Wrapping lvl1Col.lvl2colElement itself in a coalesce – still null (rough reconstructions of this and the next attempt are below the list).
3. Making the condition depend on the size of the array. I get an error:
due to data type mismatch: argument 1 requires (array or map) type, however, 'lvl1Col' is of struct<elements listed here>
Even though the printed schema said array…
4. Checking whether the array column is null:
case when lvl1Col is null then 1 else 0 end
This fails with an error that the column is not in the GROUP BY clause.
And so on.
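Roughly, attempts 2 and 3 looked like this (reconstructed, so the exact shape may differ slightly):
-- attempt 2: coalesce around the extracted element (still returns null)
sum(case when coalesce(lvl1Col.lvl2colElement, 0) = 0 then 1 else 0 end) AS conditionalColumn
-- attempt 3: guard on the array size (this is what raises the data type mismatch error)
sum(case when size(lvl1Col) > 0 and lvl1Col.lvl2colElement = 0 then 1 else 0 end) AS conditionalColumn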
Any ideas how to get around this and make some sort of coalesce work here?
I should add that I can only modify the column expression here, and must stay in SQL syntax, so I need something I can wrap in a column expression.
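For example, would Spark's higher-order functions be an option here? Something like this untested sketch (assumes Spark 2.4+, and that lvl2colElement is a string, hence the '0' literal):
-- exists should return false on an empty array, so the case would yield 0 rather than null
sum(case when exists(lvl1Col, e -> e.lvl2colElement = '0') then 1 else 0 end) AS conditionalColumn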
Thanks