I have an AWS Glue (PySpark) job that exports some data to JSON and has to match a known contract. The contract calls for certain fields to have a different type per record. For example, a field may be a number on one record and a string on another.
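To make the contract concrete, here is a small illustration in plain Python (no Spark involved) of the kind of JSON Lines output it expects. The field names `id` and `value` are hypothetical placeholders, not the real contract's fields.

```python
import json

# Hypothetical records matching the contract: "value" is a number in one
# record and a string in another, which JSON itself allows.
records = [
    {"id": 1, "value": 42},
    {"id": 2, "value": "forty-two"},
]

# One JSON document per line (JSON Lines), like Spark's JSON writer produces.
lines = [json.dumps(r) for r in records]
for line in lines:
    print(line)
```

The problem is that a Spark DataFrame column holding `value` can only have one type, so Spark can't produce both lines above from a single schema.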
As far as I can tell, Spark doesn’t have this concept yet: every column in a DataFrame schema must have a single type.
Spark will be getting a Variant type in 4.0, which seems perfect for this, but alas it isn't available yet (and there will likely be a further delay before it reaches Glue, my environment).
Are there any workarounds that could get me there on Spark 3.x? I was thinking of some way to hook into JSON serialization, but I haven’t found any workable options there.
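One shape a serialization hook might take is sketched below. It is a minimal sketch under stated assumptions, not a known-good solution: it assumes the variable-typed field is carried through the DataFrame as a `StringType` column holding each value's JSON encoding (e.g. `'42'` or `'"forty-two"'`), and that the final JSON lines are built in Python rather than by Spark's JSON writer. The helper `row_to_json_line` and the field name `value` are hypothetical.

```python
import json


def row_to_json_line(row, raw_fields):
    """Serialize a dict to one JSON line, decoding pre-encoded fields first.

    Assumption: every field named in raw_fields holds a string that is
    already valid JSON (or None), so its original type survives the
    round trip instead of being emitted as a quoted string.
    """
    out = {
        key: json.loads(value) if key in raw_fields and value is not None else value
        for key, value in row.items()
    }
    return json.dumps(out)


# In PySpark this could hypothetically be applied per row, e.g.:
#   df.rdd.map(lambda r: row_to_json_line(r.asDict(), {"value"})) \
#         .saveAsTextFile("<output-path>")
# Here it is exercised on plain dicts standing in for Row.asDict() output.
rows = [
    {"id": 1, "value": "42"},
    {"id": 2, "value": '"forty-two"'},
]
lines = [row_to_json_line(r, {"value"}) for r in rows]
for line in lines:
    print(line)
```

The trade-off is that Spark's built-in JSON writer is bypassed entirely, so options such as compression or null handling would have to be replicated by hand.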