Serialization error in Spark dropDuplicates
I’m encountering a serialization issue when using the dropDuplicates
function in Spark. Here’s the code I’m using:
Why doesn't my Spark Structured Streaming application stop when an assertion fails?
When the assertion fails, the app doesn't stop, and the stream seems to keep running. This is only an example; I actually have multiple Spark streaming queries running in parallel, but I don't think that matters, because I called spark.streams.awaitAnyTermination(3).
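One thing worth noting: awaitAnyTermination(3) waits at most 3 milliseconds for any query to terminate and then returns a boolean; it never stops the queries itself. That semantics can be modeled in plain Python with a timed wait (no Spark here; the function and thread names are purely illustrative):

```python
import threading
import time

# Hypothetical stand-in for StreamingQueryManager.awaitAnyTermination(timeoutMs):
# wait up to `timeout` seconds for any query to finish, then report whether one
# actually terminated. As in Spark, returning False does NOT stop anything.
def await_any_termination(done_event, timeout):
    return done_event.wait(timeout)

done = threading.Event()

def long_running_stream():
    # Simulates a streaming query that keeps running well past the timeout.
    time.sleep(0.5)
    done.set()

t = threading.Thread(target=long_running_stream, daemon=True)
t.start()

# A 3 ms timeout (like awaitAnyTermination(3)) almost always expires first:
terminated = await_any_termination(done, timeout=0.003)
print(terminated)    # False: the timeout elapsed before any "query" terminated
print(t.is_alive())  # True: the timed wait did not stop the running "stream"
```

So after a short timed wait returns, the driver must itself decide to stop the queries (e.g. by checking each query's exception and calling stop on it); the timed call alone leaves everything running.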
Convert nested Avro structures to a flat schema in Apache Spark
I have a use case where I have to read data from Kafka and write it to a sink. The data in Kafka is in Avro, and the fields are wrapped in an Avro map. The map will not always have the same keys; they vary with the type of data. I have to flatten this map, i.e. convert each key to a column whose value is the map's value for that key. I have a standard schema to flatten to before writing to the sink, where I write the data in Delta / Parquet format. How can I approach this?
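The flattening step itself can be sketched in plain Python (no Spark; the target schema, field names, and sample records below are hypothetical). In a real job this logic would run inside a Spark transformation after deserializing the Avro payload, projecting each record's map entries onto the fixed set of columns and leaving missing keys as nulls:

```python
# Hypothetical "standard schema" to flatten each record's map onto:
TARGET_SCHEMA = ["device_id", "temperature", "humidity"]

def flatten_record(record):
    """Turn {'payload': {key: value, ...}} into one flat dict whose columns
    follow TARGET_SCHEMA; keys absent from this record's map become None."""
    payload = record.get("payload", {})
    return {col: payload.get(col) for col in TARGET_SCHEMA}

# Two records whose maps carry different key sets, as described above:
rows = [
    {"payload": {"device_id": "a1", "temperature": 21.5}},
    {"payload": {"device_id": "b2", "humidity": 0.4}},
]
flat = [flatten_record(r) for r in rows]
print(flat[0])  # {'device_id': 'a1', 'temperature': 21.5, 'humidity': None}
print(flat[1])  # {'device_id': 'b2', 'temperature': None, 'humidity': 0.4}
```

Because every output row has the same columns regardless of which keys the map contained, the result fits a fixed table schema and can be written as Delta or Parquet.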