I’ve created a struct that combines the data of several columns. Many of these structs now occur per unique identifier value, and I want to combine them into a single array using collect_list.
Unfortunately, I’m getting errors like this one:
java.lang.Exception: Results too large
at com.databricks.backend.daemon.driver.OutputAggregator$.maybeApplyOutputAggregation(OutputAggregator.scala:458)
So, how many elements (structs, in my case) can an array in a Spark column contain?
Here’s the Scala code I’m using:
val compact_df = deliveries_df
.withColumn("file_detail", struct($"file_id", $"delivery_seqno", $"file_path_name"))
.groupBy('delivery_id, 'delivery_file_type)
.agg(collect_list('file_detail).alias("file_details"))
display(compact_df)
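To see how large the arrays would actually get, I imagine one could first count the rows per groupBy key before calling collect_list. This is just a diagnostic sketch using the same deliveries_df and column names as above (the variable name group_sizes is mine):

import org.apache.spark.sql.functions.{col, count}

// Count how many rows each (delivery_id, delivery_file_type) group would
// feed into collect_list, largest groups first, so the biggest would-be
// arrays are visible before aggregating.
val group_sizes = deliveries_df
  .groupBy(col("delivery_id"), col("delivery_file_type"))
  .agg(count("*").alias("n_files"))
  .orderBy(col("n_files").desc)

group_sizes.show(20)

But even knowing the sizes, I’d still like to know what the actual limit is.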