I made a very weird discovery today.
I was broadcasting a really small DataFrame of 10 rows, equivalent to less than 1 KB on disk.
To my surprise, the data size was estimated at 64 MB in the BroadcastExchange, as you can see in the DAG below.
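Here's roughly what I'm running (a simplified repro, not my actual job; the table and column names are made up):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

val spark = SparkSession.builder()
  .appName("broadcast-size-repro")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Tiny 10-row DataFrame, well under 1 KB when written to disk
val small = (1 to 10).map(i => (i, s"row_$i")).toDF("id", "label")
val big   = spark.range(1000000L).toDF("id")

// Force a broadcast join so the plan contains a BroadcastExchange node
val joined = big.join(broadcast(small), "id")
joined.explain(true)

// The size estimate Catalyst computes for the broadcast side
// (this is what shows up as "data size" in the SQL tab of the UI)
println(small.queryExecution.optimizedPlan.stats.sizeInBytes)
```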
Is this some sort of minimum broadcast size coming from Spark metadata / JVM string interning overhead? What is this sorcery?
Can anyone explain this, please?
Thanks,