How to ensure that a pair of keys from the smaller and larger datasets are hashed to the same partition in Spark?
I am reading the “Learning Spark” book. It says to use broadcast hash joins when “each key within the smaller and larger data sets is hashed to the same partition by Spark”. How can I check or guarantee that this condition holds for my two datasets?
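To make concrete what I understand by "hashed to the same partition", here is a minimal plain-Python sketch, not Spark code. The hash function (`zlib.crc32`), the partition count, and the names `partition_for` and `partition_rows` are all assumptions for illustration; Spark uses its own hash function internally. The idea is that if both datasets are partitioned with the same function and the same number of partitions, rows with equal keys end up at the same partition index on both sides.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in for a hash partitioner. crc32 is used only because it is
    # deterministic across runs; it is NOT the hash Spark uses.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_rows(rows, num_partitions):
    # Group (key, value) rows into buckets by the partition index of the key.
    parts = {i: [] for i in range(num_partitions)}
    for key, value in rows:
        parts[partition_for(key, num_partitions)].append((key, value))
    return parts


num_partitions = 4  # hypothetical partition count, same for both sides

small = [("a", 1), ("b", 2)]
large = [("a", "x"), ("b", "y"), ("c", "z"), ("a", "w")]

small_parts = partition_rows(small, num_partitions)
large_parts = partition_rows(large, num_partitions)

# For every key in the smaller dataset, the matching rows of the larger
# dataset sit at the same partition index, so each partition could be
# joined without looking at any other partition.
for key, _ in small:
    idx = partition_for(key, num_partitions)
    matches = [row for row in large_parts[idx] if row[0] == key]
    print(key, "-> partition", idx, "matches in large:", matches)
```

Is this roughly the property the book is describing, and if so, is it something I have to arrange myself or something Spark does for me?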