I am reading the book “Learning Spark”. It says to use a broadcast hash join when “each key within the smaller and larger data sets is hashed to the same partition by Spark”.
How exactly can I check, or ensure, that the keys from the smaller and larger data sets are hashed to the same partition?
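To make the question concrete, here is a minimal sketch of how I picture "hashed to the same partition". It is plain Python, not Spark code. The helpers `partition_for`, `partition_dataset` and `keys_colocated` are names I made up, and CRC32 is only a stand-in for whatever hash function Spark actually uses. My assumption is that a key ends up in partition `hash(key) % num_partitions`, so equal keys from both data sets are co-located only if both sides use the same hash function and the same number of partitions.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in hash (CRC32); this is NOT Spark's own hash function.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_dataset(rows, num_partitions):
    # rows: iterable of (key, value) pairs; returns a list of partitions.
    partitions = [[] for _ in range(num_partitions)]
    for key, value in rows:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions


def keys_colocated(parts_a, parts_b):
    # True if every key present in both data sets has the same partition index in each.
    index_a = {k: i for i, part in enumerate(parts_a) for k, _ in part}
    index_b = {k: i for i, part in enumerate(parts_b) for k, _ in part}
    return all(index_a[k] == index_b[k] for k in index_a.keys() & index_b.keys())


# Hypothetical toy data standing in for the smaller and larger data sets.
small = [("a", 1), ("b", 2)]
large = [("a", 10), ("b", 20), ("a", 30), ("c", 40)]

NUM_PARTITIONS = 4
small_parts = partition_dataset(small, NUM_PARTITIONS)
large_parts = partition_dataset(large, NUM_PARTITIONS)

print(keys_colocated(small_parts, large_parts))
```

What I would like to know is whether there is an equivalent check (or guarantee) for real Spark DataFrames.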