I have a question about hashmaps.
If you have this hashmap with m slots, and need to map n items to it, and n > m. There will be collisions for sure. But assuming there is simple uniform hashing assumption, this means that the maximum amount of collisions in any spot is the load factor right? The load factor is n/m. But the load factor is probably going to be a decimal, does this mean its a ceiling of the n/m? So if n=6 and m=5, then load factor is 1.2, and the ceiling is 2, so that means the maximum collisions in one spot is 2.
Is this right?
No, the maximum number of collisions is the number of hash elements to be put into the hash.
From the wikipedia link on Simple Uniform Hashing Assumption:
Moreover, each item to be hashed has an equal probability of being placed into a slot, regardless of the other elements already placed.
The key word is “probability” – it is possible for all of them to hash to the same spot, and thus the maximum collisions is the size. It is also possible for them to hash evenly, in which case it would be the ceiling of the load factor.
Particularlly relevant to this is:
Collision chain length Under the assumption of uniform hashing, the
load factor A and the average chain length of a hash table of size m
with n elements will be A = n/m
Where n
is the number of elements and m
is the size of the hash. Key phrase again is the average chain length – not the maximum chain length.
Continuing to read the article, one will note phrases such as “on average” used regularly – this is about the probability of an uniform hash – not the guarantee of such.
An example of such a hash algorithm would be mod hashSize
on integers. If the hash size was 5 hash(x) = x % 5
would be a hash that fits this requirement – that each value hashed has an equal probability of being placed into a given slot.
However, if one was to hash into this size 5 hash the numbers: 5, 10, 15, 20, 25, 30
one would then have everything hash to the same location.
4