I’ve been told that when I have a hash table of size m
and m=2^k
, I can use the &
operator as num & (size-1)
instead of num % size
, to fit the hashCode to my table size.
I’ve also been told that the the command Num&sizeMinusOne is more than twice faster than num & (size-1)
.
My question is, why?
And isn’t the operation of creating a variable called SizeMinusOne takes time too?
2
Regarding % 2**k
vs. & 2**k-1
: This is a micro optimization, but a feasible one. The compiler can’t prove that size
is always a power of two (as it’s variable and possibly modified by code in different translation units), so it can’t perform the optimization itself. Integer division has significantly worse throughput and latency than bitwise operations on virtually every architecture. To a degree, this can even be justified with complexity theory: Bitwise operations require linear time/circuit size, while the best known divison/modulo operations require super-linear time/circuit size. This effect is actually measurable on contemporary machines too (I know this because I didn’t believe it and tried it).
Regarding size-1
versus sizeMinusOne
: The idea is to store sizeMinusOne
(I’ve seen the name mask
for this) instead of size
, to reduce redundancy. In a very local, short-sighted machine model, num & sizeMinusOne
is (pseudo-RISC)
and r3, r1, r2
while num & (size - 1)
is
sub r3, r2, #1
and r3, r1, r3
with otherwise identical setup and surrounding code. Since bitwise and arithmetic operations are typically equally fast (single ALU cycle, optimal latency) one could indeed argue that the first takes half the resources of the second.
But that ignores the fact that the surrounding code will take much longer, a hundred cycles overall being a very optimistic estimate. In that context, a single cycle is not just peanuts, it’s a measurement error, simply noise. Don’t lose sleep over it. What’s more, depending on the surrounding code it may not even be a single cycle: An out-of-order CPU (such as most contemporary x86 cores) might very well squeeze the subtraction somewhere into the scheduling while an ALU is idle and be done with it before it even finished calculating num
.
There are, however, other reasons to store mask
instead of size
. Usually the former is used more often than the latter, so by preferring mask
you simplify the code.
2