I am looking for an example where an algorithm apparently changes its complexity class due to compiler and/or processor optimization strategies.
Let's take a simple program that prints the square of a number entered on the command line.
#include <stdio.h>
#include <stdlib.h> /* for atoi */

int main(int argc, char **argv) {
    int num = atoi(argv[1]);
    printf("%d\n", num);
    int i = 0;
    int total = 0;
    /* Compute num * num by repeated addition: O(n) additions. */
    for (i = 0; i < num; i++) {
        total += num;
    }
    printf("%d\n", total);
    return 0;
}
As you can see, this is an O(n) calculation: it computes the square by adding num to itself num times. Compiling this with gcc -S, one gets a segment that reads:
LBB1_1:
movl -36(%rbp), %eax    # load total
movl -28(%rbp), %ecx    # load num
addl %ecx, %eax         # total += num
movl %eax, -36(%rbp)    # store total
movl -32(%rbp), %eax    # load i
addl $1, %eax           # i++
movl %eax, -32(%rbp)    # store i
LBB1_2:
movl -32(%rbp), %eax    # load i
movl -28(%rbp), %ecx    # load num
cmpl %ecx, %eax         # i < num ?
jl LBB1_1               # if so, loop again
Here you can see the addition being done, followed by a compare and a jump back to the top of the loop.
Compiling instead with gcc -S -O3 to enable optimizations, the segment between the two calls to printf becomes:
callq _printf
testl %ebx, %ebx        # num > 0 ?
jg LBB1_2               # if so, square it
xorl %ebx, %ebx         # otherwise the loop never runs: total = 0
jmp LBB1_3
LBB1_2:
imull %ebx, %ebx        # total = num * num
LBB1_3:
movl %ebx, %esi         # pass total to printf
leaq L_.str(%rip), %rdi # format string
xorb %al, %al           # no vector args in this varargs call
callq _printf
One can now see that it has no loop and, furthermore, no adds. Instead there is a single imull instruction, which multiplies the number by itself. The compiler has recognized the loop and the arithmetic inside it and replaced them with the equivalent direct calculation.
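In C terms, the optimized code between the two printf calls is roughly equivalent to:
/* What -O3 reduced the O(n) loop to: a single O(1) expression.
   The num > 0 test mirrors the jg/xorl pair in the assembly above. */
int total = (num > 0) ? num * num : 0;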
Note that this version still includes the call to atoi to get the number. When the number already exists as a constant in the code, the compiler will pre-calculate the value at compile time rather than making any calls at all, as demonstrated in a comparison between the performance of sqrt in C# and C where sqrt(2) (a constant) was being summed across a loop 1,000,000 times.
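As a sketch of that pattern (not the original benchmark code), consider summing a constant square root; an optimizing compiler can typically fold the sqrt(2.0) call at compile time, so no library call remains at run time:
#include <math.h>
#include <stdio.h>

int main(void) {
    double total = 0.0;
    /* sqrt(2.0) is a compile-time constant: gcc and clang typically fold
       it at -O2, leaving only an addition in the loop body; with
       -ffast-math the whole loop can often be collapsed as well. */
    for (int i = 0; i < 1000000; i++) {
        total += sqrt(2.0);
    }
    printf("%f\n", total);
    return 0;
}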
Tail Call Optimization may reduce the space complexity. For example, without TCO, this recursive implementation of a while loop has a worst-case space complexity of O(#iterations), whereas with TCO it has a worst-case space complexity of O(1):
// This is Scala, but it works the same way in every other language.
def loop(cond: => Boolean)(body: => Unit): Unit = if (cond) { body; loop(cond)(body) }
var i = 0
loop { i < 3 } { i += 1; println(i) }
// 1
// 2
// 3
// E.g. ECMAScript:
function loop(cond, body) {
    if (cond()) { body(); loop(cond, body); }
}
var i = 0;
loop(function () { return i < 3; }, function () { i++; console.log(i); });
This doesn’t even need general TCO; it only needs a very narrow special case, namely elimination of direct tail recursion.
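The same narrow optimization exists for C: gcc and clang turn direct tail recursion into a jump at -O2 (GCC calls this -foptimize-sibling-calls). Here is a sketch of the loop above in C, which then runs in O(1) stack space once optimized:
#include <stdio.h>

/* Direct tail recursion: the recursive call is the very last action.
   With -O2, gcc/clang typically rewrite it as a jump back to the top,
   so stack usage is O(1) instead of O(#iterations). */
static void rec_loop(int i, int n) {
    if (i >= n) return;
    printf("%d\n", i + 1);
    rec_loop(i + 1, n);  /* eligible for tail-call elimination */
}

int main(void) {
    rec_loop(0, 3);  /* prints 1 2 3, like the Scala and ECMAScript versions */
    return 0;
}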
What would be very interesting, though, is a case where a compiler optimization not just changes the complexity class but actually changes the algorithm completely.
The Glorious Glasgow Haskell Compiler sometimes does this, but that’s not really what I am talking about; that’s more like cheating. GHC has a simple pattern-matching rewrite language (the RULES pragma) that allows the developer of a library to detect certain simple code patterns and replace them with different code. And the GHC implementation of the Haskell standard library does contain some of those annotations, so that specific usages of specific functions which are known to be inefficient are rewritten into more efficient versions; the textbook example rewrites map f (map g xs) into map (f . g) xs.
However, these translations are written by humans, and they are written for specific cases; that’s why I consider that cheating.
A Supercompiler may be able to change the algorithm without human input, but AFAIK no production-level supercompiler has ever been built.
A compiler that is aware the language uses big-nums and that performs strength reduction (replacing a multiplication by the loop index with an addition) would change the complexity of that multiplication from O(n log n) at best down to O(n).
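To make the transformation concrete, here is a hedged sketch on plain C ints (function and variable names are illustrative); for big-nums the same rewrite turns an O(n log n) multiplication per iteration into an O(n) addition:
/* Before strength reduction: one multiplication per iteration. */
void fill_before(int *a, int n, int k) {
    for (int i = 0; i < n; i++)
        a[i] = i * k;
}

/* After strength reduction: the compiler keeps a running value of i * k
   and replaces the multiplication with an addition. */
void fill_after(int *a, int n, int k) {
    int t = 0;
    for (int i = 0; i < n; i++) {
        a[i] = t;
        t += k;
    }
}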