Any idea why the C# version of sqrt (System.Math.Sqrt) is ~10 times slower than the C++ version? Furthermore, the C# version seems to have one extra digit of precision. I ran my test under MSVC2012.
I used double, and called System.Math.Sqrt once before running the benchmark in order to force JIT compilation.
I can speak only from the C side (which should also apply to C++); I have no system that can run C#.
The first program I wrote was the trivial:
#include <math.h>
#include <stdio.h>
int main(void) {
    printf("%f\n", sqrt(2.0));
}
Using gcc -S -O3 sqrt.c, I got the generated assembly in sqrt.s and looked at that.
.file "sqrt.c"
.section .rodata.str1.1,"aMS",@progbits,1
.LC1:
.string "%f\n"
.text
.p2align 4,,15
.globl main
.type main, @function
main:
.LFB14:
.cfi_startproc
movsd .LC0(%rip), %xmm0
movl $.LC1, %edi
movl $1, %eax
jmp printf
.cfi_endproc
.LFE14:
.size main, .-main
.section .rodata.cst8,"aM",@progbits,8
.align 8
.LC0:
.long 1719614413
.long 1073127582
.ident "GCC: (SUSE Linux) 4.5.1 20101208 [gcc-4_5-branch revision 167585]"
.section .comment.SUSE.OPTs,"MS",@progbits,1
.string "Ospwg"
.section .note.GNU-stack,"",@progbits
One will note that there is no call to sqrt in the code; it looks like it's just loading a constant (which it is).
This became more apparent when I wrote a version that used a variable and compiled it to demonstrate what a call to sqrt would look like.
I’m not going for any sort of elegance with this code.
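#include <math.h>
#include <stdio.h>
#include <stdlib.h>  /* for atoi */
/* void main is non-standard; it is kept so the code matches the listing. */
void main(int argc, char **argv) {
    /* argv[0] is the program name, so atoi yields 0 here; all that
       matters is that the value is unknown at compile time. */
    double num = atoi(argv[0]);
    printf("%f\n", sqrt(num));
}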
While gcc -O3 -S sqrt.c worked, building this second program, sqrt2.c, with the same options through to an executable returned:
/tmp/cckmgfMS.o: In function `main':
sqrt2.c:(.text+0x46): undefined reference to `sqrt'
collect2: ld returned 1 exit status
It was calling sqrt, and I had forgotten to link the math library (-lm).
With the library linked, one can see the call to sqrt in the generated assembly:
.file "sqrt2.c"
.section .rodata.str1.1,"aMS",@progbits,1
.LC0:
.string "%f\n"
.text
.p2align 4,,15
.globl main
.type main, @function
main:
.LFB14:
.cfi_startproc
subq $8, %rsp
.cfi_def_cfa_offset 16
movq (%rsi), %rdi
xorl %eax, %eax
call atoi
cvtsi2sd %eax, %xmm1
sqrtsd %xmm1, %xmm0
ucomisd %xmm0, %xmm0
jp .L5
.L2:
movl $.LC0, %edi
movl $1, %eax
addq $8, %rsp
.cfi_remember_state
.cfi_def_cfa_offset 8
jmp printf
.L5:
.cfi_restore_state
movapd %xmm1, %xmm0
call sqrt
jmp .L2
.cfi_endproc
.LFE14:
.size main, .-main
.ident "GCC: (SUSE Linux) 4.5.1 20101208 [gcc-4_5-branch revision 167585]"
.section .comment.SUSE.OPTs,"MS",@progbits,1
.string "Ospwg"
.section .note.GNU-stack,"",@progbits
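One can see in this code the call to sqrt, and the absence of the constant that the optimizer put into the first version. Note that the square root itself is computed inline by the sqrtsd instruction; the library sqrt at .L5 is reached only when the result is NaN (the jp after ucomisd), presumably so that the library can handle the error case.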
From the comment above:
I have of course store the 10^6 sqrt(2.0) calls by doing a sum in a
variable ( i.e: var += sqrt(2.0) ) and print it on screen at the end
to be sure that compilator will not skip some codes. – Guillaume07 Dec
24 ’12 at 19:19
So, consider – if you are dealing with constants, this is something that the C and C++ optimizers will identify and optimize out.
Lacking access to C#, I looked at how Java deals with the line:
System.out.println(Math.sqrt(2.0));
This statement is compiled to the following Java bytecode:
0 getstatic java.lang.System.out : java.io.PrintStream [16]
3 ldc2_w <Double 2.0> [22]
6 invokestatic java.lang.Math.sqrt(double) : double [24]
9 invokevirtual java.io.PrintStream.println(double) : void [30]
One can see that the Java compiler doesn’t have access to information about the output of sqrt() that would let it fold the call into a constant. It is possible that the JIT optimizer has information about the purity of calls that go through Math to StrictMath and can replace multiple calls of Math.sqrt(2.0) with the same value (and not call it again); however, it still has to call it once at that point to get the value. That said, I don’t have any insight into what goes on at runtime in the JIT and how calls to pure functions that end up native might be optimized.
However, the C optimizer is still ahead of the game with a big loop (assuming that the JIT optimizer only needs to make one call to sqrt() to get that first value).
Looking at how C optimizes a loop, the optimizer even precalculates the entire loop.
#include <math.h>
#include <stdio.h>
int main(void) {
    double sum = 0;
    int i;
    for (i = 0; i < 10; i++) {
        sum += sqrt(2.0);
    }
    printf("%f\n", sum);
}
Compiled with gcc -O3 -S sqrt3.c (still no -lm needed), this becomes:
.file "sqrt3.c"
.section .rodata.str1.1,"aMS",@progbits,1
.LC1:
.string "%f\n"
.text
.p2align 4,,15
.globl main
.type main, @function
main:
.LFB14:
.cfi_startproc
movsd .LC0(%rip), %xmm0
movl $.LC1, %edi
movl $1, %eax
jmp printf
.cfi_endproc
.LFE14:
.size main, .-main
.section .rodata.cst8,"aM",@progbits,8
.align 8
.LC0:
.long 2034370
.long 1076644038
.ident "GCC: (SUSE Linux) 4.5.1 20101208 [gcc-4_5-branch revision 167585]"
.section .comment.SUSE.OPTs,"MS",@progbits,1
.string "Ospwg"
.section .note.GNU-stack,"",@progbits
And one can see that this code is identical to the first one, with different constants in the .LC0 section. The loop has been reduced to just “the ultimate value is this, don’t bother computing it at run time.”