For a project to learn C, I decided, for fun, to put the code into Compiler Explorer and compare the assembly output with the C source. Here’s a minimal example:
```c
unsigned char count[256][256];

int main()
{
    for (int i = 0; i < 256; ++i)
        for (int j = 0; j < 256; ++j)
            count[i][j] = 0;
}
```
`-Os` turned this into `rep stosd`, which makes sense:
```asm
    mov     edx, OFFSET FLAT:count
    xor     eax, eax
    mov     ecx, 16384
    mov     rdi, rdx
    rep stosd
```
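The operand values line up with the array size: ECX holds 16384, `rep stosd` stores the 4-byte value in EAX (here 0) that many times starting at RDI, and 16384 × 4 = 65536 = 256 × 256 bytes. Below is a rough C model of that effect, assuming the direction flag is clear (its normal state under the x86-64 System V ABI). `rep_stosd_model` and `count_is_zero` are hypothetical helper names, and the model only describes what ends up in memory, not how the CPU actually executes the instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

unsigned char count[256][256];

/* 16384 dword stores of 4 bytes each cover the whole 65536-byte array. */
static_assert(16384 * sizeof(uint32_t) == sizeof count,
              "ECX * 4 should equal sizeof count");

/* Rough model of "rep stosd" with the direction flag clear: write the
   32-bit value eax into ecx consecutive dwords starting at rdi.
   memcpy performs each 4-byte store, so the unsigned char array is
   never accessed through a uint32_t lvalue. */
void rep_stosd_model(void *rdi, uint32_t eax, size_t ecx)
{
    unsigned char *dst = rdi;

    assert(dst != NULL);
    while (ecx-- > 0) {
        memcpy(dst, &eax, sizeof eax);
        dst += sizeof eax;
    }
}

/* Returns 1 if every byte of count is zero, 0 otherwise. */
int count_is_zero(void)
{
    const unsigned char *p = (const unsigned char *)count;

    for (size_t k = 0; k < sizeof count; ++k)
        if (p[k] != 0)
            return 0;
    return 1;
}
```

For example, `rep_stosd_model(count, 0, 16384)` leaves every byte of `count` zero, the same result as the nested loop.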
But `-O2` turned it into a call to `memset`:
```asm
    mov     edx, 65536
    xor     esi, esi
    mov     edi, OFFSET FLAT:count
    call    memset
```
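Assuming the usual System V AMD64 calling convention for x86-64 Linux (first three integer/pointer arguments in RDI, RSI, RDX), the three register loads are just the arguments of `memset(count, 0, 65536)`. The sketch below spells that call out in C; `clear_count` is a hypothetical name, and it uses `sizeof count` in place of the literal 65536.

```c
#include <assert.h>
#include <string.h>

unsigned char count[256][256];

/* 256 * 256 one-byte elements: the 65536 the compiler put in EDX. */
static_assert(sizeof count == 65536, "count should be 65536 bytes");

/* C spelling of the -O2 listing, argument by argument:
     mov edi, OFFSET FLAT:count  -> 1st arg (RDI): destination = count
     xor esi, esi                -> 2nd arg (RSI): fill value  = 0
     mov edx, 65536              -> 3rd arg (RDX): byte count  = 65536
   Writing a 32-bit register also zeroes the upper half of the 64-bit
   register, so the 32-bit forms suffice as long as the address of count
   fits in 32 bits (as in a non-PIE build with the small code model). */
void clear_count(void)
{
    memset(count, 0, sizeof count);
}
```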
Isn’t calling `memset` slower than just inlining the store instructions? Or does it depend on `memset`, which is optimized differently for different architectures?