I was helping a student with his assignment and ran into a very gnarly “bug” which I could not explain and was wondering if I could get help in understanding what exactly is going on.
The question was to implement the following: given a byte array (buf
) of length N, calculate the 16-bit checksum given by the formula:
checksum = ~(buf[0]) + ~(buf[1]) + ... + ~(buf[N-2]) + ~(buf[N-1])
The implementation the student made was pretty straightforward:
uint16_t calculate_checksum(uint8_t *msg_buf, size_t msg_length)
{
uint16_t checksum = 0;
for (size_t i = 0; i < msg_length; i++)
{
checksum += ~(msg_buf[i]);
}
return checksum;
}
However to my suprise, this implementation did not give the expected result.
I’ve tried looking at the checksum
variable by printing its value in the loop, which yielded an unexpected pattern:
65471
65459
65449
65287
65276
65253
65166
65113
65094
...
The checksum starts at 0
, but after the first addition the value sits somewhere in the upper range of the uint16_t
and going down. I suspected the value underflowing due to msg_buf[i]
being implicitly converted into a uint16_t
before complementing it. But I don’t understand why, I would expect that first msg_buf
gets indexed with i
, then the byte complement is calculated (which would yield a value within 0-255), only then is it (implicitly) promoted into a uint16_t
.
I’ve tried to look at the assembly output of the line checksum += ~(msg_buf[i])
, which seems the support this theory (using ARM gcc 14.1.0). Note that [r7, #4]
is the msg_buf
pointer, [r7, #8]
is i
and [r7, #14]
is checksum
. The assembly does something weird with subtraction, which yields the same result if you would convert the msg_buf[i]
byte to a uint16_t
before complementing.
ldr r2, [r7, #4]
ldr r3, [r7, #8]
add r3, r3, r2
ldrb r3, [r3] @ zero_extendqisi2
mov r2, r3
ldrh r3, [r7, #14] @ movhi
subs r3, r3, r2
uxth r3, r3
subs r3, r3, #1
strh r3, [r7, #14] @ movhi
Knowing this, the fix we came up with is pretty straightforward. We opted to just AND the results of the righthand side with 0xFF
to get rid of the higher bits, which yielded the correct checksum.
So the issue is essentially fixed, but I still don’t understand why this is an issue. Maybe this behaviour is expected and I don’t know the proper operation orders, or maybe something else is happening, I really don’t know. Can someone please explain why this is happening?
2