What is the fastest way to add 16 small numbers?
I have two arrays, a and b. Each contains 16 bytes, and I would like to add each b[i] to its corresponding a[i]. The arrays do not overlap, and I know that each resulting sum always fits in a byte (important!).
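To make the problem concrete, here is a minimal sketch in C of the operation described above. The function names add16_scalar and add16_words are hypothetical, chosen only for illustration. The first is the obvious byte-by-byte loop. The second is one possible way to use the no-overflow guarantee: since no byte sum exceeds 255, no carry can cross from one byte into the next, so the 16 bytes can be added as two 64-bit words. Whether this is actually faster depends on the compiler and target and would need to be measured.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Baseline: add one byte at a time. The assert documents the stated
   precondition that every sum fits in a byte. */
static void add16_scalar(uint8_t *a, const uint8_t *b)
{
    for (int i = 0; i < 16; i++) {
        assert((unsigned)a[i] + b[i] <= 255u);
        a[i] = (uint8_t)(a[i] + b[i]);
    }
}

/* Word-wide variant (an assumption-based sketch, not a benchmarked answer):
   because no byte sum overflows, no carry crosses a byte boundary, so two
   64-bit additions give the same result as 16 byte additions. memcpy avoids
   alignment and aliasing problems, and the result does not depend on
   endianness because the bytes are copied in and out the same way. */
static void add16_words(uint8_t *a, const uint8_t *b)
{
    uint64_t wa[2], wb[2];
    memcpy(wa, a, 16);
    memcpy(wb, b, 16);
    wa[0] += wb[0];
    wa[1] += wb[1];
    memcpy(a, wa, 16);
}
```

Both functions modify a in place and leave b untouched. If the no-overflow guarantee did not hold, the word-wide variant would leak carries into neighbouring bytes and produce different results from the loop.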