I have a grayscale matrix (implemented as a simple array) to be shown on a display:
line 1: | a | b | c |
line 2: | d | e | f |
a to f are grayscale values. This pattern has to be shifted to the left or to the right by a variable percentage.
What I currently do is a simple approach. For example, for a shift of 20% to the left:
a_new = a * (1 - 0.2) + b * 0.2
b_new = b * (1 - 0.2) + c * 0.2
c_new = c * (1 - 0.2) + 0
and so on. That is, I have a loop over all elements.
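The loop described above can be sketched roughly like this. It is a minimal pure-Python sketch with hypothetical function names. It assumes the matrix is stored as a list of rows and that a shift to the right is the mirror image of the left shift (each pixel pulls in its left-hand neighbour instead). Missing neighbours at the edge count as 0, as in `c_new` above.

```python
def shift_row(row, fraction, direction="left"):
    """Blend each pixel with one neighbour to shift the row by a sub-pixel fraction.

    direction="left" pulls in the right-hand neighbour (as in the example above);
    direction="right" is assumed to mirror it and pulls in the left-hand neighbour.
    Neighbours outside the row are treated as 0.
    """
    keep = 1.0 - fraction  # computed once, reused for every element
    n = len(row)
    out = [0.0] * n
    for i in range(n):
        if direction == "left":
            neighbour = row[i + 1] if i + 1 < n else 0
        else:
            neighbour = row[i - 1] if i > 0 else 0
        out[i] = row[i] * keep + neighbour * fraction
    return out


def shift_matrix(matrix, fraction, direction="left"):
    """Apply shift_row to every row of a list-of-rows matrix."""
    return [shift_row(row, fraction, direction) for row in matrix]
```

For example, `shift_matrix([[10, 20, 30], [40, 50, 60]], 0.2)` yields approximately `[[12, 22, 24], [42, 52, 48]]`.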
With only six elements I don’t care, but in fact the matrix is much bigger.
Of course, the factor (1 - 0.2) can be reused, but I wonder whether there is a smarter way to calculate the resulting (shifted) matrix.
If your matrix can be accessed as a 2D array (x, y), then your problem generalises to
new(x, y) = old(x, y) * (1 - 0.2) + old(x + 1, y) * 0.2
for all but the last column. The last column is a special case. Either use
new(x, y) = old(x, y) * (1 - 0.2)
for that column only, or else pad your source array with an extra column of zeros first and then you don’t have to treat the last column as a special case any more.
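The zero-padding variant can be sketched as follows. This is a minimal pure-Python sketch with a hypothetical function name. It assumes a rectangular list of rows indexed as `old[y][x]`. Padding each row with one zero column means the inner loop needs no edge test.

```python
def shift_left_padded(old, fraction):
    """new(x, y) = old(x, y) * (1 - fraction) + old(x + 1, y) * fraction,
    using a copy of the source padded with one column of zeros so the
    last column needs no special case."""
    keep = 1.0 - fraction
    height = len(old)
    width = len(old[0]) if height else 0
    # Copy each row and append one extra zero column on the right.
    padded = [list(row) + [0] for row in old]
    new = [[0.0] * width for _ in range(height)]
    for y in range(height):       # two nested loops, no edge test inside
        for x in range(width):
            new[y][x] = padded[y][x] * keep + padded[y][x + 1] * fraction
    return new
```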
This can easily be implemented using two nested for loops. Some specialised languages (such as Matlab) would be able to do the whole lot in one hit, with no explicit for loops at all.
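Plain Python has no Matlab-style whole-array arithmetic, but the same idea of treating a row as a unit can be approximated by pairing each element with its right-hand neighbour via `zip`, instead of indexing element by element. A minimal sketch with a hypothetical function name; in Matlab or NumPy the inner expression would become a single sliced array operation.

```python
def shift_left_rowwise(old, fraction):
    """Whole-row formulation: each output row is built from the row and a
    copy of it offset by one element (with a trailing 0), without
    explicit index arithmetic."""
    keep = 1.0 - fraction
    return [
        [cur * keep + nxt * fraction for cur, nxt in zip(row, list(row[1:]) + [0])]
        for row in old
    ]
```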
Do you know where most of the time is being spent? There could be all sorts of ways to speed this up, but there isn’t enough code here to know for sure, unless you’re specifically asking only about the math.
Computers are extremely good at math, so the bottleneck is likely somewhere other than the actual computation of the data. For example, in one toolkit I know of, changing each pixel in an image one at a time is orders of magnitude slower than computing a whole row or the whole image in memory before saving the data to the image object.
The basic approach would be, for each row:
- Load the entire row into “SIMD register 1” (MMX, SSE, AVX, Neon, whatever)
- Copy it into “SIMD register 2”
- Multiply “SIMD register 2” by 0.2 (pity it’s not 25%…)
- Subtract “SIMD register 2” from “SIMD register 1”
- Shift “SIMD register 2” right by 1 element
- Add “SIMD register 2” to “SIMD register 1”
- Store “SIMD register 1” in the destination
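The register-level steps above can be emulated in plain Python to check that they produce the same result as the per-element formula. This is only a model, not real SIMD code: each "register" is a Python list holding the whole row, and the function name is hypothetical. It assumes lane 0 holds the leftmost pixel, so "shift right by 1 element" moves every lane down by one and fills the top lane with 0 (the little-endian lane order used by, for example, x86 byte shifts).

```python
def simd_style_shift_left(row, fraction):
    """Emulates the SIMD steps for a row that fits in one register.
    Lane 0 is assumed to hold the leftmost pixel."""
    reg1 = list(row)                            # load the row into register 1
    reg2 = list(reg1)                           # copy it into register 2
    reg2 = [v * fraction for v in reg2]         # multiply register 2 by the fraction
    reg1 = [a - b for a, b in zip(reg1, reg2)]  # reg1 now holds row * (1 - fraction)
    reg2 = reg2[1:] + [0.0]                     # shift register 2 by one element, 0 in the top lane
    reg1 = [a + b for a, b in zip(reg1, reg2)]  # add the neighbour term
    return reg1                                 # "store" register 1
```

Computing `row * (1 - fraction)` as `row - row * fraction` saves one multiplication per lane, at the cost of tiny floating-point differences compared with multiplying by a precomputed `1 - fraction`.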
The basic approach is “basic”. In practice you’re probably going to have to handle the case where the entire row doesn’t fit in a SIMD register. This mostly means that the “shift right” has to save the lowest element (the one that would otherwise be shifted out) and shift it in as the new highest element during the next shift, which works if you process the row’s chunks from its end towards its start.
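The chunked version with a carried element can be sketched the same way. Again this is a plain-Python model rather than real SIMD code, with a hypothetical function name and an arbitrary illustrative register width of `lanes=4`. It assumes the row is zero-padded to a whole number of chunks and that chunks are visited from the end of the row towards the start, so the scaled element shifted out of one chunk supplies the top lane of the next.

```python
def chunked_shift_left(row, fraction, lanes=4):
    """Processes the row in register-sized chunks of `lanes` elements,
    from the last chunk to the first, carrying the shifted-out lowest
    element of each chunk into the highest lane of the next one."""
    n = len(row)
    padded = list(row) + [0.0] * (-n % lanes)  # pad to a whole number of chunks
    out = [0.0] * len(padded)
    carry = 0.0  # scaled lowest element of the previously processed (higher) chunk
    for start in range(len(padded) - lanes, -1, -lanes):
        reg1 = padded[start:start + lanes]          # load one chunk
        reg2 = [v * fraction for v in reg1]
        reg1 = [a - b for a, b in zip(reg1, reg2)]  # chunk * (1 - fraction)
        shifted_out = reg2[0]                       # would be lost by the shift
        reg2 = reg2[1:] + [carry]                   # shift, inserting the carried element on top
        carry = shifted_out
        reg1 = [a + b for a, b in zip(reg1, reg2)]
        out[start:start + lanes] = reg1             # store the chunk
    return out[:n]
```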
In any case, for large arrays/matrices it shouldn’t be hard to reach the point where you’re limited by RAM bandwidth rather than by the computation (assuming you’re also doing sensible prefetching, and possibly using non-temporal stores).