Day 4 of Advent of Code 2024 asks you to count how many times the word “XMAS” appears in a grid of characters, reading in any of the 8 directions (horizontally, vertically or diagonally, forwards or backwards). For example:
MMMSXXMASM
MSAMXMSMSA
AMXSXMAAMM
MSAMASMSMX
XMASAMXAMM
XXAMMXXAMA
SMSMSASXSS
SAXAMASAAA
MAMMMXMMMM
MXMXAXMASX
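For reference, here is a minimal scalar sketch of the straightforward approach, written in Python purely for illustration. The names `GRID`, `DIRECTIONS` and `count_xmas` are hypothetical. For every X it walks each of the 8 directions and checks the following characters one by one.

```python
# Sample grid from the puzzle statement, one string per row.
GRID = [
    "MMMSXXMASM",
    "MSAMXMSMSA",
    "AMXSXMAAMM",
    "MSAMASMSMX",
    "XMASAMXAMM",
    "XXAMMXXAMA",
    "SMSMSASXSS",
    "SAXAMASAAA",
    "MAMMMXMMMM",
    "MXMXAXMASX",
]

# The 8 directions as (row step, column step).
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def count_xmas(grid: list[str], word: str = "XMAS") -> int:
    """Count occurrences of `word` in all 8 directions, starting from each first letter."""
    rows, cols = len(grid), len(grid[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != word[0]:
                continue  # only an 'X' can start a match
            for dr, dc in DIRECTIONS:
                end_r = r + dr * (len(word) - 1)
                end_c = c + dc * (len(word) - 1)
                if not (0 <= end_r < rows and 0 <= end_c < cols):
                    continue  # the word would run off the grid
                if all(grid[r + dr * k][c + dc * k] == word[k] for k in range(len(word))):
                    total += 1
    return total


print(count_xmas(GRID))  # prints 18, the count the puzzle gives for this sample
```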
I solved it by loading the 32 characters obtained by reading 4 characters in each of the 8 possible directions from an X (8 × 4 = 32) into a single SIMD register and comparing that register against a mask.
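The gather-and-compare layout can be emulated in plain Python to make the data flow visible. This is a sketch under assumptions, not the actual SIMD code: the 32 bytes are ordered direction by direction (4 bytes per direction), cells outside the grid are padded with a hypothetical filler byte `.` that never matches, and the per-byte comparison loop stands in for what, assuming a 256-bit register, AVX2's `_mm256_cmpeq_epi8` followed by `_mm256_movemask_epi8` would produce: a 32-bit integer with one bit per matching byte. The names `gather_32`, `compare_mask` and `count_xmas_lanes` are hypothetical.

```python
GRID = [
    "MMMSXXMASM", "MSAMXMSMSA", "AMXSXMAAMM", "MSAMASMSMX", "XMASAMXAMM",
    "XXAMMXXAMA", "SMSMSASXSS", "SAXAMASAAA", "MAMMMXMMMM", "MXMXAXMASX",
]
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

PAD = ord(".")      # hypothetical filler for out-of-grid cells; never equals a letter of "XMAS"
MASK = b"XMAS" * 8  # the 32-byte pattern every gathered vector is compared against


def gather_32(grid: list[str], r: int, c: int) -> bytes:
    """Collect 4 characters in each of the 8 directions from (r, c): 8 * 4 = 32 bytes."""
    rows, cols = len(grid), len(grid[0])
    out = bytearray()
    for dr, dc in DIRECTIONS:
        for k in range(4):
            rr, cc = r + dr * k, c + dc * k
            # The bounds check runs before indexing, so negative indices never wrap around.
            out.append(ord(grid[rr][cc]) if 0 <= rr < rows and 0 <= cc < cols else PAD)
    return bytes(out)


def compare_mask(vec: bytes) -> int:
    """Byte-wise equality against MASK, packed into an int with one bit per byte lane."""
    bits = 0
    for i, (a, b) in enumerate(zip(vec, MASK)):
        if a == b:
            bits |= 1 << i
    return bits


def count_xmas_lanes(grid: list[str]) -> int:
    total = 0
    for r in range(len(grid)):
        for c in range(len(grid[0])):
            if grid[r][c] != "X":
                continue
            bits = compare_mask(gather_32(grid, r, c))
            # A direction matches only if all 4 bits of its 4-byte group are set.
            total += sum(1 for d in range(8) if ((bits >> (4 * d)) & 0xF) == 0xF)
    return total


print(count_xmas_lanes(GRID))
```

The point of the layout is that the comparison itself is a single operation over all 8 directions; the cost that remains is gathering 32 scattered bytes per X, which a real SIMD version also has to pay.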
That worked and is relatively fast (it reached 460 MB/s when I increased the input file size to test its limits). However, when we were discussing our solutions on a Discord server with a lot of developers, a few of them with over 15 years of professional experience, one of them told me that for this specific problem, using SIMD instructions isn’t what makes the code run faster; the underlying algorithm matters more.
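To illustrate how the algorithm can change independently of SIMD, here is one alternative formulation. It is a hypothetical sketch, not necessarily what the other developer had in mind: extract every row, column, diagonal and anti-diagonal once as a string, then count “XMAS” and its reverse “SAMX” in each. Every grid character then belongs to exactly 4 lines, instead of being re-read as part of a 32-byte gather for each nearby X, and the inner search is delegated to Python's `str.count`. The helper names `all_lines` and `count_xmas_lines` are hypothetical.

```python
GRID = [
    "MMMSXXMASM", "MSAMXMSMSA", "AMXSXMAAMM", "MSAMASMSMX", "XMASAMXAMM",
    "XXAMMXXAMA", "SMSMSASXSS", "SAXAMASAAA", "MAMMMXMMMM", "MXMXAXMASX",
]


def all_lines(grid: list[str]) -> list[str]:
    """Return every row, column, diagonal and anti-diagonal of the grid as a string."""
    rows, cols = len(grid), len(grid[0])
    lines = list(grid)  # rows, read left to right
    lines += ["".join(grid[r][c] for r in range(rows)) for c in range(cols)]  # columns, top to bottom
    # Down-right diagonals: all cells sharing the same c - r.
    for d in range(-(rows - 1), cols):
        lines.append("".join(grid[r][r + d] for r in range(rows) if 0 <= r + d < cols))
    # Down-left anti-diagonals: all cells sharing the same r + c.
    for s in range(rows + cols - 1):
        lines.append("".join(grid[r][s - r] for r in range(rows) if 0 <= s - r < cols))
    return lines


def count_xmas_lines(grid: list[str]) -> int:
    # str.count only counts non-overlapping matches, which is safe here because
    # "XMAS" cannot overlap itself. Counting "SAMX" covers the 4 reversed directions.
    return sum(line.count("XMAS") + line.count("SAMX") for line in all_lines(grid))


print(count_xmas_lines(GRID))
```

Both formulations do linear work in the grid size; which one is faster in practice depends on memory access patterns and constant factors, so measuring is the only reliable way to compare them.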
So I’m wondering: in what situations is SIMD worth it? How can I recognize a situation where using SIMD instructions would make a noticeable difference? I thought that comparing all 32 characters with a single instruction, instead of using 2 nested for loops, would make a big difference, but was I wrong?