I’ve got a ymm register where I only care about the first byte of every four bytes, i.e. I want exactly _mm256_cvtepi32_epi8, but it’s only in AVX512.
A couple of approaches I can think of:
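- zero out the 3 high bytes per doubleword, then use _mm256_packs_epi32 followed by _mm256_packus_epi16 (zeroing is required because these saturate instead of truncating; the second pack has to be the unsigned-saturating one, since _mm256_packs_epi16 would clamp bytes 0x80..0xFF to 0x7F). Both packs work per 128-bit lane, so the two lanes still have to be combined afterwards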
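- _mm256_shuffle_epi8 to put the bytes I want in the low bytes of the register (per lane), then _mm256_blend_epi32 to combine across the two lanes (blend only selects same-position elements, so this would also need a lane swap such as _mm256_permute2x128_si256 first)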
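
To make the two ideas concrete, here is a rough sketch of both, assuming AVX2 and GCC/Clang. The function names low_bytes_pack and low_bytes_shuffle and the TARGET_AVX2 macro are made up for illustration, and the array-in/array-out interface is only there so the functions can be called from ordinary code. In the second variant I swapped in _mm256_permutevar8x32_epi32 for the blend, since a blend on its own cannot move a doubleword from the high lane into the low one. I have not benchmarked either version.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper macro: lets GCC/Clang compile these functions
   for AVX2 without passing -mavx2 for the whole file. */
#if defined(__GNUC__) || defined(__clang__)
#define TARGET_AVX2 __attribute__((target("avx2")))
#else
#define TARGET_AVX2
#endif

/* Variant 1: mask, pack twice, then merge the two 128-bit lanes. */
TARGET_AVX2
static void low_bytes_pack(const int32_t in[8], uint8_t out[8])
{
    __m256i v = _mm256_loadu_si256((const __m256i *)in);

    /* Keep only byte 0 of each doubleword so saturation never triggers. */
    v = _mm256_and_si256(v, _mm256_set1_epi32(0xFF));

    /* 32 -> 16 bits per lane; values are 0..255, so signed saturation is harmless. */
    __m256i w = _mm256_packs_epi32(v, v);

    /* 16 -> 8 bits per lane; must be the unsigned pack so 0x80..0xFF survive. */
    __m256i b = _mm256_packus_epi16(w, w);

    /* Doubleword 0 of each lane now holds that lane's four result bytes. */
    __m128i lo = _mm256_castsi256_si128(b);
    __m128i hi = _mm256_extracti128_si256(b, 1);
    _mm_storel_epi64((__m128i *)out, _mm_unpacklo_epi32(lo, hi));
}

/* Variant 2: in-lane byte shuffle, then a cross-lane doubleword permute. */
TARGET_AVX2
static void low_bytes_shuffle(const int32_t in[8], uint8_t out[8])
{
    __m256i v = _mm256_loadu_si256((const __m256i *)in);

    /* Per lane: gather bytes 0, 4, 8, 12 into doubleword 0; -1 zeroes a byte. */
    const __m256i ctrl = _mm256_setr_epi8(
        0, 4, 8, 12, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
        0, 4, 8, 12, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1);
    __m256i s = _mm256_shuffle_epi8(v, ctrl);

    /* Bring doublewords 0 and 4 next to each other in the low 64 bits. */
    const __m256i idx = _mm256_setr_epi32(0, 4, 0, 4, 0, 4, 0, 4);
    __m256i p = _mm256_permutevar8x32_epi32(s, idx);

    _mm_storel_epi64((__m128i *)out, _mm256_castsi256_si128(p));
}
```

Both variants leave the eight wanted bytes in the low 64 bits of an xmm register before the final store, so inside a larger kernel the store could be dropped and the register used directly.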