Converting between Pair-wise and Component-wise in AVX
I am writing a double-double arithmetic library for AVX/AVX2. One issue I ran into is that the non-SIMD and SIMD versions use different memory layouts.
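To make the layout difference concrete, here is a minimal sketch, not the asker's actual code. It assumes "pair-wise" means interleaved (hi, lo) doubles in memory and "component-wise" means four hi values in one 256-bit register and four lo values in another. The helper names `dd_pairs_to_components` and `dd_components_to_pairs` and the `DD_AVX` macro are hypothetical. It uses only AVX intrinsics (no AVX2) and relies on the GCC/Clang `target` attribute, so it should build without `-mavx`.

```c
#include <immintrin.h>

/* Assumption: GCC or Clang. Other compilers need AVX enabled globally. */
#if defined(__GNUC__) || defined(__clang__)
#define DD_AVX __attribute__((target("avx")))
#else
#define DD_AVX
#endif

/* Hypothetical helper: pairs = {h0,l0,h1,l1,h2,l2,h3,l3}
   -> hi = {h0,h1,h2,h3}, lo = {l0,l1,l2,l3}. */
DD_AVX static void dd_pairs_to_components(const double *pairs,
                                          double *hi, double *lo)
{
    __m256d a = _mm256_loadu_pd(pairs);      /* h0 l0 h1 l1 */
    __m256d b = _mm256_loadu_pd(pairs + 4);  /* h2 l2 h3 l3 */

    /* Regroup 128-bit halves so each lane holds pairs two apart. */
    __m256d t0 = _mm256_permute2f128_pd(a, b, 0x20);  /* h0 l0 h2 l2 */
    __m256d t1 = _mm256_permute2f128_pd(a, b, 0x31);  /* h1 l1 h3 l3 */

    /* Unpack within each 128-bit lane to separate hi from lo. */
    _mm256_storeu_pd(hi, _mm256_unpacklo_pd(t0, t1)); /* h0 h1 h2 h3 */
    _mm256_storeu_pd(lo, _mm256_unpackhi_pd(t0, t1)); /* l0 l1 l2 l3 */
}

/* Hypothetical helper: the inverse conversion. */
DD_AVX static void dd_components_to_pairs(const double *hi, const double *lo,
                                          double *pairs)
{
    __m256d h = _mm256_loadu_pd(hi);         /* h0 h1 h2 h3 */
    __m256d l = _mm256_loadu_pd(lo);         /* l0 l1 l2 l3 */

    __m256d t0 = _mm256_unpacklo_pd(h, l);   /* h0 l0 h2 l2 */
    __m256d t1 = _mm256_unpackhi_pd(h, l);   /* h1 l1 h3 l3 */

    _mm256_storeu_pd(pairs,     _mm256_permute2f128_pd(t0, t1, 0x20));
    _mm256_storeu_pd(pairs + 4, _mm256_permute2f128_pd(t0, t1, 0x31));
}
```

Each direction costs two cross-lane permutes and two in-lane unpacks per four double-double values. Whether that is cheap enough depends on how often the data crosses between the scalar and SIMD code paths.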
AVX MaskLoad/MaskStore performance
Usually, when writing a SIMD function over a large array whose length might not be a multiple of the register width, you can process the bulk with SIMD and then handle the last few elements with scalar code.
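The bulk-plus-scalar-tail pattern described above can be sketched as follows. This is an illustration under assumptions, not code from the question: the function name `add_arrays` and the `TAIL_AVX` macro are hypothetical, element-wise addition stands in for the real work, and the GCC/Clang `target` attribute is used so the file builds without `-mavx`.

```c
#include <immintrin.h>
#include <stddef.h>

/* Assumption: GCC or Clang. Other compilers need AVX enabled globally. */
#if defined(__GNUC__) || defined(__clang__)
#define TAIL_AVX __attribute__((target("avx")))
#else
#define TAIL_AVX
#endif

/* Hypothetical example: out[i] = a[i] + b[i] for i in [0, n). */
TAIL_AVX static void add_arrays(const double *a, const double *b,
                                double *out, size_t n)
{
    size_t i = 0;

    /* Bulk: four doubles per iteration while a full register fits. */
    for (; i + 4 <= n; i += 4) {
        __m256d va = _mm256_loadu_pd(a + i);
        __m256d vb = _mm256_loadu_pd(b + i);
        _mm256_storeu_pd(out + i, _mm256_add_pd(va, vb));
    }

    /* Tail: the remaining 0 to 3 elements, one at a time. */
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

The scalar tail never reads or writes past the end of the arrays, which is the main thing any alternative (such as masked loads and stores) also has to guarantee.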