How can I optimize this simple multi-valued SIMD splat/broadcast?
I want to expand some `u8`s out to `u64`s, except instead of zero- or sign-extending, which have direct support, I want “copy-extending”: each byte should be replicated into all eight bytes of its `u64` lane (e.g. `0xAB` becomes `0xABABABABABABABAB`). What’s the best way to do this on Intel CPUs with AVX-512? Example code is in Rust, but the host language isn’t the interesting part.
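A portable scalar reference for the intended semantics can help when checking SIMD variants against each other. The sketch below assumes “copy-extending” means replicating each byte across its 64-bit lane, and the names `copy_extend`, `copy_extend_8`, and `copy_extend_8_shuffle` are hypothetical. Multiplying the zero-extended byte by `0x0101_0101_0101_0101` produces the replication directly; on AVX-512 that corresponds to `vpmovzxbq` (`_mm512_cvtepu8_epi64`) followed by `vpmullq` (`_mm512_mullo_epi64`, needs AVX-512DQ). The shuffle model mirrors a single `vpermb` (`_mm512_permutexvar_epi8`, needs AVX-512 VBMI) whose constant index vector holds `i / 8` in byte `i`. Both functions are scalar models for illustration, not intrinsic code.

```rust
/// Copy-extend one byte into a u64 so every byte of the result equals `b`.
/// 0x0101_0101_0101_0101 has a 1 in each byte, so the product places a copy
/// of `b` in all eight byte positions; since b <= 0xFF no carries occur.
pub fn copy_extend(b: u8) -> u64 {
    u64::from(b) * 0x0101_0101_0101_0101
}

/// Copy-extend 8 bytes into 8 u64 lanes using the multiply formulation.
/// SIMD analogue (assumed mapping): vpmovzxbq to zero-extend, then vpmullq.
pub fn copy_extend_8(src: [u8; 8]) -> [u64; 8] {
    src.map(copy_extend)
}

/// Scalar model of the byte-shuffle formulation: the 8 inputs act as a byte
/// table, and output byte i is taken from src[i / 8]. This is the index
/// pattern a single vpermb would use on AVX-512 VBMI hardware.
pub fn copy_extend_8_shuffle(src: [u8; 8]) -> [u64; 8] {
    // Build the 64 output bytes (one 512-bit register's worth).
    let mut bytes = [0u8; 64];
    for (i, out) in bytes.iter_mut().enumerate() {
        *out = src[i / 8];
    }
    // Reinterpret each group of 8 bytes as a little-endian u64 lane.
    let mut lanes = [0u64; 8];
    for (lane, chunk) in bytes.chunks_exact(8).enumerate() {
        let mut le = [0u8; 8];
        le.copy_from_slice(chunk);
        lanes[lane] = u64::from_le_bytes(le);
    }
    lanes
}
```

Where VBMI is unavailable, a plausible alternative is broadcasting the 8 source bytes into every 64-bit element and then using `vpshufb` (`_mm512_shuffle_epi8`, AVX-512BW) with per-128-bit-lane indices, since `vpshufb` cannot move bytes across 128-bit lanes. Which variant wins depends on the microarchitecture; `vpmullq` is generally more expensive than a single shuffle on Intel cores, so measuring is worthwhile.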