Why does ‘_mm256_fmadd_ps’ cause precision loss?
I use `_mm256_fmadd_ps` to compute `a * b` and accumulate it into the result `c`, i.e. `c = a * b + c`. Under certain circumstances, the `fmadd` operation loses precision compared with doing the `mul` first and then the `add`, especially when `c` already has a non-zero value.