I got an unexpected result from an OpenMP `parallel for simd` reduction on an Apple M1 with Clang 17.0.6 and `-O3`, which I never encountered with GCC. Eventually the problem reduces to the following example:
```cpp
#include <cstddef>
#include <iostream>
#include <vector>

std::size_t size = 1024;
std::vector<double> vec(size);
double total = 0.0;
#pragma omp parallel for simd reduction(+ : total) schedule(static)
for (std::size_t itr_x = 0; itr_x < size; ++itr_x)
{
    double val = 1;
    vec[itr_x] = val; // without this line `total` gives 1024 as expected
    total += val;
}
std::cout << total << std::endl; // 128 instead of 1024
```
It seems the assignment `vec[itr_x] = val;` introduces some side effect into the reduction across SIMD lanes. The result does not depend on the number of threads: `total` is still 128 even with `num_threads(1)`, but it changes with different `simdlen()` values. Without `-O3`, `total` gives 1024 as expected.
What I want to know is: is it legal to do both a store and a reduction in a `parallel for simd` loop, or is this an issue with my system or compiler?