I am trying to fill a std::unordered_map in parallel (OpenMP). A very simplified version of my current approach looks like this:
    std::unordered_map<unsigned, std::vector<double>> data;
    unsigned dim, nTasks, largeNum;

    #pragma omp parallel
    {
        #pragma omp single
        {
            for (unsigned task = 0; task < nTasks; task++)
            {
                #pragma omp task shared(data) firstprivate(task)
                {
                    for (unsigned it = 0; it < largeNum; it++)
                    {
                        unsigned idx1 = foo(task, it);
                        unsigned idx2 = bar(task, it);
                        double value  = fooBar(task, it);

                        if (!data.count(idx1))
                        {
                            #pragma omp critical
                            {
                                if (!data.count(idx1)) // Guard
                                    data.emplace(idx1, std::vector<double>(dim, 0.0));
                            }
                        }

                        // The following throws out_of_range
                        #pragma omp atomic
                        data.at(idx1)[idx2] += value;
                    }
                }
            }
        }
    }
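For reference, the snippet compiles once foo, bar, and fooBar are defined and dim, nTasks, and largeNum are initialized. The stand-ins below are purely hypothetical placeholders (the real functions are not shown here); they merely keep idx2 within [0, dim) so the failure can be reproduced:

    #include <unordered_map>
    #include <vector>

    // Hypothetical placeholders, only so the snippet compiles; any mapping
    // that keeps idx2 < dim would do.
    unsigned foo(unsigned task, unsigned it)    { return (task * 31u + it) % 1024u; }
    unsigned bar(unsigned task, unsigned it)    { return (task + it) % 16u; }  // assumes dim == 16
    double   fooBar(unsigned task, unsigned it) { return 1.0 + task + 1e-3 * it; }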
When executed in parallel, the code throws std::out_of_range after some (seemingly) random number of steps. The exception indicates that data[idx1] does not exist, even though it is explicitly constructed in the critical section preceding the throwing line. What is wrong with this code, and why does it throw the exception? Thank you very much for your help!
Some notes:
- I’ve tried the same code with map instead of unordered_map, and in that case the code did not throw the exception.
- I perform the check if (!data.count(idx1)) twice: the first time to avoid entering the critical section on every iteration, and the second time to make sure that no other thread created the element data[idx1] while the current thread was entering the critical section.
- I am aware of the typical approach of creating a private copy of data for each thread, filling it locally, and then adding the results to the global data inside a critical section (see the sketch after these notes). This approach is, however, not feasible in my case: data is too large (tens or hundreds of GB), and there is no easy way to partition it such that a given thread only needs to write to a portion of it.
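For contrast, here is a minimal sketch of that typical per-thread approach, using the same hypothetical foo/bar/fooBar and assuming dim, nTasks, and largeNum are initialized; it is shown only for comparison, since the memory footprint rules it out in my case:

    // Each thread accumulates into its own private map, then merges its
    // partial results into the shared map inside a critical section.
    std::unordered_map<unsigned, std::vector<double>> data;

    #pragma omp parallel
    {
        std::unordered_map<unsigned, std::vector<double>> local; // thread-private

        #pragma omp for
        for (unsigned task = 0; task < nTasks; task++)
        {
            for (unsigned it = 0; it < largeNum; it++)
            {
                std::vector<double>& vec = local[foo(task, it)];
                if (vec.empty())
                    vec.assign(dim, 0.0);            // first touch: allocate the row
                vec[bar(task, it)] += fooBar(task, it);
            }
        }

        #pragma omp critical
        {
            for (auto& kv : local)                   // merge partial results
            {
                std::vector<double>& dst = data[kv.first];
                if (dst.empty())
                    dst.assign(dim, 0.0);
                for (unsigned i = 0; i < dim; i++)
                    dst[i] += kv.second[i];
            }
        }
    }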