I am writing some code on top of an existing library which uses MPI_THREAD_SERIALIZED internally. However, I need to use MPI_THREAD_MULTIPLE. For ease of presentation, let's say each process needs to compute with two threads. Each thread has its own duplicate of MPI_COMM_WORLD so that the threads can use some non-shared objects concurrently. This works totally fine.
However, the library I am using does not anticipate this use case, which leaves essentially two options that I can think of: 1) duplicate each library object for every thread, or 2) ensure that only one thread attempts to use a given object at any time.
These objects can be so large that duplicating them would run me out of memory, so option 2 is the only workable solution.
With all that said, I think the issue can be reduced to: “How can I protect a call on MPI_COMM_WORLD while using MPI_THREAD_MULTIPLE?”
Assume that the functions foo1 and foo2 below may be called from multiple threads at the same time. I attempted to solve my issue with a std::mutex, something like this:
int foo1(int thread_id)
{
    // Line up the threads with this id across all processes.
    MPI_Barrier(communicator[thread_id]);
    my_mutex.lock();
    // Some MPI call which needs to happen over MPI_COMM_WORLD
    MPI_Barrier(communicator[thread_id]);
    my_mutex.unlock();
    return 0;
}

int foo2(int thread_id)
{
    // Line up the threads with this id across all processes.
    MPI_Barrier(communicator[thread_id]);
    my_mutex.lock();
    // Some other MPI call which needs to happen over MPI_COMM_WORLD
    MPI_Barrier(communicator[thread_id]);
    my_mutex.unlock();
    return 0;
}
Here communicator is a std::vector<MPI_Comm>, so that each thread ID has its own communicator to use.
Let us assume that thread 0 on each process calls foo1 at the same time and thread 1 on each process calls foo2 at the same time. I thought this would work, but it seems that I get a deadlock of this form: thread 0 on process 0 holds the mutex and is doing some MPI_COMM_WORLD call. Thread 1 on process 0 is waiting for the mutex. Thread 0 on process 1 is waiting for the mutex. Thread 1 on process 1 holds the mutex and is doing some other call on MPI_COMM_WORLD. The goal, however, is for the same thread ID to obtain the mutex on each process at the same time.
If it matters, I am launching with mpirun -n 2 -bind-to none ./my_program.
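To make the goal above concrete: what seems to be needed is a lock that is granted in the same order on every process, rather than to whichever thread happens to arrive first. The following is only a minimal sketch of that idea, not a tested fix. TurnLock and record_entry_order are hypothetical names that are not part of MPI or of the library in question, and the sketch assumes that every thread makes the same number of protected calls on every process. It contains no MPI calls, so it only demonstrates the ordering within one process.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: grants access in a fixed round-robin order of
// thread ids (0, 1, ..., n-1, 0, ...), independent of arrival order.
class TurnLock {
public:
    explicit TurnLock(int num_threads) : num_threads_(num_threads) {}

    // Block until it is thread_id's turn.
    void lock(int thread_id) {
        std::unique_lock<std::mutex> guard(mutex_);
        turn_changed_.wait(guard, [&] { return turn_ == thread_id; });
    }

    // Hand the turn to the next thread id and wake the waiters.
    void unlock(int thread_id) {
        {
            std::lock_guard<std::mutex> guard(mutex_);
            assert(turn_ == thread_id);  // only the current holder may unlock
            turn_ = (turn_ + 1) % num_threads_;
        }
        turn_changed_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable turn_changed_;
    int turn_ = 0;
    const int num_threads_;
};

// Runs num_threads threads that each enter the protected region `rounds`
// times, and returns the order in which thread ids entered it.
std::vector<int> record_entry_order(int num_threads, int rounds) {
    TurnLock lock(num_threads);
    std::vector<int> order;
    std::vector<std::thread> workers;
    for (int id = 0; id < num_threads; ++id) {
        workers.emplace_back([&, id] {
            for (int r = 0; r < rounds; ++r) {
                lock.lock(id);
                // The call on MPI_COMM_WORLD would go here.
                order.push_back(id);
                lock.unlock(id);
            }
        });
    }
    for (auto& worker : workers) worker.join();
    return order;
}
```

Because the order depends only on the thread id, every process that runs the same sequence of calls would let thread 0 in first, then thread 1, and so on. The cost is strict alternation: if one thread skips a protected call, the others wait forever for a turn that never comes.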