Why does my hybrid MPI/OpenMP code not scale well with more MPI processes?
I have tested my hybrid MPI/OpenMP code,
How do I divide my MPI processes and OpenMP threads over nodes using salloc?
I am trying to submit a job using salloc, i.e., 64 MPI processes, each with 4 OpenMP threads:
Scalability of OpenMP
I have a piece of OpenMP code for testing how it scales with multiple threads: