I have tested my hybrid MPI/OpenMP code.
First, I tested with 2 MPI processes and different numbers of OpenMP threads:
salloc -p compute-grantley --exclusive mpirun -n 2 --bind-to none ./test
The results are the following:
salloc: Nodes node012 are ready for job
1 threads: 1.7
2 threads: 0.86
4 threads: 0.44
8 threads: 0.22
Here the run time scales well with the number of threads: it roughly halves each time the thread count doubles.
However, when I use 16 MPI processes with different numbers of OpenMP threads:
salloc -p compute-grantley --exclusive mpirun -n 16 --bind-to none ./test
The results are the following:
salloc: Nodes node057 are ready for job
1 threads: 0.20
2 threads: 0.21
4 threads: 0.22
8 threads: 0.20
Here the run time does not scale with the number of threads at all; it stays at about 0.2 regardless of the thread count.
Why is only one node assigned for both 2 and 16 MPI processes? In this situation, what is the correct command to get good scaling with respect to both the number of MPI processes and the number of OpenMP threads?
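One way to frame the question is to compare how many threads each configuration asks of the single allocated node. The sketch below is only arithmetic, not a measurement. The helper name demand and the figure of 16 cores per node are assumptions for illustration; the real core count of the node is not stated above and would need to be checked (for example with lscpu).

```python
def demand(ranks, threads_per_rank, cores_per_node):
    """Return (total_threads, ratio of total threads to cores) for one node."""
    total = ranks * threads_per_rank
    return total, total / cores_per_node


# Assumed placeholder, NOT taken from the cluster; replace with the real value.
CORES_PER_NODE = 16

for ranks in (2, 16):
    for threads in (1, 2, 4, 8):
        total, ratio = demand(ranks, threads, CORES_PER_NODE)
        print(f"{ranks:2d} ranks x {threads} threads = {total:3d} threads "
              f"({ratio:.2f}x the assumed core count)")
```

Under that assumed core count, 2 ranks with 8 threads and 16 ranks with 1 thread both request 16 threads in total, which would be consistent with the similar timings above (0.22 and 0.20), while 16 ranks with more than 1 thread would request more threads than there are cores.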