I have a piece of OpenMP code for testing how well multithreading scales:
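    #include <cstddef>
    #include <iostream>
    #include <mpi.h>   // MPI_Wtime
    #include <omp.h>

    void openmp_scalabity_test ()
    {
        int tid;
        // numberofthreads is defined elsewhere; it sets the thread count for this run
        omp_set_num_threads(numberofthreads);
        double t0, t1;
        #pragma omp parallel private(tid, t0, t1)
        {
            t0 = MPI_Wtime();
            #pragma omp for
            for(std::size_t zindex=0; zindex<10000000000; zindex++)
            {
                tid = omp_get_thread_num();
            }
            // the omp for above ends with an implicit barrier, so every thread is done here
            t1 = MPI_Wtime();
            #pragma omp barrier
            if(tid==0)
            {
                // report the team size; omp_get_thread_num() would always print 0 here
                std::cout <<" Multithread wall clock: "<<t1-t0<<" in threads: " << omp_get_num_threads()<<std::endl;
            }
        }
    }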
I run the code with MPI:
mpirun --bind-to none -np 1 ./test
My CPU info is as follows:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 24
On-line CPU(s) list: 0-23
Vendor ID: GenuineIntel
Model name: 12th Gen Intel(R) Core(TM) i9-12900K
CPU family: 6
Model: 151
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 1
Stepping: 2
CPU(s) scaling MHz: 17%
CPU max MHz: 5200.0000
CPU min MHz: 800.0000
BogoMIPS: 6374.40
This is a bit confusing to me. The listing shows 16 cores per socket and 2 threads per core, so I expected good scaling up to 32 threads.
However, here are the wall-clock times (in seconds, as returned by MPI_Wtime) for the test:
1 threads: 3.93
2 threads: 2.01
4 threads: 1.04
8 threads: 0.56
16 threads: 0.58
32 threads: 0.63
It seems I can only get good scaling up to 8 threads.