I have a question about writing SLURM directives for Python multiprocessing code to be run on high-performance computing (HPC) platforms. If I want to set the number of processes in the code to 8, should I set ‘ntasks=8’ or ‘cpus-per-task=8’ in the SLURM directives? Some people say that ‘cpus-per-task’ refers to the number of threads, whereas Python multiprocessing does not use multithreading because of the Global Interpreter Lock (GIL).
I am confident that Python multiprocessing itself works, because I tried it on my local laptop and saw the computing time decrease as the number of processes increased. So the question is how to set up the SLURM directives correctly so that code using multiprocessing runs in parallel on HPC.
Here is an example script called ‘multiproc1.py’:
    import multiprocessing as mp
    import time
    import numpy as np
    import sys

    def square(x):
        # The three lines below are only there to increase the computation time per call
        random_matrix = np.random.randn(150, 150)
        eigenvalues = np.linalg.eigvals(random_matrix)
        a = np.max(np.abs(eigenvalues))
        return x * x

    if __name__ == '__main__':
        ntasks = int(sys.argv[1])  # number of worker processes, passed on the command line
        print('hello world')
        start_time = time.time()
        pool = mp.Pool(ntasks)
        outputs_async = pool.map_async(square, [ki for ki in range(10000)])
        outputs = outputs_async.get()
        pool.close()
        pool.join()
        end_time = time.time()
        print(end_time - start_time)
Here is the SLURM file to run the above code on an HPC platform.
    #!/bin/bash
    #SBATCH --partition=cascade
    #SBATCH --nodes=1
    #SBATCH --ntasks=2

    module purge
    module load foss/2022a
    module load Python/3.10.4
    module load SciPy-bundle/2022.05

    srun python3 multiproc1.py $SLURM_NTASKS
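As a diagnostic (my addition, not part of the original batch file), one could insert a line like the following before the Python call to see how many copies srun launches; the echo command and its wording are my own:

```shell
# With --ntasks=2, srun starts the given command once per task,
# so this line should print twice, with SLURM_PROCID 0 and 1.
srun bash -c 'echo "task $SLURM_PROCID of $SLURM_NTASKS on host $(hostname)"'
```

If this prints twice, it confirms that srun is replicating the whole command per task rather than giving one script several CPUs.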
I tried ‘ntasks=2,4,8’ and they all took around the same time to finish. Here is the output with ‘ntasks=2’:
hello world
hello world
98.75774931907654
102.18375992774963
It prints ‘hello world’ twice, but it is supposed to print only once (as it does on my local laptop). I suspect that each task simply runs the whole script on its own, without any parallelism inside either copy. I also tried setting ‘cpus-per-task’ (which is the correct directive for ‘parfor’ in Matlab; I tried it), but then the job took longer to finish. I guess ‘cpus-per-task’ refers to multithreading, and Python multiprocessing does not use multithreading.
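One way to check what each launched copy can actually use (a small diagnostic sketch of mine, not part of the original script; the variable name is my own) is to print the process's CPU affinity from inside Python, since on Linux-based clusters the cores SLURM allocates show up there:

```python
import os

# Number of CPU cores this process is allowed to run on (Linux only).
# Inside a SLURM job this typically reflects the cpus-per-task allocation;
# on a laptop it is normally all physical cores.
usable_cpus = len(os.sched_getaffinity(0))
print(f'this process may use {usable_cpus} CPU core(s)')
```

If this prints 1 inside the job while mp.Pool() starts several workers, the workers are all pinned to a single core, which would explain why adding processes gives no speed-up.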
I ran this code on my local laptop and varied the number of processes (1, 2, 3, 4). With 1 process it took around 89 seconds; with 2 processes, around 57 seconds; with 3 processes, around 41 seconds; with 4 processes, around 36 seconds. So multiprocessing definitely works on my local laptop, and I reckon it should also work on HPC platforms.
So, the question is: how do I write the SLURM directives correctly for code using multiprocessing to run on HPC platforms?