I have access to an HTC cluster. I want to run 32 parallel instances
(ntasks-per-node=32) of the same Python script on one node. Here is my current
Slurm submit file:
#!/bin/bash
#SBATCH --job-name=parallel_jobs
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=32
#SBATCH --time=168:00:00
#SBATCH --output=output_%j.out
#SBATCH --partition=sbatch
cd $SLURM_SUBMIT_DIR
source /path/to/python/env
for i in $(seq 1 $SLURM_NTASKS_PER_NODE); do
    srun --exclusive --ntasks=1 ./script.py &
done
wait
I can submit this and it runs, as verified by squeue. However, when I ssh
into the node and run htop, I can see that only a single instance of script.py
is running. I would prefer to do this through Slurm rather than use Python's
multiprocessing capability.