I’m using Slurm on a shared compute cluster. I’d like to know whether the following is possible:
I have 40 tasks and 5 nodes, and each node has four cores. I’d like to allocate 8 tasks to each node, but I want each node to run only one task at a time. Each task is memory-intensive, and running more than one concurrently would exceed a node’s memory capacity. I know this is inefficient, but it is the only option for my workload (see the bottom for details).
I have the following sbatch script. Currently, running it produces an error saying no configuration is available for the requested resources (8 tasks per node when each node has only 4 cores).
#!/bin/bash
#SBATCH --job-name=elm_x
#SBATCH --output=log/%x-%A_%a.out
#SBATCH --error=log/%x-%A_%a.err
#SBATCH --nodes=5             # 5 nodes available
#SBATCH --cpus-per-task=1     # each task needs a single CPU
#SBATCH --ntasks=40           # must match the 40 mesh partitions
#SBATCH --ntasks-per-node=8   # 8 tasks per node, but each node has only 4 cores
srun --mpi=pmix /usr/bin/ElmerSolver_mpi case.sif
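The rejection follows from the arithmetic: --ntasks-per-node=8 with --cpus-per-task=1 asks Slurm for 8 CPUs on nodes that have only 4. For comparison (not a fix for the one-at-a-time requirement), Slurm’s real --overcommit option lets more tasks be placed on a node than it has CPUs. This is only a sketch, assuming the cluster permits oversubscription, and note it would still run all 8 tasks on a node concurrently rather than serializing them:

```shell
#!/bin/bash
#SBATCH --job-name=elm_x
#SBATCH --output=log/%x-%A_%a.out
#SBATCH --error=log/%x-%A_%a.err
#SBATCH --nodes=5
#SBATCH --cpus-per-task=1
#SBATCH --ntasks=40
#SBATCH --ntasks-per-node=8
#SBATCH --overcommit          # allow more tasks per node than available CPUs
srun --mpi=pmix --overcommit /usr/bin/ElmerSolver_mpi case.sif
```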
Workload details
I’m using FEA software (ElmerFEM). When running it with MPI (message passing), the input mesh is split into a number of partitions, and this partition count must match the number of tasks set in Slurm; otherwise the software will not accept the input mesh.
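For context, a sketch of the partitioning step with ElmerGrid (the mesh directory name case is taken from the script above; exact partitioning flags vary between Elmer versions). The partition count chosen here is what has to equal Slurm’s --ntasks:

```shell
# Partition the Elmer mesh in directory "case" into 40 pieces using METIS.
# Format code 2 = ElmerSolver mesh format (used for both input and output).
ElmerGrid 2 2 case -metis 40
# The solver must then be launched with exactly 40 MPI tasks, e.g.:
#   srun --ntasks=40 --mpi=pmix /usr/bin/ElmerSolver_mpi case.sif
```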