How to make job distribution to nodes depend on partition
We have a heterogeneous cluster with some small nodes (64 cores) in partition_smallnodes and some larger nodes (256 cores) in partition_largenodes.
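One common approach is to choose the per-node task count at submit time based on the target partition. Below is a minimal sketch, assuming the partition names above, a placeholder batch script called job.sh, and that each job should fill one node of the chosen partition:

    #!/bin/bash
    # submit.sh -- pick --ntasks-per-node to match the partition's core count.
    # Usage: ./submit.sh partition_smallnodes|partition_largenodes
    partition="$1"
    case "$partition" in
        partition_smallnodes)  tasks_per_node=64  ;;
        partition_largenodes)  tasks_per_node=256 ;;
        *) echo "unknown partition: $partition" >&2; exit 1 ;;
    esac
    sbatch --partition="$partition" --ntasks-per-node="$tasks_per_node" job.sh

Inside the job, SLURM_NTASKS_PER_NODE should then reflect the chosen layout.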
Multiple sequential tasks per node with slurm batch script
I’m using slurm on a shared compute cluster. I’d like to know if the following is possible:
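The rest of the question is not quoted here; going only by the title, the usual pattern for running several tasks back to back inside one allocation looks roughly like the sketch below (task commands and resource values are placeholders, not taken from the original post):

    #!/bin/bash
    #SBATCH --job-name=sequential-tasks
    #SBATCH --nodes=1
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=8
    #SBATCH --time=02:00:00

    # Run the steps one after another inside the same allocation;
    # each srun finishes before the next one starts.
    srun ./task_step1.sh
    srun ./task_step2.sh
    srun ./task_step3.sh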
slurmd error: port already in use, resulting in slaves not being able to communicate with master slurmctld
I’m trying to set up a Slurm (version 22.05.8) cluster consisting of 3 nodes with these hostnames and local IP addresses:
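The host list is omitted above. As a generic troubleshooting sketch, assuming the default SlurmdPort of 6818 and a standard slurm.conf location, one can check what is already bound to the port on an affected node:

    # On the node where slurmd fails to start:
    # show which process currently holds the slurmd port (6818 by default)
    sudo ss -tlnp | grep 6818

    # confirm which ports the cluster is configured to use
    grep -Ei 'slurmdport|slurmctldport' /etc/slurm/slurm.conf

    # if a stale slurmd is holding the port, restart the service
    sudo systemctl restart slurmd
    sudo systemctl status slurmd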
Slurm REST API: plugin 101 not found
When I submit or get a job everything works fine, but when I want to hold a job I always get the error shown in the screenshot.
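The screenshot and the exact request are not reproduced here. As a rough sketch, assuming slurmrestd is listening on localhost:6820 with JWT authentication and the v0.0.40 endpoints, one way to cross-check is to hold the job with scontrol and compare with a REST request; the job id 12345 is a placeholder, and the JSON payload shape for the update call is an assumption that depends on the OpenAPI plugin version:

    # CLI equivalent, useful to verify that holding the job works at all
    scontrol hold 12345

    # Hedged REST sketch: update the job via slurmrestd (payload shape is an
    # assumption for the v0.0.40 data parser; adjust to your generated spec)
    curl -s -X POST "http://localhost:6820/slurm/v0.0.40/job/12345" \
         -H "X-SLURM-USER-NAME: $USER" \
         -H "X-SLURM-USER-TOKEN: $SLURM_JWT" \
         -H "Content-Type: application/json" \
         -d '{"job": {"hold": true}}'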
Can’t submit GRES value from Slurm REST API
I have been trying to submit the Slurm GRES flag through the REST API; however, I couldn’t find a way to do so. I am using parser version 0.0.40.
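For reference, here is a heavily hedged sketch of what a submit call with a GRES request might look like with the v0.0.40 parser. The tres_per_node field name and its "gres/gpu:1" syntax are assumptions (they mirror what scontrol show job reports), and localhost:6820, the token variables, the partition name, and the script body are placeholders:

    # Hedged sketch: submit a job that asks for one GPU per node via slurmrestd.
    # The tres_per_node field/value is an assumption; check the generated spec
    # (e.g. GET /openapi/v3) for the exact property name in your version.
    curl -s -X POST "http://localhost:6820/slurm/v0.0.40/job/submit" \
         -H "X-SLURM-USER-NAME: $USER" \
         -H "X-SLURM-USER-TOKEN: $SLURM_JWT" \
         -H "Content-Type: application/json" \
         -d '{
               "script": "#!/bin/bash\nhostname\nnvidia-smi",
               "job": {
                 "name": "gres-test",
                 "partition": "gpu",
                 "tres_per_node": "gres/gpu:1",
                 "current_working_directory": "/tmp",
                 "environment": ["PATH=/usr/bin:/bin"]
               }
             }'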
SLURM interactive job assigned to a worker node but effectively running on the login node
When I start an interactive job, it is allocated to one of the worker nodes (I can see this in the terminal output and in squeue), but when I then run my commands in the terminal they use the RAM/CPUs of the login node. I checked this using both htop and glances.
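A quick way to check where an interactive session actually runs is to compare hostnames. A minimal sketch follows (partition name and resource values are placeholders): starting the shell through srun --pty guarantees the shell itself is a job step on the allocated node, whereas a bare salloc can leave you in a shell on the login node unless you srun into the allocation or the site sets LaunchParameters=use_interactive_step.

    # Interactive shell that really runs on a compute node:
    srun --partition=compute --ntasks=1 --cpus-per-task=4 --mem=8G \
         --time=01:00:00 --pty bash -i

    # Inside the session, verify where you are:
    hostname          # should print a worker node, not the login node
    echo $SLURM_JOB_ID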
Run ntasks-per-node parallel scripts on a node using slurm
I have access to an HTC cluster. I want to run ntasks-per-node=32 parallel instances of the same Python script on 1 node. Here is the Slurm submit file at the moment:
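The submit file itself is not shown above; below is a generic sketch of the usual pattern, assuming a hypothetical script.py and that each instance needs one CPU. Either let srun fan the same task out 32 times, or background 32 one-task steps and wait for them:

    #!/bin/bash
    #SBATCH --job-name=parallel-instances
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=32
    #SBATCH --cpus-per-task=1
    #SBATCH --time=04:00:00

    # Option 1: launch 32 copies of the script as one job step
    srun python script.py

    # Option 2: 32 independent single-task steps, each confined to its own CPU
    # for i in $(seq 1 32); do
    #     srun --ntasks=1 --exact python script.py "$i" &
    # done
    # wait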
A single job with multiple job steps on multiple nodes in parallel
I have the following sbatch script:
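The script itself is not included. As a generic sketch of the pattern the title describes (node/task counts and step commands are placeholders): allocate several nodes, background one srun per step so the steps run concurrently, and wait for all of them.

    #!/bin/bash
    #SBATCH --job-name=parallel-steps
    #SBATCH --nodes=2
    #SBATCH --ntasks=2
    #SBATCH --cpus-per-task=4
    #SBATCH --time=02:00:00

    # One single-task step per node, all running at the same time.
    # --exact keeps each step from grabbing the whole allocation.
    srun --nodes=1 --ntasks=1 --exact ./step_a.sh &
    srun --nodes=1 --ntasks=1 --exact ./step_b.sh &
    wait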