How to make job distribution to nodes depend on partition
We have a heterogeneous cluster with some small nodes (64 cores) in partition_smallnodes and some larger nodes (256 cores) in partition_largenodes.
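One common approach is to choose the per-node task count at submit time based on the target partition. Below is a minimal sketch, assuming the partition names above, a placeholder batch script called job.sh, and that each job should fill one node of the chosen partition:

    #!/bin/bash
    # submit.sh -- pick --ntasks-per-node to match the partition's core count.
    # Usage: ./submit.sh partition_smallnodes|partition_largenodes
    partition="$1"
    case "$partition" in
        partition_smallnodes)  tasks_per_node=64  ;;
        partition_largenodes)  tasks_per_node=256 ;;
        *) echo "unknown partition: $partition" >&2; exit 1 ;;
    esac
    sbatch --partition="$partition" --ntasks-per-node="$tasks_per_node" job.sh

Inside the job, SLURM_NTASKS_PER_NODE should then reflect the chosen layout.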
Multiple sequential tasks per node with slurm batch script
I’m using slurm on a shared compute cluster. I’d like to know if the following is possible:
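The rest of the question is not quoted here; going only by the title, the usual pattern for running several tasks back to back inside one allocation looks roughly like the sketch below (task commands and resource values are placeholders, not taken from the original post):

    #!/bin/bash
    #SBATCH --job-name=sequential-tasks
    #SBATCH --nodes=1
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=8
    #SBATCH --time=02:00:00

    # Run the steps one after another inside the same allocation;
    # each srun finishes before the next one starts.
    srun ./task_step1.sh
    srun ./task_step2.sh
    srun ./task_step3.sh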
slurmd error: port already in use, resulting in slaves not being able to communicate with master slurmctld
I’m trying to set up a Slurm (version 22.05.8) cluster consisting of 3 nodes with these hostnames and local IP addresses:
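The host list is omitted above. As a generic troubleshooting sketch, assuming the default SlurmdPort of 6818 and a standard slurm.conf location, one can check what is already bound to the port on an affected node:

    # On the node where slurmd fails to start:
    # show which process currently holds the slurmd port (6818 by default)
    sudo ss -tlnp | grep 6818

    # confirm which ports the cluster is configured to use
    grep -Ei 'slurmdport|slurmctldport' /etc/slurm/slurm.conf

    # if a stale slurmd is holding the port, restart the service
    sudo systemctl restart slurmd
    sudo systemctl status slurmd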
Slurm REST API: plugin 101 not found
When I submit or get a job everything works fine, but when I want to hold a job I always get the error shown in the screenshot.
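The screenshot and the exact request are not reproduced here. As a rough sketch, assuming slurmrestd is listening on localhost:6820 with JWT authentication and the v0.0.40 endpoints, one way to cross-check is to hold the job with scontrol and compare with a REST request; the job id 12345 is a placeholder, and the JSON payload shape for the update call is an assumption that depends on the OpenAPI plugin version:

    # CLI equivalent, useful to verify that holding the job works at all
    scontrol hold 12345

    # Hedged REST sketch: update the job via slurmrestd (payload shape is an
    # assumption for the v0.0.40 data parser; adjust to your generated spec)
    curl -s -X POST "http://localhost:6820/slurm/v0.0.40/job/12345" \
         -H "X-SLURM-USER-NAME: $USER" \
         -H "X-SLURM-USER-TOKEN: $SLURM_JWT" \
         -H "Content-Type: application/json" \
         -d '{"job": {"hold": true}}'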
Can’t submit GRES value from Slurm REST API
I have been trying to submit the Slurm GRES flag through the REST API; however, I couldn’t find a way to do so. I am using parser version 0.0.40.
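For reference, here is a heavily hedged sketch of what a submit call with a GRES request might look like with the v0.0.40 parser. The tres_per_node field name and its "gres/gpu:1" syntax are assumptions (they mirror what scontrol show job reports), and localhost:6820, the token variables, the partition name, and the script body are placeholders:

    # Hedged sketch: submit a job that asks for one GPU per node via slurmrestd.
    # The tres_per_node field/value is an assumption; check the generated spec
    # (e.g. GET /openapi/v3) for the exact property name in your version.
    curl -s -X POST "http://localhost:6820/slurm/v0.0.40/job/submit" \
         -H "X-SLURM-USER-NAME: $USER" \
         -H "X-SLURM-USER-TOKEN: $SLURM_JWT" \
         -H "Content-Type: application/json" \
         -d '{
               "script": "#!/bin/bash\nhostname\nnvidia-smi",
               "job": {
                 "name": "gres-test",
                 "partition": "gpu",
                 "tres_per_node": "gres/gpu:1",
                 "current_working_directory": "/tmp",
                 "environment": ["PATH=/usr/bin:/bin"]
               }
             }'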
SLURM interactive job assigned to a worker node but effectively running on the login node
When I start an interactive job, it is allocated to one of the worker nodes (I can see this in the terminal output and in squeue), but when I then run my commands in the terminal they use the RAM/CPUs of the login node. I checked this using both htop and glances.
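A quick way to check where an interactive session actually runs is to compare hostnames. A minimal sketch follows (partition name and resource values are placeholders): starting the shell through srun --pty guarantees the shell itself is a job step on the allocated node, whereas a bare salloc can leave you in a shell on the login node unless you srun into the allocation or the site sets LaunchParameters=use_interactive_step.

    # Interactive shell that really runs on a compute node:
    srun --partition=compute --ntasks=1 --cpus-per-task=4 --mem=8G \
         --time=01:00:00 --pty bash -i

    # Inside the session, verify where you are:
    hostname          # should print a worker node, not the login node
    echo $SLURM_JOB_ID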
Run ntasks-per-node parallel scripts on a node using slurm
I have access to an HTC cluster. I want to run ntasks-per-node=32 parallel instances of the same Python script on 1 node. Here is the Slurm submit file at the moment:
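The submit file itself is not shown above; below is a generic sketch of the usual pattern, assuming a hypothetical script.py and that each instance needs one CPU. Either let srun fan the same task out 32 times, or background 32 one-task steps and wait for them:

    #!/bin/bash
    #SBATCH --job-name=parallel-instances
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=32
    #SBATCH --cpus-per-task=1
    #SBATCH --time=04:00:00

    # Option 1: launch 32 copies of the script as one job step
    srun python script.py

    # Option 2: 32 independent single-task steps, each confined to its own CPU
    # for i in $(seq 1 32); do
    #     srun --ntasks=1 --exact python script.py "$i" &
    # done
    # wait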
A single job with multiple job steps on multiple nodes in parallel
I have the following sbatch script:
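The script itself is not included. As a generic sketch of the pattern the title describes (node/task counts and step commands are placeholders): allocate several nodes, background one srun per step so the steps run concurrently, and wait for all of them.

    #!/bin/bash
    #SBATCH --job-name=parallel-steps
    #SBATCH --nodes=2
    #SBATCH --ntasks=2
    #SBATCH --cpus-per-task=4
    #SBATCH --time=02:00:00

    # One single-task step per node, all running at the same time.
    # --exact keeps each step from grabbing the whole allocation.
    srun --nodes=1 --ntasks=1 --exact ./step_a.sh &
    srun --nodes=1 --ntasks=1 --exact ./step_b.sh &
    wait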