I am unable to find any relevant information about running preparation tasks on nodes, and I expect this to be a common enough problem that I shouldn’t have to create a custom workaround to implement it.
What I’m looking for (and can’t seem to find) is some kind of “prepare” script option for Slurm that would run once on each node before launching a set of jobs, and only on the nodes that the jobs are actually allocated to.
Does this sort of “prepare” feature exist in Slurm?
Here is the scenario I am dealing with: I have a few Slurm nodes attached to a Jenkins instance. The Jenkins instance has access to each Slurm node, and the Slurm jobs we want to run require some files, generated during the Jenkins flow, to be present on each node.
Because each Jenkins job is unique, each Slurm node must be prepared during the Jenkins job before the Slurm jobs are dispatched to the nodes. Currently, we use Jenkins to stash the required files, connect to each Slurm node, unstash the required files (once per node), and then launch the array of jobs via sbatch. The problem here is that 1) we are using Jenkins to prepare each Slurm node for the jobs, which feels wrong, and 2) because we don’t know in advance which of the nodes will receive the current jobs, we waste resources by preparing all the nodes and then cleaning them all up afterwards, even if only some of them end up running the current jobs.
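To make the current flow concrete, here is a dry-run sketch of what our Jenkins step effectively does (the hostnames, paths, and `job.sh` are placeholders, not our real names; by default the commands are only echoed rather than executed):

```shell
#!/bin/sh
# Dry-run sketch of the current Jenkins-driven preparation.
# Hostnames, paths, and job.sh are placeholders; set RUN= (empty)
# to actually execute the commands instead of printing them.
RUN="${RUN:-echo}"

prepare_and_submit() {
    nodes="node1 node2"              # every Slurm node, because we don't
                                     # know the allocation in advance
    stash_dir="/tmp/jenkins-stash"   # files generated by the Jenkins flow

    for node in $nodes; do
        # copy the job-specific files to each node
        $RUN scp -r "$stash_dir" "$node:/scratch/prepared-files"
    done

    # only then submit the job array
    $RUN sbatch --array=1-10 job.sh
}

prepare_and_submit
```

Note that the loop touches every node unconditionally, which is exactly the waste described above.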
To better illustrate the point, let’s say we have two nodes (node1 and node2) with 10 cores each, and I have a 10-job array that uses 1 core per job. I hand the array to sbatch, and sbatch decides it can fit all the jobs on node1. It should therefore run the “preparation” script on node1 only (and not on node2), and it should run this preparation only once, before launching any of the actual jobs on that node. If, the next time I dispatch jobs to Slurm, I have an array of 15 such jobs that spans both nodes, then the preparation script should run once on each allocated node.
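For reference, the array in that example is submitted roughly like this (a minimal sketch; `run_task` stands in for our actual per-job command):

```shell
#!/bin/bash
#SBATCH --array=1-10        # the 10-job array from the example
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1   # 1 core per job

# each array task assumes its node has already been prepared once
./run_task "$SLURM_ARRAY_TASK_ID"
```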
As an alternative to preparing each node, we have tried using a NAS attached to all of those nodes to store the relevant files, but preparing everything on the NAS is quite slow due to the large number of small files, and running our jobs with some files on the NAS also slows them down. I can obviously come up with other ways to partially work around this preparation problem, such as ssh, but I still don’t know in advance which nodes the jobs will be launched on. It seems like such an obvious requirement that there should be an option to do this natively in Slurm.
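To show the kind of partial workaround I mean: a once-per-node guard that every array task could run at startup, assuming a node-local scratch directory visible to all tasks on that node (the directory layout and the unpack step are placeholders):

```shell
#!/bin/sh
# Sketch of a once-per-node guard, run at the top of every array task.
# Assumes a node-local scratch directory shared by all tasks on the node;
# the unpack step below is a placeholder.
prepare_once() {
    scratch="$1"
    marker="$scratch/prepared.marker"
    mkdir -p "$scratch"
    (
        flock 9                      # serialize tasks landing on this node
        if [ ! -e "$marker" ]; then
            # ... fetch/unpack the Jenkins-generated files here ...
            echo "preparing"
            touch "$marker"
        fi
    ) 9>"$scratch/prepare.lock"
}
```

This at least limits the work to nodes that actually receive tasks, but the files still have to come from somewhere (ssh or the NAS), and cleanup is still unsolved, which is why a native Slurm “prepare” hook would be much nicer.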