On the HPC I work with, I have to use Docker in rootless mode by calling a script, start-docker.sh, on every job.
To automate this, I would like to use the --task-prolog argument of srun and have the script called from a task_prolog.sh script:
#!/usr/bin/env sh
nohup start-docker.sh > docker-server.log 2>&1 &
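The redirection on that line is meant to send both stdout and stderr of the background process to the log file. Here is a minimal sketch of that pattern, using a throwaway command as a stand-in for start-docker.sh. The stand-in command and the temporary log path are illustrative assumptions, not part of the actual setup:

```shell
#!/usr/bin/env sh
# Hypothetical stand-in for start-docker.sh: one line to stdout, one to stderr.
log=$(mktemp)

# "> file" has to come before "2>&1" so that stderr follows stdout into the file.
nohup sh -c 'echo "to stdout"; echo "to stderr" >&2' > "$log" 2>&1 &
wait $!

# Both lines should now be in the log.
cat "$log"
```

With the order reversed (`2>&1 > file`), stderr would keep pointing at the original stdout and only stdout would reach the file.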
start-docker.sh is responsible for launching dockerd-rootless.sh:
#!/usr/bin/env bash
source /tmp/slurm/username/env.txt
dockerd-rootless.sh --host=$DOCKER_HOST --data-root=/storage/docker-data/username/ --exec-root=${XDG_RUNTIME_DIR}/
The SLURM documentation for prolog scripts shows examples of setting environment variables or printing messages but nothing for executing a script.
With this approach, the start-docker.sh script is launched, but no Docker server ends up running.
Inspection of docker-server.log shows that dockerd-rootless.sh is called too, but it seems to be interrupted when calling rootlesskit, as this is the last command logged:
+ exec rootlesskit --state-dir=/run/user/8076/dockerd-rootless --net=slirp4netns --mtu=65520 --slirp4netns-sandbox=auto --slirp4netns-seccomp=auto --disable-host-loopback --port-driver=builtin --copy-up=/etc --copy-up=/run --propagation=rslave /usr/bin/dockerd-rootless.sh --host=unix:///run/user/8076/docker.sock --data-root=/storage/docker-data/username/ --exec-root=/run/user/8076/
Nonetheless, when start-docker.sh is called manually, additional lines are logged after the one from rootlesskit and the docker server runs as expected.
I don’t understand why the script seems interrupted. Is the process running the prolog script killed at some point?
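One way to narrow this down is to check whether the backgrounded process stays in the prolog script's process group. nohup only makes the process ignore SIGHUP; it does not move it to another process group or session, and SIGKILL cannot be ignored. Whether Slurm actually signals the prolog's process group once the script returns is an assumption here and is not confirmed by the prolog documentation mentioned above. The sketch below is Linux-only, uses `sleep 30` as a hypothetical stand-in for start-docker.sh, and assumes `setsid` from util-linux is installed:

```shell
#!/usr/bin/env sh
# Diagnostic sketch: compare process group IDs read from /proc (Linux only).

pgid_of() {
    # After stripping "pid (comm) ", the remaining fields are: state ppid pgrp ...
    sed 's/^.*) //' "/proc/$1/stat" | cut -d' ' -f3
}

# Same launch style as task_prolog.sh: nohup plus "&".
nohup sleep 30 > /dev/null 2>&1 &
plain_pid=$!

# Alternative launch: setsid starts the child in a new session and process group.
setsid sleep 30 > /dev/null 2>&1 &
detached_pid=$!

sleep 1  # give both children time to start

self_pgid=$(pgid_of $$)
plain_pgid=$(pgid_of "$plain_pid")
detached_pgid=$(pgid_of "$detached_pid")

echo "script pgid:        $self_pgid"
echo "nohup child pgid:   $plain_pgid"
echo "setsid child pgid:  $detached_pgid"

# Clean up the stand-in processes.
kill "$plain_pid" "$detached_pid" 2> /dev/null
```

If the nohup child reports the same process group as the script while the setsid child does not, replacing nohup with setsid in task_prolog.sh would be one thing to try. A cgroup-based process-tracking setup could still clean up the daemon when the step ends, so this is a diagnostic rather than a guaranteed fix.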
Thanks for your help.