We want to integrate our own scheduling with Slurm.
Our app has a backend system which decides the placement of jobs across
hosts & CPU cores. Note that the backend takes some time to come back with a placement, & Slurm
should regularly update it with any change in the current state of available resources.
For this we believe we broadly have 3 options:
a> We use the cons_tres Select plugin & modify it so that it queries our backend system.
b> We write our own Select plugin & use it in place of any other Select plugin.
c> We use an existing Select plugin & also register our own plugin. The idea is that our plugin would cater to
our jobs (those in a specific partition, say) while all other jobs would be taken up by the default plugin.
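To make c> concrete, here is a rough Python sketch of the routing we have in mind. It is not Slurm code and uses no Slurm API. The names Job, route, backend_select and first_fit, and the partition name "backend", are all made up for illustration, and the two selectors are simple stand-ins for our backend and for the default plugin.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Host name -> list of free core ids (a stand-in for real cluster state).
FreeCores = Dict[str, List[int]]
# Host name -> core ids assigned to a job.
Placement = Dict[str, List[int]]


@dataclass
class Job:
    job_id: int
    partition: str
    cpus: int


Selector = Callable[[Job, FreeCores], Optional[Placement]]


def first_fit(job: Job, free: FreeCores) -> Optional[Placement]:
    """Stand-in for the default selector: first host with enough free cores."""
    for host, cores in free.items():
        if len(cores) >= job.cpus:
            return {host: cores[:job.cpus]}
    return None


def backend_select(job: Job, free: FreeCores) -> Optional[Placement]:
    """Stand-in for our backend: picks the host with the most free cores."""
    if not free:
        return None
    host = max(free, key=lambda h: len(free[h]))
    if len(free[host]) < job.cpus:
        return None
    return {host: free[host][:job.cpus]}


def route(job: Job, free: FreeCores, ours: Selector, default: Selector,
          our_partition: str = "backend") -> Optional[Placement]:
    """Send jobs from our partition to our selector, everything else to the default."""
    selector = ours if job.partition == our_partition else default
    return selector(job, free)


free = {"n1": [0, 1], "n2": [0, 1, 2, 3]}
ours_result = route(Job(1, "backend", 2), free, backend_select, first_fit)
default_result = route(Job(2, "general", 2), free, backend_select, first_fit)
```

Here a job in our partition goes to the backend stand-in and any other job goes to the default stand-in. Whether Slurm lets two Select plugins coexist like this is exactly what we are unsure about.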
The problem with a> is that it means modifying existing plugin code & calling (our) library
code from inside the Select plugin lib.
With b> the issue is that it isn’t viable unless we have the full Slurm cluster to ourselves.
Any insight on how to proceed with this? Where should our Select plugin, assuming we need to write one, fit into the Slurm integration?
We are not sure whether c> is allowed in Slurm.
We went through the existing Select plugins linear & cons_tres. However, we have not been able to figure out how to use them or how to write something along similar lines to suit our purpose.