
Best way to bundle ALL jobs together into a single Slurm allocation? #181

Open

tjweitzel225 opened this issue Jan 14, 2025 · 5 comments

@tjweitzel225

My institution's cluster has a strong preference for scientific workflows to be self-contained in a single, long-running SLURM job.

Our current solution is to use Dask's SLURM Runner feature https://jobqueue.dask.org/en/stable/generated/dask_jobqueue.slurm.SLURMRunner.html.

However, I'm thinking of implementing snakemake for its workflow management features.

Is there any way to support these kinds of monolithic SLURM jobs?

Ideally, I'd simply be able to provide Snakemake with resource specifications -- 256 cores across 4 nodes, along with memory specs, say -- and the snakemake scheduler should see those 256 cores and parallelize appropriately as it would on a local multi-core machine.
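For concreteness, the sort of single-allocation setup I have in mind would look roughly like this (the memory figure, time limit, and run_workflow.sh wrapper are just placeholders):

$ # one long-running allocation: 4 nodes, 256 cores total
$ sbatch --nodes=4 --ntasks=256 --mem-per-cpu=4G --time=48:00:00 run_workflow.sh
$ # and inside run_workflow.sh, ideally just:
$ snakemake --cores 256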

@cmeesters
Member

Hi,

thanks for reaching out!

My institution's cluster has a strong preference for scientific workflows to be self-contained in a single, long-running SLURM job.

That is, of course, not very sensible: it means huge scheduling overhead, and the heterogeneous resource usage of different tools leads to wasted resources, etc.

However, a feature like --slurm-pool=<rule_1>[:<k1>],[<rule_2>[:<k2>]], which would allow submitting a number of SMP jobs together, combining their resource requirements, is on my to-do list.
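Just to illustrate the intended syntax with made-up rule names (the flag does not exist yet), something like

$ snakemake --executor slurm --slurm-pool=align:8,sort:4

would bundle eight align jobs and four sort jobs into one SLURM submission with their combined resource requirements.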

Ideally, I'd simply be able to provide Snakemake with resource specifications -- 256 cores across 4 nodes, along with memory specs, say -- and the snakemake scheduler should see those 256 cores and parallelize appropriately as it would on a local multi-core machine.

This is something you can already do inside a job using just Snakemake, without this plugin. It is, however, limited and, if you are developing, rather tedious.
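A minimal sketch of what I mean, assuming a single-node allocation (node size and time limit are just examples):

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --cpus-per-task=64
#SBATCH --time=24:00:00
# plain Snakemake, no executor plugin; it only sees the CPUs of this one job
snakemake --cores "$SLURM_CPUS_PER_TASK"

submitted with a single sbatch call.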

May I ask at which institution you are working?

@tjweitzel225
Author

tjweitzel225 commented Jan 14, 2025

Thank you for the response! Good to know that the slurm-pool feature is being considered.

May I ask at which institution you are working?

It's just a relatively small cluster managed by an R&D focused firm.

This is something you can already do inside a job using just Snakemake, without this plugin. It is, however, limited and, if you are developing, rather tedious.

That's interesting -- I was attempting this inside an sbatch'd job (calling srun in the shell field of each rule to turn the rules into SLURM steps) but was running into issues, especially when the allocation reached across nodes.
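Roughly, the shell field of each rule looked something like this (the tool name is just a stand-in):

srun --exclusive -n1 -c{threads} some_tool {input} > {output}

with the idea that every rule invocation becomes its own job step inside the allocation.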

Could you point me to any resources to help me implement this? Is there a short answer to explain why it's limited and tedious? (Short answer just because I don't want to take up too much of your time!)

Thank you for the response.

@cmeesters
Member

It's just a relatively small cluster managed by an R&D focused firm.

Weird. HPC ideology is sometimes particularly odd, yet I would expect an R&D department to be focused on user needs, cost, and efficiency.

What I meant, and I hope I understand you correctly, is that your admins want you to treat a compute node of the cluster as a single server. In that case you can, of course, use Snakemake stand-alone and just tell it how many CPUs are available. Doing that within an (interactive) job might mean that debugging is a bit more complicated than running on a login node and submitting jobs.

That might not be true, though, because if you do

$ salloc <args>
$ srun snakemake <args>

you will still see the output as expected.

However, this plugin allows you to distribute your workload and make use of an entire cluster (in all its hardware heterogeneity), and it is constantly improving.
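For comparison, running the workflow through this plugin from a login node is essentially (job limit and resource defaults are just examples):

$ snakemake --executor slurm --jobs 100 --default-resources slurm_account=<account> slurm_partition=<partition>

and each rule then becomes its own, appropriately sized SLURM job.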

@tjweitzel225
Author

$ salloc <args>
$ srun snakemake <args>

Unfortunately, I don't think this works across nodes, because if I try

$ srun -n2 snakemake

two independent instances of snakemake will run simultaneously. I.e., you'll see two instances independently build their own DAGs, each seeing only $SLURM_CPUS_PER_TASK cores from within that instance.

It appears from this discussion that I'll need to implement an inter-node scheme for coordinating resources for snakemake. I suppose I'll try invoking mpi4py in the Snakefile next.

@cmeesters
Member

Yes, of course, that is what -n is supposed to do under SLURM. And this is the caveat. You can do:

$ salloc -N1 <args>
$ # or
$ salloc -c ${cpu_number_of_full_node}

to reserve a full node, whichever works on your cluster. But then you are limited to exactly this node with its cgroup. When using the executor, you can spawn different kinds of jobs, e.g. a single-core script, a threaded program (this is what you could do on a single node reserved this way, too), an MPI program (this won't work properly with the implicit -n 1 you get when omitting the -n flag from salloc), a GPU job, etc.
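Putting it together, a full single-node session could look like this (whether you need --exclusive depends on your cluster's configuration):

$ salloc -N1 --exclusive
$ srun snakemake --cores "$SLURM_CPUS_ON_NODE"

and Snakemake then schedules rules onto those cores just as it would on a local workstation.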

I would strongly advise talking to your admins: scheduling on a cluster works best if everyone behaves and the scheduler is left with the task of allocating space for heterogeneous jobs.
