
Best way to bundle ALL jobs together into a single Slurm allocation? #181

Open

tjweitzel225 opened this issue Jan 14, 2025 · 5 comments

@tjweitzel225

My institution's cluster has a strong preference for scientific workflows to be self-contained in a single, long-running SLURM job.

Our current solution is to use Dask's SLURM Runner feature https://jobqueue.dask.org/en/stable/generated/dask_jobqueue.slurm.SLURMRunner.html.

However, I'm thinking of implementing snakemake for its workflow management features.

Is there any way to support these kinds of monolithic SLURM jobs?

Ideally, I'd simply be able to provide Snakemake with resource specifications -- 256 cores across 4 nodes, along with memory specs, say -- and the snakemake scheduler should see those 256 cores and parallelize appropriately as it would on a local multi-core machine.
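For concreteness, the sort of single-allocation setup I have in mind would look roughly like this (the memory figure, time limit, and run_workflow.sh wrapper are just placeholders):

$ # one long-running allocation: 4 nodes, 256 cores total
$ sbatch --nodes=4 --ntasks=256 --mem-per-cpu=4G --time=48:00:00 run_workflow.sh
$ # and inside run_workflow.sh, ideally just:
$ snakemake --cores 256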

@cmeesters
Member

Hi,

thanks for reaching out!

My institution's cluster has a strong preference for scientific workflows to be self-contained in a single, long-running SLURM job.

That is, of course, not very sensible: it means huge scheduling overhead, and the heterogeneous resource usage of different tools leads to wasted resources, etc.

However, a feature like --slurm-pool=<rule_1>[:<k1>],[<rule_2>[:<k2>]], which would allow submitting a number of SMP jobs together, combining their resource requirements, is on my to-do list.
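Just to illustrate the intended syntax with made-up rule names (the flag does not exist yet), something like

$ snakemake --executor slurm --slurm-pool=align:8,sort:4

would bundle eight align jobs and four sort jobs into one SLURM submission with their combined resource requirements.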

Ideally, I'd simply be able to provide Snakemake with resource specifications -- 256 cores across 4 nodes, along with memory specs, say -- and the snakemake scheduler should see those 256 cores and parallelize appropriately as it would on a local multi-core machine.

This is something you can already do inside a job using just Snakemake, without this plugin. It is, however, limited and, if you are developing, rather tedious.
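A minimal sketch of what I mean, assuming a single-node allocation (node size and time limit are just examples):

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --cpus-per-task=64
#SBATCH --time=24:00:00
# plain Snakemake, no executor plugin; it only sees the CPUs of this one job
snakemake --cores "$SLURM_CPUS_PER_TASK"

submitted with a single sbatch call.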

May I ask at which institution you are working?

@tjweitzel225
Author

tjweitzel225 commented Jan 14, 2025

Thank you for the response! Good to know that the slurm-pool feature is being considered.

May I ask at which institution you are working?

It's just a relatively small cluster managed by an R&D focused firm.

This is something you can already do inside a job using just Snakemake, without this plugin. It is, however, limited and, if you are developing, rather tedious.

That's interesting -- I was attempting this inside an sbatch'd job (calling srun in the shell field of each rule to turn the rules into SLURM steps) but was running into issues, especially when the allocation reached across nodes.
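Roughly, the shell field of each rule looked something like this (the tool name is just a stand-in):

srun --exclusive -n1 -c{threads} some_tool {input} > {output}

with the idea that every rule invocation becomes its own job step inside the allocation.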

Could you point me to any resources to help me implement this? Is there a short answer to explain why it's limited and tedious? (Short answer just because I don't want to take up too much of your time!)

Thank you for the response.

@cmeesters
Member

It's just a relatively small cluster managed by an R&D focused firm.

Weird. HPC ideology is sometimes particularly odd, yet I would expect an R&D department to be focused on user needs, cost, and efficiency.

What I meant, and I hope I understand you correctly, is that your admins want you to treat a compute node of the cluster as a single server. In that case you can, of course, use Snakemake stand-alone and just tell it how many CPUs are available. Doing that within an (interactive) job might mean that debugging is a bit more complicated than running on a login node and submitting jobs.

That might not be true, though, because if you do

$ salloc <args>
$ srun snakemake <args>

you will still see the output as expected.

However, this plugin allows you to distribute your workload and make use of an entire cluster (in all its hardware heterogeneity), and it is constantly improving.
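For comparison, running the workflow through this plugin from a login node is essentially (job limit and resource defaults are just examples):

$ snakemake --executor slurm --jobs 100 --default-resources slurm_account=<account> slurm_partition=<partition>

and each rule then becomes its own, appropriately sized SLURM job.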

@tjweitzel225
Author

$ salloc <args>
$ srun snakemake <args>

Unfortunately, I don't think this works across nodes, because if I try

$ srun -n2 snakemake

two independent instances of snakemake will run simultaneously. I.e., you'll see two instances independently build their own DAGs, each seeing only $SLURM_CPUS_PER_TASK cores from within that instance.

It appears from this discussion that I'll need to implement an inter-node scheme for coordinating resources for snakemake. I suppose I'll try invoking mpi4py in the Snakefile next.

@cmeesters
Member

Yes, of course, that is what -n is supposed to do under SLURM. And this is the caveat. You can do:

$ salloc -N1 <args>
$ # or
$ salloc -c ${cpu_number_of_full_node}

to reserve a full node, whichever works on your cluster. But then you are limited to exactly this node with its cgroup. When using the executor, you can spawn different kinds of jobs, e.g. a single-core script, a threaded program (this is what you could do on a single node reserved this way, too), an MPI program (this won't work properly with the implicit -n 1 you get when omitting the -n flag from salloc), a GPU job, etc.
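Putting it together, a full single-node session could look like this (whether you need --exclusive depends on your cluster's configuration):

$ salloc -N1 --exclusive
$ srun snakemake --cores "$SLURM_CPUS_ON_NODE"

and Snakemake then schedules rules onto those cores just as it would on a local workstation.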

I would strongly advise talking to your admins: scheduling on a cluster works best if everyone behaves and the scheduler is left with the task of allocating space for heterogeneous jobs.
