Select partition automatically #106
Comments
Hi, this sounds like an interesting consideration. However, the linked page does not show such a selector - or does it? Please consider: the design of this plugin has an emphasis on stability. Introducing some sort of heuristic would inevitably lead to failures here or there. |
My script is located here: GitHub. I agree it is a bit heuristic, but one should be able to get the Slurm partitions with sinfo or similar commands. Currently, if I submit a high-memory job/rule to Slurm via this tool, it is submitted to the default queue and is never scheduled because the default queue doesn't allow high-memory jobs. I can define queues in Snakemake for my cluster, but how can I ensure that it will also work on another HPC? |
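As a rough illustration of the `sinfo` idea mentioned above, here is a minimal sketch that parses the output of `sinfo -h -o "%P %m %l"` (partition name, memory per node in MB, time limit). The helper names `parse_sinfo` and `query_partitions` and the sample output are hypothetical and not part of the plugin. Real clusters may report several lines per partition or other value formats, so treat this as a starting point under those assumptions.

```python
import subprocess


def parse_sinfo(text):
    """Parse output of `sinfo -h -o "%P %m %l"` into a list of dicts.

    Assumes one whitespace-separated line per partition (hypothetical layout).
    """
    partitions = []
    for line in text.strip().splitlines():
        name, mem, timelimit = line.split()
        partitions.append(
            {
                # sinfo marks the default partition with a trailing "*"
                "name": name.rstrip("*"),
                "default": name.endswith("*"),
                # a trailing "+" means nodes differ; keep the reported minimum
                "mem_mb": int(mem.rstrip("+")),
                # left as a string, e.g. "1-00:00:00" or "infinite"
                "timelimit": timelimit,
            }
        )
    return partitions


def query_partitions():
    """Run sinfo on the current machine (requires a Slurm installation)."""
    result = subprocess.run(
        ["sinfo", "-h", "-o", "%P %m %l"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_sinfo(result.stdout)


# Made-up sample output, so the parser can be tried without a cluster.
sample = "short* 128000 1-00:00:00\nhighmem 1500000+ 7-00:00:00\n"
partitions = parse_sinfo(sample)
```

Memory per node and time limit are only two of the constraints a site can impose, which is part of why the heuristic is fragile across clusters.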
BTW, your plugin is awesome and stable! |
Stupid me! I missed the crucial part in your code. Currently, Snakemake users have three places to configure their specs: the command line, the workflow itself, or a workflow profile. We recommend not using the workflow files, because all workflows should be kept portable. Within the profiles, we can define defaults and deviating resource specs per rule (see here). If we additionally offered a csv/tsv file with partition specs, users could skip the partition selection in their profiles, but the resources would still need to be defined. This would be a minor gain at the cost of needing an additional config file. Do I get this right? This configuration scheme, while flexible, only works well when a few rules require more or different resources than our default. So I, too, would very much prefer some automatism, because resource definitions can get quite cumbersome. With Read this as a confession: I am overwhelmed by the potential points of failure and don't dare to attempt to implement this. However, I have some small ideas for improvement and hope to get a project on track in the not-so-near future. If you would meanwhile submit a PR, we would be happy to consider it - even if we cannot merge it into the production line for a while. |
I'll start with some general thoughts on whether and how to select the partition / queue automatically. This is mostly from experience with using an LSF cluster system, but I think it generalizes. Cluster systems can always be configured heavily, and oftentimes it is not straightforward to determine their configuration. So with queues / partitions with arbitrary names and all kinds of possible resource limitations and requirements, I think this is not something easily automatable via the respective executor plugin. So I think I would first go for optimizing things via profiles, and think more about generalizing such things once we all have more experience with dynamic queue / partition allocations... And then I'll just cross-post my respective thoughts on profiles from this Discord discussion, to keep this documented here: As these kinds of specs are usually very specific to your local cluster setup, I think using profiles is a good idea: Depending on the level of control you have and how generically it should be used, you can put a profile in three different locations:
Note: if you put the config.yaml file in a folder other than default/, you'll have to provide that profile name to the --profile or --workflow-profile command-line argument. I now regularly use user-specific profiles, for example to set the queue depending on the input file size, mostly on an LSF cluster system. So the config.yaml file will contain something like this:
Note the parentheses around the expression. These are needed so that Snakemake evaluates the expression in the right step. |
Thank you, David! Though, any user of a complex workflow still needs to come up with a profile. What you wrote stresses the need to update the docs (a bit), rather than implying code changes. Auto-parameterization for clusters remains a dream ;-). |
Yeah, I'm definitely for cleaning up and improving the docs a bit regarding profiles and dynamic resource settings. Learned a lot through trial-and-error in the past weeks and months, and this was always on the list to do at some point. So maybe that point is now... |
Would it be possible to select the queue by resources.mem_mb argument dynamically? |
Is there a way to make a quite complicated if else statement using multiple criteria? |
Like I wrote, implementing such a feature should not stop there. This is not done easily for a submit plugin.
Absolutely, this is just Python code. @dlaehnemann wouldn't it be a nice feature to be able to point |
Yes, you can basically chain things with the syntax I gave above. Or test other kinds of syntax. From my understanding, the only limitation is that it has to be a one-liner. But that is really flexible in Python...
The signature for the callable that you can use in dynamic resource allocation currently is:
So
But that is already pretty much what And as mentioned above, and judging from our discussion (and my personal experience, as well): we really need to improve the Snakemake documentation on dynamic resource allocation! |
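The chained one-liner described above, selecting a queue from several criteria, can be sketched in plain Python as follows. The function name `pick_partition`, the partition names, and the thresholds are all hypothetical. How such an expression is embedded in a profile or passed as a resource callable depends on the Snakemake version and is not shown here.

```python
def pick_partition(mem_mb, runtime_min):
    """Return a partition name from memory (MB) and runtime (minutes).

    The body is a single chained conditional expression, so it could also
    be written on one line. All names and limits are illustrative only.
    """
    return (
        "highmem" if mem_mb > 256_000
        else "long" if runtime_min > 1_440
        else "short"
    )


# Example calls with made-up resource requests.
choice_a = pick_partition(512_000, 60)
choice_b = pick_partition(8_000, 3_000)
choice_c = pick_partition(8_000, 60)
```

Conditions are checked left to right, so the order of the branches decides which criterion wins when several apply.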
@dlaehnemann yep, except that it calls for nested directories with specific names (which is fine and probably better than cluttering files in one directory) - I was just thinking in the wrong line, probably would have faced that issue, if not totally overworked already, sooner. Hm, that
should it be part of the global documentation rather than the executor doc? I am currently working on an update to the executor doc. Ultimately, going for a major doc update in the central docs calls for a coordinated effort. |
Most of this belongs in the global docs; only executor-specific stuff (like how you define queues et al.) should go into the executor docs, I'd say. |
Hi, thanks again for the executor plugin. I would love to have all variables from rules e.g., especially Thanks for your consideration & Cheers, Stephan |
@dlaehnemann, maybe I misunderstand, but I am using default profile configs in each of my workflows, and when I load them as modules and run snakemake, they are not used or even found (only the global profile is). Can you help/explain?
|
Hello, I would like the wrapper to select the partition automatically based on the resources.
I implemented something in https://github.com/Snakemake-Profiles/generic
It loads a csv table and selects the appropriate profile.
Do you think something like this could be implemented?
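To make the table-driven idea above concrete, here is a minimal sketch that loads partition limits from a CSV table and returns the first partition whose limits cover a job's request. The column names, partition names, limits, and both helper functions are assumptions for illustration. This is not the code from the linked repository.

```python
import csv
import io

# Hypothetical partition table; a real one would be written per cluster.
TABLE = """partition,max_mem_mb,max_runtime_min
short,128000,1440
long,128000,10080
highmem,1500000,10080
"""


def load_partitions(handle):
    """Read partition limits from an open CSV file handle."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append(
            {
                "partition": row["partition"],
                "max_mem_mb": int(row["max_mem_mb"]),
                "max_runtime_min": int(row["max_runtime_min"]),
            }
        )
    return rows


def select_partition(rows, mem_mb, runtime_min):
    """Return the first partition (in table order) that fits the request."""
    for row in rows:
        fits_mem = mem_mb <= row["max_mem_mb"]
        fits_time = runtime_min <= row["max_runtime_min"]
        if fits_mem and fits_time:
            return row["partition"]
    raise ValueError("no partition satisfies the requested resources")


rows = load_partitions(io.StringIO(TABLE))
small_job = select_partition(rows, mem_mb=8_000, runtime_min=60)
big_job = select_partition(rows, mem_mb=500_000, runtime_min=60)
```

Because the first matching row wins, the table should list the most restrictive partitions first. Raising an error when nothing fits avoids silently falling back to a default queue where the job would never be scheduled.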