threads directive of a rule not taken into account when running through slurm #141

Open
blaiseli opened this issue Aug 30, 2024 · 3 comments

@blaiseli

Software Versions

$ pip list | grep snakemake
snakemake                                 8.19.0
snakemake-executor-plugin-cluster-generic 1.0.9
snakemake-executor-plugin-slurm           0.10.0
snakemake-executor-plugin-slurm-jobstep   0.2.1
snakemake-interface-common                1.17.3
snakemake-interface-executor-plugins      9.2.0
snakemake-interface-report-plugins        1.0.0
snakemake-interface-storage-plugins       3.3.0
$ sinfo --version
slurm 23.02.6

Describe the bug

In a rule with threads set to 2, a shell command built to display {threads} reports only 1 thread when snakemake is run through slurm using sbatch.

Minimal example

Here is a short example comparing the above with what happens when setting cpus_per_task to 2 in resources.

$ cat src/workflow/test.smk
rule all:
    input:
        "test/test_threads.out",
        "test/test_resources.out"

rule test_threads:
    output: "test/test_threads.out"
    threads: 2
    run:
        cmd = f"echo {threads} > {output}"
        shell(cmd)

rule test_resources:
    output: "test/test_resources.out"
    resources:
        cpus_per_task = 2
    run:
        cmd = f"echo {resources.cpus_per_task} > {output}"
        shell(cmd)
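
For reference, the same reproducer can also be written with plain shell directives instead of run blocks (a minimal sketch along the same lines; the rule and file names are illustrative and these rules are not wired into rule all):

rule test_threads_shell:
    output: "test/test_threads_shell.out"
    threads: 2
    shell: "echo {threads} > {output}"

rule test_resources_shell:
    output: "test/test_resources_shell.out"
    resources:
        cpus_per_task = 2
    shell: "echo {resources.cpus_per_task} > {output}"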

I run it through sbatch using the following script:

$ cat src/run_test.sh
#!/bin/bash

source .venv/bin/activate

profile="src/profile/slurm"
snakefile="src/workflow/test.smk"
snakemake --version

mkdir -p test

cmd="snakemake -s ${snakefile} \
    --executor slurm \
    --profile ${profile} \
    $@"

>&2 sbatch --qos="hubbioit" --partition="hubbioit" --parsable \
    -J run_test \
    --mem=10G \
    -o test/test.o \
    -e test/test.e \
    ${cmd}

exit 0
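
One way to cross-check what Slurm actually allocated to a given rule job (an illustrative command, not part of the original report; <jobid> is a placeholder for the Slurm job ID of the rule job, e.g. taken from the snakemake log or squeue):

# show requested vs. allocated CPUs for the job (<jobid> is a placeholder)
sacct -j <jobid> --format=JobID,JobName,ReqCPUS,AllocCPUS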

Running it:

$ ./src/run_test.sh
8.19.0
20715805

Looking at the output:

$ cat test/test_resources.out 
2
$ cat test/test_threads.out 
1

If I run the workflow without sbatch and slurm, both output files contain "2".

Additional context

This looks similar to what is described here: #113 (comment)

However, if I understand correctly, this aspect of #113 is supposed to be solved by #137, which is included in 0.10.0.

In case this is relevant, here is the config.yaml of the slurm profile given to --profile:

$ cat src/profile/slurm/config.yaml
# Manually edited according to https://github.com/Snakemake-Profiles/slurm/issues/117#issuecomment-1906448548
cluster-generic-sidecar-cmd: "slurm-sidecar.py"
#cluster-sidecar: "slurm-sidecar.py"
#cluster-cancel: "scancel"
cluster-generic-cancel-cmd: "scancel"
restart-times: "3"
jobscript: "slurm-jobscript.sh"
#cluster: "slurm-submit.py"
cluster-generic-submit-cmd: "slurm-submit.py"
#cluster-status: "slurm-status.py"
cluster-generic-status-cmd: "slurm-status.py"
max-jobs-per-second: "10"
max-status-checks-per-second: "10"
local-cores: 1
latency-wait: "240"
use-conda: "False"
use-singularity: "False"
jobs: "144"
printshellcmds: "False"
# end with comments only
@CarstenBaker

#137 sadly never worked to fix the issue for us.
If you run the sbatch command with 2 (or more) CPUs specified (assuming your default is 1 CPU), both outputs should report 2.

As a workaround, we have been setting both cpus_per_task and threads in the slurm config file (matching the totals) for the moment. It's a bit of duplication, but we can't find a reliable way to link the two together. As long as cpus_per_task is the same as or greater than threads, {threads} resolves to the correct total. You need to specify both because snakemake rules run on threads; the dry-run totals are also incorrect, so you have to check the snakemake or slurm logs for the correct threads/totals.

If you run from the head node, threads works correctly, but we don't like doing this and prefer using sbatch.
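
For concreteness, that duplication can be expressed in a profile config.yaml via set-threads and set-resources (a rough sketch only, using the rule name from the minimal example above; the two values have to be kept in sync by hand):

# hypothetical addition to the profile config.yaml:
# keep threads and cpus_per_task in sync for each rule
set-threads:
  - "test_threads=2"
set-resources:
  - "test_threads:cpus_per_task=2"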

@tdido
Member

tdido commented Sep 20, 2024

@blaiseli I think your --profile may be interfering with the plugin. Did you try running without it? Or at least removing everything above the max-jobs-per-second: "10" line.
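
For concreteness, the trimmed profile suggested here would keep roughly only the lines from max-jobs-per-second downwards (a sketch derived from the config.yaml above):

# trimmed profile: cluster-generic-* options, restart-times and jobscript removed
max-jobs-per-second: "10"
max-status-checks-per-second: "10"
local-cores: 1
latency-wait: "240"
use-conda: "False"
use-singularity: "False"
jobs: "144"
printshellcmds: "False"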

@blaiseli
Author

blaiseli commented Oct 3, 2024

@tdido I just saw your suggestion and tried not using --profile on a workflow I'm currently setting up, but this didn't work.
