Description
When running the slurm_par example, the runs step fails for every sample because of a Slurm allocation issue. Each generated runs.slurm.err file contains the following error: srun: error: Only allocated 1 nodes asked for 2
To Reproduce
Steps to reproduce the behavior:
Pull the slurm_par example with merlin example slurm_par
cd into the slurm/ directory
Queue the tasks with merlin run slurm_par.yaml
Run the workers with merlin run-workers slurm_par.yaml
When the run has finished, look in the output directory at runs/00/runs.slurm.err to see the error
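To check every sample rather than only runs/00, the last step can be sketched in Python. This is a diagnostic helper written for this report, not part of Merlin; the function name `find_allocation_errors` and the `output_dir` argument are hypothetical, and the `runs/*/runs.slurm.err` layout is assumed from the path given above.

```python
import re
from pathlib import Path

# Matches the srun message quoted in this report and captures both node counts.
ALLOC_ERROR = re.compile(r"srun: error: Only allocated (\d+) nodes asked for (\d+)")


def find_allocation_errors(output_dir):
    """Map each affected runs.slurm.err path to (allocated, requested).

    Assumes the study output directory contains runs/<sample>/runs.slurm.err,
    as in the reproduction steps above.
    """
    hits = {}
    for err_file in sorted(Path(output_dir).glob("runs/*/runs.slurm.err")):
        match = ALLOC_ERROR.search(err_file.read_text(errors="replace"))
        if match:
            hits[str(err_file)] = (int(match.group(1)), int(match.group(2)))
    return hits
```

Calling `find_allocation_errors` on the study's output directory returns an empty dict when no sample hit the allocation error.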
Expected behavior
The runs step should be allocated two nodes by Slurm.
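One way to confirm the mismatch from inside the allocation is to compare the node count Slurm exports with the count the step asks for. The sketch below assumes it runs inside a Slurm job, where Slurm sets `SLURM_JOB_NUM_NODES` (with `SLURM_NNODES` as the older name); the helper names are hypothetical and nothing here reflects how Merlin itself requests nodes.

```python
import os


def allocated_node_count(env=os.environ):
    """Return the node count of the current Slurm allocation, or None outside one."""
    # SLURM_JOB_NUM_NODES is the current name; SLURM_NNODES is kept for compatibility.
    value = env.get("SLURM_JOB_NUM_NODES") or env.get("SLURM_NNODES")
    return int(value) if value else None


def has_enough_nodes(requested, env=os.environ):
    """True only if an allocation exists and holds at least `requested` nodes."""
    allocated = allocated_node_count(env)
    return allocated is not None and allocated >= requested
```

With the allocation described in this report, `has_enough_nodes(2)` would be expected to return False, since only one node was allocated.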
Please answer these questions to help us pinpoint the problem
Does the problem occur in merlin run --local mode, distributed mode or neither? Distributed
If a distributed problem, which backend and queue servers are you using? How are they configured? Broker is rabbitmq, results backend is redis. Configured through LaunchIT
On what machines/architectures are you running merlin? Is this bug on a specific machine or can you reproduce it elsewhere? rztopaz and reproduced on ruby
Additional context
Bug found by Casey Lamarche