This project aims to display the GPUs available in the Ibex and IVUL clusters.
It was born as a solo project to greedily and manually pick the best cluster to run experiments.
Requirements: python-3, pandas
- Log in to the cluster of interest (ibex/skynet) and run
- Launch servers
- @skynet
conda activate cluster_status
FLASK_PORT=5000; FLASK_APP=server.py flask run --port=$FLASK_PORT
- @ibex
python gdragon.py
- (Optional) Make the server accessible if ports are blocked
This should be as simple as
ssh -vR 8000:localhost:5000 [user-name]@[server]
- Install miniconda or anaconda.
Skip this step if you already have one of them installed.
- Create the environment
conda env create -f environment-x86_64.yml
That's all. Don't forget to activate the environment before running any program.
conda activate cluster-status
Currently, all the heavy lifting is done in cluster.py.
This module simply retrieves the status of the cluster from SLURM via subprocess/shell calls.
We recommend navigating the code starting from server.py to get an idea of what is going on. The most important functions are partially documented. You can also reach out to us and contribute more documentation.
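As a rough illustration of the subprocess-based approach mentioned above, here is a minimal sketch of a helper that runs a command and returns its output lines. The name `run_command` and its `timeout` parameter are hypothetical and are not taken from cluster.py; the real module may be organized differently.

```python
import subprocess
import sys


def run_command(args, timeout=30):
    """Run a command without a shell and return its stdout split into lines."""
    result = subprocess.run(
        args,
        capture_output=True,  # collect stdout and stderr
        text=True,            # decode bytes to str
        timeout=timeout,      # avoid hanging if the scheduler is unresponsive
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; on the cluster one would
    # pass something like ["sinfo", "-h", "-o", "%n"] instead.
    print(run_command([sys.executable, "-c", "print('node01'); print('node02')"]))
```

Passing the command as a list avoids shell quoting issues; pipelines with grep or sed would need either `shell=True` or the filtering done in Python.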
Grab info about nodes
sinfo -o "%n %A %D %P %T %c %z %m %d %w %f %G"
TODO: explain what all those %? mean. Low priority in favor of using this.
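The sketch below shows one way the output of the sinfo command above could be turned into records. Per the sinfo man page, the fields are: %n hostname, %A nodes by state (allocated/idle), %D node count, %P partition, %T state, %c CPUs per node, %z sockets:cores:threads, %m memory, %d temporary disk, %w weight, %f features, %G generic resources. The function name `parse_sinfo` and the sample text are made up for illustration; the sample is not real cluster output, and the parser assumes no field contains spaces.

```python
SINFO_FORMAT = "%n %A %D %P %T %c %z %m %d %w %f %G"

# Illustrative sample only, shaped like `sinfo -o "<SINFO_FORMAT>"` output.
SINFO_SAMPLE = """\
HOSTNAMES NODES(A/I) NODES PARTITION STATE CPUS S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FEATURES GRES
gpu101 1/0 1 batch* mixed 40 2:20:1 385000 0 1 v100,gpu gpu:v100:8
gpu102 0/1 1 batch* idle 40 2:20:1 385000 0 1 v100,gpu gpu:v100:8
"""


def parse_sinfo(output):
    """Turn whitespace-separated sinfo output (header line first) into dicts."""
    lines = [line for line in output.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    # Assumption: every row has as many whitespace-separated fields as the header.
    return [dict(zip(header, line.split())) for line in lines[1:]]


if __name__ == "__main__":
    for row in parse_sinfo(SINFO_SAMPLE):
        print(row["HOSTNAMES"], row["STATE"], row["GRES"])
```

The resulting list of dicts can be handed directly to `pandas.DataFrame` if a table is preferred.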
Behind the scenes, it combines the cluster info with the output of squeue -o "%u %i %t %b %N"
TODO: explain what all those %? mean. Low priority in favor of using this.
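For the squeue format above, the man page lists %u as user name, %i as job id, %t as compact job state, %b as generic resources requested per node, and %N as the node list. The following sketch sums the GPUs requested by running jobs per user. It assumes headerless output (squeue -h), that %b looks like `gpu:2` or `gres:gpu:v100:1` (the exact form varies across SLURM versions), and it does not multiply by the number of nodes. All function names and the sample rows are hypothetical.

```python
from collections import Counter

# Illustrative sample only, shaped like `squeue -h -o "%u %i %t %b %N"` output.
SQUEUE_SAMPLE = """\
alice 101 R gpu:2 gpu101
alice 102 R gres:gpu:v100:1 gpu102
bob 103 PD gpu:4
carol 104 R N/A cpu001
"""


def gpus_requested(gres):
    """Best-effort GPU count from a %b field; 0 if no GPU is mentioned."""
    if "gpu" not in gres:
        return 0
    last = gres.split(":")[-1]
    # A GPU request without an explicit count means one GPU.
    return int(last) if last.isdigit() else 1


def gpus_per_user(output):
    """Sum the GPUs requested by running jobs, grouped by user."""
    totals = Counter()
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue  # pending jobs have no node list; skip them and blank lines
        user, _job_id, state, gres, _nodes = fields
        if state == "R":
            totals[user] += gpus_requested(gres)
    return dict(totals)


if __name__ == "__main__":
    print(gpus_per_user(SQUEUE_SAMPLE))
```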
- Show reservation
scontrol show reservation | grep -A 3 GROUP_IVUL
- List node info
scontrol -o show node
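With -o, scontrol prints each node record on a single line of space-separated Key=Value tokens. A minimal sketch of pulling a few fields out of such a line is given below. The helper name `parse_node_line`, the chosen keys, and the sample record are assumptions for illustration; values that themselves contain spaces (for example a free-text Reason) would be cut at the first space by this approach.

```python
NODE_KEYS = ("NodeName", "Gres", "GresUsed", "CfgTRES", "AllocTRES", "Partitions")

# Illustrative sample only, shaped like one line of `scontrol -o show node`.
NODE_SAMPLE = (
    "NodeName=gpu101 Arch=x86_64 CPUTot=40 Gres=gpu:v100:8 "
    "GresUsed=gpu:v100:3(IDX:0-2) Partitions=batch "
    "CfgTRES=cpu=40,mem=385000M,gres/gpu=8 AllocTRES=cpu=12,gres/gpu=3"
)


def parse_node_line(line, keys=NODE_KEYS):
    """Keep only the requested Key=Value tokens from a one-line node record."""
    info = {}
    for token in line.split():
        # Split at the first '=' only, since values such as CfgTRES contain '='.
        key, sep, value = token.partition("=")
        if sep and key in keys:
            info[key] = value
    return info


if __name__ == "__main__":
    print(parse_node_line(NODE_SAMPLE))
```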
Implement the features described in issues #9 and #5.
- Get users
sacctmgr list users --noheader format=User%-20
- Get gres list
scontrol show config | grep -e "GresTypes"
- Get partitions list
scontrol show partitions | grep PartitionName
- List of unavailable nodes
sinfo -N --states=DOWN,DRAIN,DRAINED,DRAINING -o "%N" --noheader
- List all nodes, or only the nodes in the given partitions
sinfo -h -o %n
sinfo -h -p $partition_list -o %n
- Extract computer info
scontrol show nodes --oneliner --detail | sed 's/\s/\n/g' | grep -e "NodeName=" -e "Gres=" -e "GresUsed" -e "CfgTRES=" -e "AllocTRES=" -e "Partitions="
- List jobs
scontrol show jobs --oneliner --detail | grep "JobState=RUNNING" | sed 's/\s/\n/g' | grep -e "JobId" -e "NumNodes" -e "ArrayJobId" -e "ArrayTaskId" -e "JobName" -e "UserId" -e "StartTime" -e "Partition" -e "^Nodes=" -e "CPU_IDs" -e "Mem=" -e "Gres=" -e "TRES=" -e "TresPerNode="
Tested in ibex.
Gres did not work; instead, @escorciav found TresPerNode or GRES_IDX.
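When the detailed job record reports allocated GPU indices, they can be expanded into a list of integers. The sketch below assumes the token carries an `IDX:` part such as `gpu(IDX:0-1,3)`; this layout is an assumption that may differ between SLURM versions, and the helper name `gpu_indices` is hypothetical.

```python
import re

# Matches the index list inside e.g. "GRES_IDX=gpu(IDX:0-1,3)".
IDX_PATTERN = re.compile(r"IDX:([0-9,\-]+)")


def gpu_indices(token):
    """Expand an index list like '0-1,3' into [0, 1, 3]; [] if no IDX part."""
    match = IDX_PATTERN.search(token)
    if match is None:
        return []
    indices = []
    for part in match.group(1).split(","):
        if not part:
            continue  # tolerate a stray comma
        start, _, end = part.partition("-")
        # A single index such as "3" has no end; treat it as the range 3-3.
        indices.extend(range(int(start), int(end or start) + 1))
    return indices


if __name__ == "__main__":
    print(gpu_indices("GRES_IDX=gpu(IDX:0-1,3)"))
```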
Credits to situpf