HTCondor users, I need your help to add support for HTCondor to availableCores():
HPC schedulers such as Slurm, SGE, and Torque/PBS set environment variables that can be queried to figure out how many CPU cores the scheduler has allotted to the job. This allows the job script to adapt to what it is allowed to run. For example, when submitting an SGE job that requests four (4) cores:
$ qsub -pe smp 4 my_script.sh
the my_script.sh script knows how many cores it got by:
ncores=${NSLOTS:-1}
echo "I am allowed to use $ncores cores on this machine"
Question: How do you achieve the same on HTCondor? Does HTCondor set environment variables in a similar way, or are there other ways to query the number of cores you've been assigned?
FWIW, I tried to search the web for how to do it, but I failed to find anything useful. The closest I found is in Section 2.5.11 of https://www.mn.uio.no/ifi/tjenester/it/hjelp/beregninger/htcondor/condor-manual.pdf:
HTCondor sets several additional environment variables for each executing job that may be useful for the job to reference.
_CONDOR_SCRATCH_DIR gives the directory where the job may place temporary data files. This directory is unique for every job that is run, and its contents are deleted by HTCondor when the job stops running on a machine, no matter how the job completes.
_CONDOR_SLOT gives the name of the slot (for SMP machines), on which the job is run. On machines with only a single slot, the value of this variable will be 1, just like the SlotID attribute in the machine's ClassAd. This setting is available in all universes. See section 3.7.1 for more details about SMP machines and their configuration.
CONDOR_VM equivalent to _CONDOR_SLOT described above, except that it is only available in the standard universe. NOTE: As of HTCondor version 6.9.3, this environment variable is no longer used. It will only be defined if the ALLOW_VM_CRUFT configuration variable is set to True.
X509_USER_PROXY gives the full path to the X.509 user proxy file if one is associated with the job. Typically, a user will specify x509userproxy in the submit description file. This setting is currently available in the local, java, and vanilla universes.
_CONDOR_JOB_AD is the path to a file in the job's scratch directory which contains the job ad for the currently running job. The job ad is current as of the start of the job, but is not updated during the running of the job. The job may read attributes and their values out of this file as it runs, but any changes will not be acted on in any way by HTCondor. The format is the same as the output of the condor_q -l command. This environment variable may be particularly useful in a USER_JOB_WRAPPER.
_CONDOR_MACHINE_AD is the path to a file in the job's scratch directory which contains the machine ad for the slot the currently running job is using. The machine ad is current as of the start of the job, but is not updated during the running of the job. The format is the same as the output of the condor_status -l command.
_CONDOR_JOB_IWD is the path to the initial working directory the job was born with.
_CONDOR_WRAPPER_ERROR_FILE is only set when the administrator has installed a USER_JOB_WRAPPER. If this file exists, HTCondor assumes that the job wrapper has failed and copies the contents of the file to the StarterLog for the administrator to debug the problem.
CONDOR_IDS overrides the value of configuration variable CONDOR_IDS, when set in the environment.
CONDOR_ID is set for scheduler universe jobs to be the same as the ClusterId attribute.
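None of the variables above directly exposes the allotted CPU count, but the _CONDOR_MACHINE_AD file described above contains the slot's machine ClassAd, and machine ClassAds carry a Cpus attribute with the number of CPUs provisioned to the slot. A sketch of how a job script could use that, assuming the ad file uses the plain "Attribute = value" ClassAd text format (the detect_ncores helper name is mine, not part of HTCondor); it falls back to 1 core when the variable is unset:

```shell
#!/bin/sh
# Hypothetical helper: read the provisioned CPU count from the machine ad
# file pointed to by _CONDOR_MACHINE_AD; default to 1 if unavailable.
detect_ncores() {
  ncores=1
  if [ -n "${_CONDOR_MACHINE_AD:-}" ] && [ -r "${_CONDOR_MACHINE_AD:-}" ]; then
    # Extract the integer from a line like: Cpus = 4
    cpus=$(sed -n 's/^Cpus *= *\([0-9][0-9]*\).*/\1/p' "$_CONDOR_MACHINE_AD" | head -n 1)
    if [ -n "$cpus" ]; then
      ncores=$cpus
    fi
  fi
  echo "$ncores"
}

ncores=$(detect_ncores)
echo "I am allowed to use $ncores cores on this machine"
```

Parsing the machine ad this way is a workaround; whether the Cpus attribute reflects the final provisioned count for partitionable slots would need to be verified on a real HTCondor pool.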
@fboehm, I see you're suggesting parallelly::availableCores() and you've got a vignette on how to use your qtl2pleio package with HTCondor 👍 Do you happen to know the answer to the above HTCondor-specific questions? I don't have access to HTCondor, so I need help to add support for HTCondor to availableCores().
I don't have an HTCondor setup handy to test with, but the docs say:
CUBACORES, GOMAXPROCS, JULIA_NUM_THREADS, MKL_NUM_THREADS, NUMEXPR_NUM_THREADS, OMP_NUM_THREADS, OMP_THREAD_LIMIT, OPENBLAS_NUM_THREADS, ROOT_MAX_THREADS, TF_LOOP_PARALLEL_ITERATIONS, and TF_NUM_THREADS are set to the number of CPU cores provisioned to this job. Should be at least RequestCpus, but HTCondor may match a job to a bigger slot. Jobs should not spawn more than this number of cpu-bound threads, or their performance will suffer. Many third party libraries like OpenMP obey these environment variables.
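If those variables are indeed set as the quoted docs describe, a job script could mirror the SGE NSLOTS idiom with any one of them, e.g. OMP_NUM_THREADS (a sketch; assumes a recent HTCondor that sets these variables, with a fallback of 1 otherwise):

```shell
#!/bin/sh
# Sketch: use OMP_NUM_THREADS, which HTCondor reportedly sets to the
# provisioned CPU count, analogous to SGE's NSLOTS; default to 1 core.
ncores=${OMP_NUM_THREADS:-1}
echo "I am allowed to use $ncores cores on this machine"
```

The same `${VAR:-1}` fallback pattern keeps the script working on machines where no scheduler sets the variable.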
@HenrikBengtsson - I'm so sorry that I missed this message (from 3 years ago!) until now. @lmichael107 at CHTC has a lot of HTCondor experience, and she may be able to connect us with others at U. Wisconsin-Madison who might also have answers to some of the above HTCondor-specific questions. I regret that I'm clueless here. My past uses of HTCondor were pretty crude, in the sense that I don't think I ever understood the HTCondor variables or how to integrate them with R package functions, especially the availableCores() function.