From 6c9cfd64be1243ea47c0494d918a795b76e1da6a Mon Sep 17 00:00:00 2001 From: <> Date: Tue, 8 Oct 2024 08:20:57 +0000 Subject: [PATCH] Deployed 5151ced0 with MkDocs version: 1.6.1 --- search/search_index.json | 2 +- sitemap.xml.gz | Bin 127 -> 127 bytes user-guide/gpu/index.html | 20 ++++++++++---------- 3 files changed, 11 insertions(+), 11 deletions(-) diff --git a/search/search_index.json b/search/search_index.json index 35fd57863..7cb243286 100644 --- a/search/search_index.json +++ b/search/search_index.json @@ -1 +1 @@ -{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"ARCHER2 User Documentation","text":"

ARCHER2 is the next generation UK National Supercomputing Service. You can find more information on the service and the research it supports on the ARCHER2 website.

The ARCHER2 Service is a world-class advanced computing resource for UK researchers. ARCHER2 is provided by UKRI, EPCC, Cray (an HPE company) and the University of Edinburgh.

"},{"location":"#what-the-documentation-covers","title":"What the documentation covers","text":"

This is the documentation for the ARCHER2 service and includes:

"},{"location":"#contributing-to-the-documentation","title":"Contributing to the documentation","text":"

The source for this documentation is publicly available in the ARCHER2 documentation GitHub repository so that anyone can contribute to improving the documentation for the service. Contributions can take the form of improvements or additions to the content and/or the addition of issues suggesting how it can be improved.

Full details of how to contribute can be found in the README.md file of the repository.

"},{"location":"#credits","title":"Credits","text":"

This documentation draws on the Cirrus Tier-2 HPC Documentation, Sheffield Iceberg Documentation and the ARCHER National Supercomputing Service Documentation.

"},{"location":"archer-migration/","title":"ARCHER to ARCHER2 migration","text":"

This section of the documentation is a guide for users migrating from ARCHER to ARCHER2.

It covers:

Tip

If you need help or have questions on ARCHER to ARCHER2 migration, please contact the ARCHER2 service desk.

"},{"location":"archer-migration/account-migration/","title":"Migrating your account from ARCHER to ARCHER2","text":"

This section covers the following questions:

Tip

If you need help or have questions on ARCHER to ARCHER2 migration, please contact the ARCHER2 service desk.

"},{"location":"archer-migration/account-migration/#when-will-i-be-able-to-access-archer2","title":"When will I be able to access ARCHER2?","text":"

We anticipate that users will have access during the week beginning 11th January 2021. Notification of activation of ARCHER2 projects will be sent to the project leaders/PIs and the project users.

"},{"location":"archer-migration/account-migration/#has-my-project-been-migrated-to-archer2","title":"Has my project been migrated to ARCHER2?","text":"

If you have an active ARCHER allocation at the end of the ARCHER service then your project will very likely be migrated to ARCHER2. If your project is migrated to ARCHER2 then it will have the same project code as it had on ARCHER.

Some further information that may be useful:

"},{"location":"archer-migration/account-migration/#how-much-resource-will-my-project-have-on-archer2","title":"How much resource will my project have on ARCHER2?","text":"

The unit of allocation on ARCHER2 is called the ARCHER2 Compute Unit (CU) and, in general, 1 CU will be worth 1 ARCHER2 node hour.

UKRI have determined the conversion rates which will be used to transfer existing ARCHER allocations onto ARCHER2. These will be:

In identifying these conversion rates UKRI has endeavoured to ensure that no user will be disadvantaged by the transfer of their allocation from ARCHER to ARCHER2.

A nominal allocation will be provided to all projects during the initial no-charging period. Users will be notified before the no-charging period ends.

When the ARCHER service ends, any unused ARCHER allocation in kAUs will be converted to ARCHER2 CUs and transferred to the corresponding ARCHER2 project allocation.

"},{"location":"archer-migration/account-migration/#how-do-i-set-up-an-archer2-account","title":"How do I set up an ARCHER2 account?","text":"

Once you have been notified that you can go ahead and set up an ARCHER2 account, you will do this through SAFE. Note that you should use the new unified SAFE interface rather than the ARCHER SAFE. The correct URL for the new SAFE is:

Your access details for this SAFE are the same as those for the ARCHER SAFE. You should log in in exactly the same way as you did on the ARCHER SAFE.

Important

You should make sure you request the same account name in your project on ARCHER2 as you have on ARCHER. This ensures that you have seamless access to your ARCHER /home data on ARCHER2. See the ARCHER to ARCHER2 Data Migration page for details on data transfer from ARCHER to ARCHER2.

Once you have logged into SAFE, you will need to complete the following steps before you can log into ARCHER2 for the first time:

  1. Request an ARCHER2 account through SAFE
    1. See: How to request a machine account (SAFE documentation)
  2. (Optional) Create a new SSH key pair and add it to your ARCHER2 account in SAFE
    1. See: SSH key pairs (ARCHER2 documentation)
    2. If you do not add a new SSH key to your ARCHER2 account, then your account will use the same key as your ARCHER account
  3. Collect your initial, one-shot password from SAFE
    1. See: Initial passwords (ARCHER2 documentation)
"},{"location":"archer-migration/account-migration/#how-do-i-log-into-archer2-for-the-first-time","title":"How do I log into ARCHER2 for the first time?","text":"

The ARCHER2 documentation covers logging in to ARCHER2 from a variety of operating systems:

"},{"location":"archer-migration/archer2-differences/","title":"Main differences between ARCHER and ARCHER2","text":"

This section provides an overview of the main differences between ARCHER and ARCHER2 along with links to more information where appropriate.

"},{"location":"archer-migration/archer2-differences/#for-all-users","title":"For all users","text":""},{"location":"archer-migration/archer2-differences/#for-users-compiling-and-developing-software-on-archer2","title":"For users compiling and developing software on ARCHER2","text":""},{"location":"archer-migration/data-migration/","title":"Data migration from ARCHER to ARCHER2","text":"

This short guide explains how to move data from the ARCHER service to the ARCHER2 service.

We have also created a walkthrough video to guide you.

Note

This section assumes that you have an active ARCHER and ARCHER2 account, and that you have successfully logged in to both accounts.

Tip

Unlike normal access, ARCHER to ARCHER2 transfer has been set up to require only one form of authentication. You will not need to generate a new SSH key pair to transfer data from ARCHER to ARCHER2 as your password will suffice.

First, log in to ARCHER (making sure to change auser to your username):

ssh auser@login.archer.ac.uk\n

Then, combine important research data into a single archive file using the following command:

tar -czf all_my_files.tar.gz file1.txt file2.txt directory1/\n

Please be selective -- the more data you want to transfer, the more time it will take.
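Before you start the transfer, you may find it useful to check that the archive can be read back and to record a checksum that you can compare on ARCHER2 once the copy has finished. This is a sketch only. make_archive is a hypothetical name, and the function assumes GNU tar and the coreutils sha256sum command, which are standard on Linux systems.

```shell
# Hypothetical helper: pack files into a gzipped tar archive, check that the
# archive can be read back, then record its SHA-256 checksum next to it.
# Usage: make_archive ARCHIVE.tar.gz FILE_OR_DIR...
make_archive() {
    local archive=$1 name dir
    shift
    tar -czf "$archive" "$@" || return 1
    # Listing the contents fails if the archive is truncated or corrupt
    tar -tzf "$archive" > /dev/null || return 1
    name=$(basename "$archive")
    dir=$(dirname "$archive")
    # Store the bare file name in the checksum file so that it can be
    # checked later from whichever directory the archive ends up in
    ( cd "$dir" && sha256sum "$name" > "$name.sha256" )
}
```

For example, make_archive all_my_files.tar.gz file1.txt file2.txt directory1/ creates both all_my_files.tar.gz and all_my_files.tar.gz.sha256. If you transfer both files, run sha256sum -c all_my_files.tar.gz.sha256 in the destination directory on ARCHER2. It reports OK when the copy matches the original.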

On ARCHER in particular, you need a newer version of the SSH program to get the best transfer performance. You can access it by loading the openssh module:

module load openssh\n
"},{"location":"archer-migration/data-migration/#transferring-data-using-rsync-recommended","title":"Transferring data using rsync (recommended)","text":"

Begin the data transfer from ARCHER to ARCHER2 using rsync:

rsync -Pv -e\"ssh -c aes128-gcm@openssh.com\" \\\n       ./all_my_files.tar.gz a2user@transfer.dyn.archer2.ac.uk:/work/t01/t01/a2user\n

Important

Notice that the hostname for data transfer from ARCHER to ARCHER2 is not the usual login address. Instead, you use transfer.dyn.archer2.ac.uk. This address has been configured to allow higher-performance data transfer and to allow password-only access from ARCHER, with no SSH key required.

When running this command, you will be prompted to enter your ARCHER2 password. Enter it and the data transfer will begin. Also, remember to replace a2user with your ARCHER2 username, and t01 with the budget associated with that username.

We use the -P flag to allow partial transfers, which means the same command can be used to restart the transfer after a loss of connection. The -e flag specifies the remote shell command. Here we use it to pass the -c option to ssh, which selects the aes128-gcm cipher because this has been found to improve performance. Unfortunately the ~ shortcut is not expanded correctly, so we have specified the full path. The research archive is moved to our project work directory on ARCHER2.
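Because re-running the same rsync -P command resumes an interrupted transfer, you can automate the restart with a small retry loop. The function below re-runs any command until it succeeds or a maximum number of attempts is reached. This is a sketch: retry and the RETRY_DELAY variable are hypothetical names, and the default 10-second pause between attempts is an arbitrary choice.

```shell
# Hypothetical helper: re-run a command until it succeeds, up to MAX attempts.
# RETRY_DELAY (seconds between attempts) is an illustrative setting, default 10.
# Usage: retry MAX COMMAND [ARGS...]
retry() {
    local max=$1 attempt=1
    shift
    until "$@"; do
        if [ "$attempt" -ge "$max" ]; then
            echo "retry: giving up after $attempt attempts" >&2
            return 1
        fi
        attempt=$(( attempt + 1 ))
        # Pause briefly before trying again
        sleep "${RETRY_DELAY:-10}"
    done
}
```

For example: retry 5 rsync -Pv -e "ssh -c aes128-gcm@openssh.com" ./all_my_files.tar.gz a2user@transfer.dyn.archer2.ac.uk:/work/t01/t01/a2user. With password authentication, you will be asked for your password again on each attempt.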

"},{"location":"archer-migration/data-migration/#transferring-data-using-scp","title":"Transferring data using scp","text":"

If you are unconcerned about being able to restart an interrupted transfer, you could instead use the scp command,

scp -c aes128-gcm@openssh.com all_my_files.tar.gz \\\n    a2user@transfer.dyn.archer2.ac.uk:/work/t01/t01/a2user/\n

but rsync is recommended for larger transfers.


"},{"location":"archer2-migration/","title":"ARCHER2 4-cabinet system to ARCHER2 full system migration","text":"

This section of the documentation is a guide for users migrating from the ARCHER2 4-cabinet system to the ARCHER2 full system.

It covers:

Tip

If you need help or have questions on migrating from the ARCHER2 4-cabinet system to the full ARCHER2 system, please contact the ARCHER2 service desk.

"},{"location":"archer2-migration/account-migration/","title":"Accessing the ARCHER2 full system","text":"

This section covers the following questions:

Tip

If you need help or have questions on using the ARCHER2 4-cabinet system and the ARCHER2 full system, please contact the ARCHER2 service desk.

"},{"location":"archer2-migration/account-migration/#when-will-i-be-able-to-access-archer2-full-system","title":"When will I be able to access ARCHER2 full system?","text":"

We anticipate that users will have access from mid-to-late November. Users will have access to both the ARCHER2 4-cabinet system and the ARCHER2 full system for at least 30 days. UKRI will confirm the dates, and these will be communicated to you as they are confirmed. There will be at least 14 days' notice before access to the ARCHER2 4-cabinet system is removed.

"},{"location":"archer2-migration/account-migration/#has-my-project-been-enabled-on-archer2-full-system","title":"Has my project been enabled on ARCHER2 full system?","text":"

If you have an active ARCHER2 4-cabinet system allocation on 1st October 2021 then your project will be enabled on the ARCHER2 full system. The project code is the same on the full service as it is on ARCHER2 4-cabinet system.

Some further information that may be useful:

"},{"location":"archer2-migration/account-migration/#how-much-resource-will-my-project-have-on-archer2-full-system","title":"How much resource will my project have on ARCHER2 full system?","text":"

The unit of allocation on ARCHER2 is called the ARCHER2 Compute Unit (CU) and 1 CU is equivalent to 1 ARCHER2 node hour. Your time budget will be shared on both systems. This means that any existing allocation available to your project on the 4-cabinet system will also be available on the full system.
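Since 1 CU is equivalent to 1 node hour, the CUs used by a job are the number of nodes multiplied by the run time in hours. For example, a job on 4 nodes that runs for 3 hours uses 12 CUs. The bash sketch below does this arithmetic for a Slurm-style HH:MM:SS time. cu_cost is a hypothetical name, and the function assumes the charge is simply nodes times elapsed time. It takes no other charging rules into account.

```shell
# Hypothetical helper: estimate CUs used, assuming 1 CU = 1 node hour.
# Usage: cu_cost NODES HH:MM:SS
cu_cost() {
    local nodes=$1 h m s
    IFS=: read -r h m s <<< "$2"
    # Force base 10 so values such as 08 or 09 are not read as octal
    local secs=$(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
    # Work in hundredths of a CU to stay within integer arithmetic
    # (the result is truncated, not rounded, to two decimal places)
    local hundredths=$(( nodes * secs * 100 / 3600 ))
    printf '%d.%02d\n' $(( hundredths / 100 )) $(( hundredths % 100 ))
}
```

For example, cu_cost 4 03:00:00 prints 12.00.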

There will be a period of at least 30 days where users will have access to both the 4-cabinet system and the full system. During this time, use on the full system will be uncharged (though users must still have access to a valid, positive budget to be able to submit jobs) and use on the 4-cabinet system will be charged in the usual way. Users will be notified before the no-charging period ends.

"},{"location":"archer2-migration/account-migration/#how-do-i-set-up-an-account-on-the-full-system","title":"How do I set up an account on the full system?","text":"

You will keep the same usernames, passwords and SSH keys that you use on the 4-cabinet system on the full system.

You do not need to do anything to enable your account. Accounts will be made available automatically once access to the full system opens.

You will connect to the full system in the same way as you connect to the 4-cabinet system except for switching the ordering of the credentials:

"},{"location":"archer2-migration/account-migration/#how-do-i-log-into-the-different-archer2-systems","title":"How do I log into the different ARCHER2 systems?","text":"

The ARCHER2 documentation covers logging in to ARCHER2 from a variety of operating systems:

  - Logging in to ARCHER2 from macOS/Linux
  - Logging in to ARCHER2 from Windows

Login addresses:

Tip

When logging into the ARCHER2 full system for the first time, you may see an error from SSH that looks like this:

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\n@       WARNING: POSSIBLE DNS SPOOFING DETECTED!          @\n@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\nThe ECDSA host key for login.archer2.ac.uk has changed,\nand the key for the corresponding IP address 193.62.216.43\nhas a different value. This could either mean that\nDNS SPOOFING is happening or the IP address for the host\nand its host key have changed at the same time.\nOffending key for IP in /Users/auser/.ssh/known_hosts:11\n@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\n@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @\n@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\nIT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!\nSomeone could be eavesdropping on you right now (man-in-the-middle attack)!\nIt is also possible that a host key has just been changed.\nThe fingerprint for the ECDSA key sent by the remote host is\nSHA256:UGS+LA8I46LqnD58WiWNlaUFY3uD1WFr+V8RCG09fUg.\nPlease contact your system administrator.\n

If you see this, you should delete the offending host key from your ~/.ssh/known_hosts file (in the example above, the offending key is on line 11).
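If you prefer to do this from the command line, the sketch below deletes a given line number from known_hosts and keeps a backup of the original file. remove_known_host_line is a hypothetical name. As an alternative, the standard OpenSSH command ssh-keygen -R login.archer2.ac.uk removes all stored keys for that host name. Note, however, that the warning above also refers to a key stored against the IP address.

```shell
# Hypothetical helper: delete line N of an SSH known_hosts file, keeping a backup.
# Usage: remove_known_host_line N [FILE]   (FILE defaults to ~/.ssh/known_hosts)
remove_known_host_line() {
    local line=$1
    local file=${2:-$HOME/.ssh/known_hosts}
    # -i.bak works with both GNU sed (Linux) and BSD sed (macOS) and saves
    # the original as FILE.bak before editing in place
    sed -i.bak "${line}d" "$file"
}
```

For the example above, remove_known_host_line 11 would delete line 11 of ~/.ssh/known_hosts and keep the original as ~/.ssh/known_hosts.bak.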

"},{"location":"archer2-migration/account-migration/#what-will-happen-to-archer2-data","title":"What will happen to ARCHER2 data?","text":"

There are three file systems associated with the ARCHER2 Service:

"},{"location":"archer2-migration/account-migration/#home-file-systems","title":"home file systems","text":"

The home file systems will be mounted on both the 4-cabinet system and the full system, so users’ directories are shared across the two systems. Users will be able to access the home file systems from both systems and no action is required to move data. The home file systems will be readable and writeable on both systems during the transition period.

"},{"location":"archer2-migration/account-migration/#work-file-systems","title":"work file systems","text":"

There are different work file systems for the 4-cabinet system and the full system.

The work file system on the 4-cabinet system (labelled “archer2-4c-work” in SAFE) will remain available on the 4-cabinet system during the transition period.

There will be new work file systems on the full system and you will have new directories on the new work file systems. Your initial quotas will typically be double your quotas for the 4-cabinet work file system.

Important: you are responsible for transferring any required data from the 4-cabinet work file systems to your new directories on the work file systems on the full system.

The work file system on the 4-cabinet system will remain available, so that you can transfer your data from it, for at least 30 days from the start of ARCHER2 full system access. You will be given 14 days' notice before the 4-cabinet work file system is removed.

"},{"location":"archer2-migration/account-migration/#rdfaas-file-systems","title":"RDFaaS file systems","text":"

For users who have access to the RDFaaS, your RDFaaS data will be available on both the 4-cabinet system and the full system during the transition period and will be readable and writeable on both systems.

"},{"location":"archer2-migration/archer2-differences/","title":"Main differences between ARCHER2 4-cabinet system and ARCHER2 full system","text":"

This section provides an overview of the main differences between the ARCHER2 4-cabinet system that all users have been using up until now and the full ARCHER2 system along with links to more information where appropriate.

"},{"location":"archer2-migration/archer2-differences/#for-all-users","title":"For all users","text":""},{"location":"archer2-migration/archer2-differences/#for-users-compiling-and-developing-software-on-archer2","title":"For users compiling and developing software on ARCHER2","text":""},{"location":"archer2-migration/data-migration/","title":"Data migration from the ARCHER2 4-cabinet system to the ARCHER2 full system","text":"

This short guide explains how to move data from the work file system on the ARCHER2 4-cabinet system to the ARCHER2 full system. Your space on the home file system is shared between the ARCHER2 4-cabinet system and the ARCHER2 full system, so everything in your home directory is already effectively transferred.

Note

This section assumes that you have an active ARCHER2 4-cabinet system and ARCHER2 full system account, and that you have successfully logged in to both accounts.

Tip

Unlike normal access, transfer between the ARCHER2 4-cabinet system and the ARCHER2 full system has been set up to require only one form of authentication. You will only need one factor to authenticate from the 4-cabinet system to the full system, or vice versa. This factor can be either an SSH key (registered against your account in SAFE) or your password. If you have a large amount of data to transfer, you may want to set up a passphrase-less SSH key on the ARCHER2 full system and use the data analysis nodes to run transfers via a Slurm job.

"},{"location":"archer2-migration/data-migration/#transferring-data-interactively-from-the-4-cabinet-system-to-the-full-system","title":"Transferring data interactively from the 4-cabinet system to the full system","text":"

First, log in to the ARCHER2 4-cabinet system (making sure to change auser to your username):

ssh auser@login-4c.archer2.ac.uk\n

Then, combine important research data into a single archive file using the following command:

tar -czf all_my_files.tar.gz file1.txt file2.txt directory1/\n

Please be selective -- the more data you want to transfer, the more time it will take.

Once the archive has been transferred, unpack it in the destination directory:

tar -xzf all_my_files.tar.gz\n
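You can use tar's -C option to unpack into a specific directory without changing into it first. The sketch below also lists the archive before extracting, so that a corrupt archive is reported before anything is unpacked. unpack_to is a hypothetical name.

```shell
# Hypothetical helper: check that an archive can be read, then extract it into DEST.
# Usage: unpack_to ARCHIVE.tar.gz DEST_DIR
unpack_to() {
    local archive=$1 dest=$2
    # Fail early if the archive is truncated or corrupt
    tar -tzf "$archive" > /dev/null || return 1
    mkdir -p "$dest" || return 1
    tar -xzf "$archive" -C "$dest"
}
```

For example, unpack_to all_my_files.tar.gz /work/t01/t01/a2user/unpacked, replacing the path with your own work directory.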
"},{"location":"archer2-migration/data-migration/#transferring-data-using-rsync-recommended","title":"Transferring data using rsync (recommended)","text":"

Begin the data transfer from the ARCHER2 4-cabinet system to the ARCHER2 full system using rsync:

rsync -Pv all_my_files.tar.gz a2user@login.archer2.ac.uk:/work/t01/t01/a2user\n

When running this command, you will be prompted to enter your ARCHER2 password -- this is the same password for the ARCHER2 4-cabinet system and the ARCHER2 full system. Enter it and the data transfer will begin. Remember to replace a2user with your ARCHER2 username, and t01 with the budget associated with that username.

We use the -P flag to allow partial transfer -- the same command could be used to restart the transfer after a loss of connection. We move our research archive to our project work directory on the ARCHER2 full system.

"},{"location":"archer2-migration/data-migration/#transferring-data-using-scp","title":"Transferring data using scp","text":"

If you are unconcerned about being able to restart an interrupted transfer, you could instead use the scp command,

scp all_my_files.tar.gz a2user@login.archer2.ac.uk:/work/t01/t01/a2user/\n

but rsync is recommended for larger transfers.

"},{"location":"archer2-migration/data-migration/#transferring-data-via-the-serial-queue","title":"Transferring data via the serial queue","text":"

It may be convenient to submit long data transfers to the serial queue. In this case, a number of simple preparatory steps are required to authenticate:

  1. On the full system, create a new SSH key pair without a passphrase (just press return when prompted).
  2. Add the new public key to SAFE against your machine account.
  3. Use this key pair to authenticate ssh/scp commands run in the serial queue. Because only one of SSH key or password is required between the serial nodes and the 4-cabinet system, this is sufficient.
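Step 1 above can be sketched as follows. The key file name id_rsa_batch matches the example script that follows, and make_batch_key is a hypothetical name. Anyone who can read a passphrase-less private key can use it, so the key must stay readable only by you. ssh-keygen creates the private key with owner-only permissions.

```shell
# Sketch: create a passphrase-less SSH key pair for batch transfers and print
# the public key so it can be added to your machine account in SAFE.
# Usage: make_batch_key [KEYFILE]   (defaults to ~/.ssh/id_rsa_batch)
make_batch_key() {
    local key=${1:-$HOME/.ssh/id_rsa_batch}
    mkdir -p "$(dirname "$key")" || return 1
    # -N '' sets an empty passphrase, -f names the key file and -q suppresses
    # progress output; ssh-keygen asks before overwriting an existing key
    ssh-keygen -q -t rsa -b 4096 -N '' -f "$key" || return 1
    cat "$key.pub"
}
```

Running make_batch_key on the full system creates ~/.ssh/id_rsa_batch and ~/.ssh/id_rsa_batch.pub and prints the public key for you to copy into SAFE.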

An example serial queue script using rsync might be:

#!/bin/bash\n\n# Slurm job options (job-name, job time)\n\n#SBATCH --partition=serial\n#SBATCH --qos=serial\n\n#SBATCH --time=02:00:00\n#SBATCH --ntasks=1\n\n# Replace [budget code] below with your budget code\n\n#SBATCH --account=[budget code] \n\n# Issue appropriate rsync command\n\nrsync -av --stats --progress --rsh=\"ssh -i ${HOME}/.ssh/id_rsa_batch\" \\\n      user-01@login-4c.archer2.ac.uk:/work/proj01/proj01/user-01/src \\\n      /work/proj01/proj01/user-01/destination\n
where ${HOME}/.ssh/id_rsa_batch is the new ssh key. Note that the ${HOME} directory is visible from the serial nodes on the full system, so ssh key pairs in ${HOME}/.ssh are available.

"},{"location":"archer2-migration/porting/","title":"Porting applications to full ARCHER2 system","text":"

Porting applications to the full ARCHER2 system has generally proven straightforward if they are running successfully on the ARCHER2 4-cabinet system. You should be able to use the same (or very similar) compile processes on the full system as you used on the 4-cabinet system.

During testing of the ARCHER2 full system, the CSE team at EPCC have seen that application binaries compiled on the 4-cabinet system can usually be copied over to the full system and work well and give good performance. However, if you run into issues with executables taken from the 4-cabinet system on the full system you should recompile in the first instance.

Information on compiling applications on the full system can be found in the Application Development Environment section of the User and Best Practice Guide.

"},{"location":"data-tools/","title":"Data Analysis and Tools","text":"

This section provides information on the centrally installed data analysis software and other software tools.

The tools currently available in this section are (software that is installed or maintained by third-parties rather than the ARCHER2 service are marked with *):

"},{"location":"data-tools/amd-uprof/","title":"AMD \u03bcProf","text":"

AMD μProf (“MICRO-prof”) is a software profiling analysis tool for x86 applications running on Windows, Linux and FreeBSD operating systems. It provides event information unique to AMD “Zen”-based processors and AMD INSTINCT™ MI Series accelerators. AMD μProf helps developers understand what limits their application's performance and evaluate improvements.

"},{"location":"data-tools/amd-uprof/#accessing-amd-prof-on-archer2","title":"Accessing AMD \u03bcProf on ARCHER2","text":"

To gain access to the AMD μProf tools on ARCHER2, you must load the module:

module load amd-uprof\n
"},{"location":"data-tools/amd-uprof/#using-amd-prof","title":"Using AMD \u03bcProf","text":"

Please see the AMD documentation for information on how to use μProf:

"},{"location":"data-tools/cray-r/","title":"R","text":""},{"location":"data-tools/cray-r/#r-for-statistical-computing","title":"R for statistical computing","text":"

R is a software environment for statistical computing and graphics. It provides a wide variety of statistical and graphical techniques (linear and nonlinear modelling, statistical tests, time-series analysis, classification, clustering, and so on).

Note

When you log onto ARCHER2, no R module is loaded by default. You need to load the cray-R module to access the functionality described below.

The recommended version of R to use on ARCHER2 is the HPE Cray R distribution, which can be loaded using:

module load cray-R\n

The HPE Cray R distribution includes a range of common R packages, including all of the base packages, plus a few others.

To see which packages are available, run the following command from the R command prompt:

library()\n

At the time of writing, the HPE Cray R distribution included the following packages:

Full System
Packages in library \u2018/opt/R/4.0.3.0/lib64/R/library\u2019:\n\nbase                    The R Base Package\nboot                    Bootstrap Functions (Originally by Angelo Canty\n                        for S)\nclass                   Functions for Classification\ncluster                 \"Finding Groups in Data\": Cluster Analysis\n                        Extended Rousseeuw et al.\ncodetools               Code Analysis Tools for R\ncompiler                The R Compiler Package\ndatasets                The R Datasets Package\nforeign                 Read Data Stored by 'Minitab', 'S', 'SAS',\n                        'SPSS', 'Stata', 'Systat', 'Weka', 'dBase', ...\ngraphics                The R Graphics Package\ngrDevices               The R Graphics Devices and Support for Colours\n                        and Fonts\ngrid                    The Grid Graphics Package\nKernSmooth              Functions for Kernel Smoothing Supporting Wand\n                        & Jones (1995)\nlattice                 Trellis Graphics for R\nMASS                    Support Functions and Datasets for Venables and\n                        Ripley's MASS\nMatrix                  Sparse and Dense Matrix Classes and Methods\nmethods                 Formal Methods and Classes\nmgcv                    Mixed GAM Computation Vehicle with Automatic\n                        Smoothness Estimation\nnlme                    Linear and Nonlinear Mixed Effects Models\nnnet                    Feed-Forward Neural Networks and Multinomial\n                        Log-Linear Models\nparallel                Support for Parallel computation in R\nrpart                   Recursive Partitioning and Regression Trees\nspatial                 Functions for Kriging and Point Pattern\n                        Analysis\nsplines                 Regression Spline Functions and Classes\nstats                   The R Stats Package\nstats4                  Statistical Functions using S4 Classes\nsurvival                Survival Analysis\ntcltk                   Tcl/Tk Interface\ntools                   Tools for Package Development\nutils                   The R Utils Package\n
4-cabinet system
Packages in library \u2018/opt/R/4.0.2.0/lib64/R/library\u2019:\n\nbase                    The R Base Package\nboot                    Bootstrap Functions (Originally by Angelo Canty\n                        for S)\nclass                   Functions for Classification\ncluster                 \"Finding Groups in Data\": Cluster Analysis\n                        Extended Rousseeuw et al.\ncodetools               Code Analysis Tools for R\ncompiler                The R Compiler Package\ndatasets                The R Datasets Package\nforeign                 Read Data Stored by 'Minitab', 'S', 'SAS',\n                        'SPSS', 'Stata', 'Systat', 'Weka', 'dBase', ...\ngraphics                The R Graphics Package\ngrDevices               The R Graphics Devices and Support for Colours\n                        and Fonts\ngrid                    The Grid Graphics Package\nKernSmooth              Functions for Kernel Smoothing Supporting Wand\n                        & Jones (1995)\nlattice                 Trellis Graphics for R\nMASS                    Support Functions and Datasets for Venables and\n                        Ripley's MASS\nMatrix                  Sparse and Dense Matrix Classes and Methods\nmethods                 Formal Methods and Classes\nmgcv                    Mixed GAM Computation Vehicle with Automatic\n                        Smoothness Estimation\nnlme                    Linear and Nonlinear Mixed Effects Models\nnnet                    Feed-Forward Neural Networks and Multinomial\n                        Log-Linear Models\nparallel                Support for Parallel computation in R\nrpart                   Recursive Partitioning and Regression Trees\nspatial                 Functions for Kriging and Point Pattern\n                        Analysis\nsplines                 Regression Spline Functions and Classes\nstats                   The R Stats Package\nstats4                  Statistical Functions using S4 Classes\nsurvival                Survival Analysis\ntcltk                   Tcl/Tk Interface\ntools                   Tools for Package Development\nutils                   The R Utils Package\n
"},{"location":"data-tools/cray-r/#running-r-on-the-compute-nodes","title":"Running R on the compute nodes","text":"

In this section, we provide an example R job submission script for using R on the ARCHER2 compute nodes.

"},{"location":"data-tools/cray-r/#serial-r-submission-script","title":"Serial R submission script","text":"
#!/bin/bash --login\n\n#SBATCH --job-name=r_test\n#SBATCH --ntasks=1\n#SBATCH --time=00:10:00\n\n# Replace [budget code] below with your project code (e.g., t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=serial\n#SBATCH --qos=serial\n\n# Load the R module\nmodule load cray-R\n\n# Run your R programme\nRscript serial_test.R\n

On completion, the output of the R script will be available in the job output file.

"},{"location":"data-tools/darshan/","title":"Darshan","text":"

Darshan is a scalable HPC I/O characterization tool. Darshan is designed to capture an accurate picture of application I/O behavior, including properties such as patterns of access within files, with minimum overhead. The name is taken from a Sanskrit word for “sight” or “vision”.

Darshan is developed at the Argonne Leadership Computing Facility (ALCF).

Useful links:

"},{"location":"data-tools/darshan/#using-darshan-on-archer2","title":"Using Darshan on ARCHER2","text":"

Using Darshan generally consists of two stages:

  1. Collecting IO profile data using the Darshan runtime
  2. Analysing Darshan log files using Darshan utility software
"},{"location":"data-tools/darshan/#collecting-io-profile-data","title":"Collecting IO profile data","text":"

To collect IO profile data, add the command:

module load darshan\n

to your job submission script as the last module command before you run your program. Darshan does not distinguish between the different software run in your job submission script, so we typically recommend that you use a structure like:

module load darshan\nsrun ...usual software launch options...\nmodule remove darshan\n

This will avoid Darshan profiling IO for operations that are not part of your main parallel program.
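You can wrap the same pattern in a small shell function for use inside a job submission script, so that Darshan is unloaded even if the profiled command fails. This is a sketch. with_darshan is a hypothetical name, and the function assumes the module command is available in the job script, as it is in a login shell. You still need to load the matching PrgEnv- module before calling it, as described in the Important note below.

```shell
# Hypothetical helper: profile only the wrapped command with Darshan.
# Usage (in a job script): with_darshan srun ./my_program
with_darshan() {
    module load darshan
    "$@"
    local status=$?
    # Unload Darshan even if the command failed, so later steps are not profiled
    module remove darshan
    return $status
}
```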

Tip

There may be some periods when Darshan monitoring is enabled by default for all users. During these periods, you can disable Darshan monitoring by adding the command module remove darshan to your job submission script. Periods of Darshan monitoring will be noted on the ARCHER2 Service Status page.

Important

The darshan module is dependent on the compiler environment you are using and you should ensure that you load the darshan module that matches the compiler environment you used to compile the program you are analysing. For example, if your software was compiled using PrgEnv-gnu, then you would need to activate the GCC compiler environment before loading the darshan module to ensure you get the GCC version of Darshan. This means loading the correct PrgEnv- module before you load the darshan module:

module load PrgEnv-gnu\nmodule load darshan\nsrun ...usual software launch options...\nmodule remove darshan\n
"},{"location":"data-tools/darshan/#location-of-darshan-profile-logs","title":"Location of Darshan profile logs","text":"

Darshan writes all profile logs to a shared location on the ARCHER2 NVMe Lustre file system. You can find your profile logs at:

/mnt/lustre/a2fs-nvme/system/darshan/YYYY/MM/DD\n

where YYYY/MM/DD correspond to the date on which your job ran.
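The log directory is shared by all users, so it can help to narrow the search to a single job. The shell function below is a sketch that lists the logs for a given Slurm job ID. It assumes that log file names contain _id followed by the job ID, which is the usual Darshan naming convention (check this against your own files). It takes the date subdirectory as an argument rather than assuming how that subdirectory is formatted. find_darshan_logs and DARSHAN_LOG_ROOT are hypothetical names introduced here.

```shell
# List the Darshan logs written by one Slurm job.
# Usage: find_darshan_logs <YYYY/MM/DD subdirectory> <job ID>
# Assumption: log file names contain _id<jobid>_ (standard Darshan naming).
# DARSHAN_LOG_ROOT is a hypothetical override for the log root directory.
find_darshan_logs() {
    local root=${DARSHAN_LOG_ROOT:-/mnt/lustre/a2fs-nvme/system/darshan}
    local day=$1
    local jobid=$2
    # Only look in the given day's directory, and only at regular files
    find ${root}/${day} -maxdepth 1 -type f -name '*_id'${jobid}'_*.darshan'
}
```

For example, find_darshan_logs 2024/10/15 1234567 would list the logs from job 1234567 run on 15 October 2024.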

"},{"location":"data-tools/darshan/#analysing-darshan-profile-logs","title":"Analysing Darshan profile logs","text":"

The simplest way to analyse the profile log files is to use the darshan-parser utility on the ARCHER2 login nodes. You make the Darshan analysis utilities available with the command:

module load darshan-util\n

Once this is loaded, you can produce an IO performance summary from a profile log file with:

darshan-parser --perf /path/to/darshan/log/file.darshan\n

You can get a dump of all data in the Darshan profile log by omitting the --perf option, e.g.:

darshan-parser /path/to/darshan/log/file.darshan\n
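When a job produces several logs, you can apply the same darshan-parser --perf command to each one in turn. The function below is a minimal sketch of that loop. summarise_darshan_logs is a hypothetical name, and the function assumes darshan-util is loaded so that darshan-parser is on your PATH. It writes one summary file per log to an output directory of your choosing rather than next to the logs, because you may not have write access to the shared log directory.

```shell
# Write a darshan-parser --perf summary for every .darshan file in a directory.
# Usage: summarise_darshan_logs <log directory> <output directory>
# Assumes darshan-parser is on PATH (module load darshan-util).
summarise_darshan_logs() {
    local logdir=$1 outdir=$2 log name
    mkdir -p ${outdir}
    for log in ${logdir}/*.darshan; do
        # If nothing matched, the glob stays literal: skip it
        [ -e ${log} ] || continue
        name=$(basename ${log} .darshan)
        darshan-parser --perf ${log} > ${outdir}/${name}.perf.txt
    done
}
```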

Tip

The darshan-job-summary.pl and darshan-summary-per-file.sh utilities do not work on ARCHER2 as the required graphical packages are not currently available.

Documentation on the Darshan analysis utilities is available at:

"},{"location":"data-tools/forge/","title":"Linaro Forge","text":""},{"location":"data-tools/forge/#linaro-forge","title":"Linaro Forge","text":"

Linaro Forge provides debugging and profiling tools for MPI parallel applications, and OpenMP or pthreads multi-threaded applications (and also hybrid MPI/OpenMP). Forge DDT is the debugger and MAP is the profiler.

"},{"location":"data-tools/forge/#user-interface","title":"User interface","text":"

There are two ways of running the Forge user interface. If you have a good internet connection to ARCHER2, the GUI can be run on the front-end (with an X-connection). Alternatively, one can download a copy of the Forge remote client to your laptop or desktop, and run it locally. The remote client should be used if at all possible.

To download the remote client, see the Forge download pages. Version 24.0 is known to work at the time of writing. How to use the remote client is explained further down this page, in Connecting with the remote client.

"},{"location":"data-tools/forge/#licensing","title":"Licensing","text":"

ARCHER2 has a licence for up to 2080 tokens, where a token represents an MPI parallel process. Running Forge DDT/MAP to debug/profile a code running across 16 nodes using 128 MPI ranks per node would require 2048 tokens. If you wish to run on more nodes, say 32, then it will be necessary to reduce the number of tasks per node so as to fall below the maximum number of tokens allowed.
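You can turn the token arithmetic above into a quick check. This sketch assumes 128 cores per ARCHER2 compute node and the 2080-token licence. It returns the largest number of MPI ranks per node that keeps a run within the licence. max_ranks_per_node is a hypothetical helper.

```shell
# Largest MPI ranks per node that keeps a Forge session within the licence.
# Assumes 128 cores per ARCHER2 compute node; tokens defaults to 2080.
# Usage: max_ranks_per_node <nodes> [tokens]
max_ranks_per_node() {
    local nodes=$1
    local tokens=${2:-2080}
    local per_node=$(( tokens / nodes ))   # integer division rounds down
    # Never exceed the physical core count of a node
    if [ ${per_node} -gt 128 ]; then
        per_node=128
    fi
    echo ${per_node}
}
```

For 32 nodes this gives 65 ranks per node (32 × 65 = 2080 tokens). For 16 nodes the result is capped at the full 128 ranks per node. Passing 8192 as the second argument models the boosted licence described in the note below.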

Please note, Forge licence tokens are shared by all ARCHER2 (and Cirrus) users.

To see how many tokens are in use, you can view the licence server status page by first setting up an SSH tunnel to the node hosting the licence server.

ssh <username>@login.archer2.ac.uk -L 4241:dvn04:4241\n

You can now view the status page from within a local browser, see http://localhost:4241/status.html.

Note

The licence status page may contain multiple licences, indicated by a row of buttons (one per licence) near the top of the page. The details of the 12-month licence described above can be accessed by clicking on the first button in the row. Additional buttons may appear at various times for boosted licences: once a quarter, ARCHER2 will have a boosted 7-day licence offering 8192 tokens, sufficient for 64 nodes running 128 MPI ranks per node. Please contact the Service Desk if you have a specific requirement that exceeds the current Forge licence provision.

Note

The licence status page refers to the Arm Licence Server. Arm is the name of the company that originally developed Forge before it was acquired by Linaro.

"},{"location":"data-tools/forge/#one-time-set-up-for-using-forge","title":"One time set-up for using Forge","text":"

A preliminary step is required to set up the necessary Forge configuration files that allow DDT and MAP to initialise their environment correctly so that they can, for example, interact with the Slurm queue system. These steps should be performed in the /work file system on ARCHER2.

It is recommended that these commands are performed in the top-level work file system directory for the user account, i.e., ${HOME/home/work}.

module load forge\ncd ${HOME/home/work}\nsource ${FORGE_DIR}/config-init\n

Running the source command will create a directory ${HOME/home/work}/.forge that contains the following files.

system.config  user.config\n
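The expression ${HOME/home/work} used in these commands is bash pattern substitution: ${var/pattern/replacement} replaces the first match of pattern in the value of var. The short example below shows the result for a hypothetical user auser in project t01. It uses a separate variable so that HOME itself is left unchanged.

```shell
# Bash pattern substitution: replace the first occurrence of home with work.
# example_home is a hypothetical home directory used for illustration.
example_home=/home/t01/t01/auser
work_dir=${example_home/home/work}
echo ${work_dir}    # prints /work/t01/t01/auser
```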

Warning

The config-init script may output Warning: failed to read system config. You can ignore this warning, as subsequent messages should indicate that the new configuration files have been created.

Within the system.config file you should find that the shared directory is set to the equivalent of ${HOME/home/work}/.forge. That directory will also store other relevant files when Forge is run.

"},{"location":"data-tools/forge/#using-ddt","title":"Using DDT","text":"

DDT (Distributed Debugging Tool) provides an easy-to-use graphical interface for source-level debugging of compiled C/C++ or Fortran codes. It can also be used for non-interactive debugging, and there is some limited support for Python debugging.

"},{"location":"data-tools/forge/#preparation","title":"Preparation","text":"

To prepare your program for debugging, compile and link in the normal way but remember to include the -g compiler option to retain symbolic information in the executable. For some programs, it may be necessary to reduce the optimisation to -O0 to obtain full and consistent information. However, this in itself can change the behaviour of bugs, so some experimentation may be necessary.

"},{"location":"data-tools/forge/#post-mortem-debugging","title":"Post-mortem debugging","text":"

A non-interactive method of debugging is available which allows information to be obtained on the state of the execution at the point of failure in a batch job.

Such a job can be submitted to the batch system in the usual way. The relevant command to start the executable is as follows.

# ... Slurm batch commands as usual ...\n\nmodule load forge\n\nexport OMP_NUM_THREADS=16\nexport OMP_PLACES=cores\n\n# Ensure the cpus-per-task option is propagated to srun commands\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\nddt --verbose --offline --mpi=slurm --np 8 \\\n    --mem-debug=fast --check-bounds=before \\\n    ./my_executable\n

The parallel launch is delegated to ddt and the --mpi=slurm option indicates to ddt that the relevant queue system is Slurm (there is no explicit srun). It will also be necessary to state explicitly to ddt the number of processes required (here --np 8). For other options see, e.g., ddt --help.

Note that higher levels of memory debugging can result in extremely slow execution. The example given above uses the default --mem-debug=fast which should be a reasonable first choice.

Execution will produce a .html format report which can be used to examine the state of execution at the point of failure.
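The example above combines --np 8 with OMP_NUM_THREADS=16, which is 8 × 16 = 128 cores, or one full ARCHER2 node. The Slurm options elided from the example therefore need to request matching resources, for instance --nodes=1, --ntasks-per-node=8 and --cpus-per-task=16. These options are an assumption about how you would write the header and are not part of the original example. The sketch below computes how many nodes a given ranks × threads decomposition requires. It assumes 128 cores per node, and nodes_needed is a hypothetical helper.

```shell
# Number of ARCHER2 nodes needed for a ranks x threads decomposition.
# Assumes 128 cores per compute node.
# Usage: nodes_needed <MPI ranks> <OpenMP threads per rank>
nodes_needed() {
    local cores=$(( $1 * $2 ))
    # Round up to a whole number of nodes
    echo $(( (cores + 127) / 128 ))
}
```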

"},{"location":"data-tools/forge/#interactive-debugging-using-the-client-to-submit-a-batch-job","title":"Interactive debugging: using the client to submit a batch job","text":"

You can also start the client interactively (for details of remote launch, see Connecting with the remote client).

module load forge\nddt\n

This should start a window as shown below. Click on the DDT panel on the left, and then on the Run and debug a program option. This will bring up the Run dialogue as shown.

Note:

In the Application sub panel of the Run dialog box, details of the executable, command line arguments or data files, the working directory and so on should be entered.

Click the MPI checkbox and specify the MPI implementation. This is done by clicking the Details button and then the Change button. Choose the SLURM (generic) implementation from the drop-down menu and click OK. You can then specify the required number of nodes/processes and so on.

Click the OpenMP checkbox and select the relevant number of threads (if there is no OpenMP in the application itself, select 1 thread).

Click the Submit to Queue checkbox and then the associated Configure button. A new set of options will appear such as Submission template file, where you can enter ${FORGE_DIR}/templates/archer2.qtf and click OK. This template file provides many of the options required for a standard batch job. You will then need to click on the Queue Parameters button in the same section and specify the relevant project budget, see the Account entry.

The default queue template file configuration uses the short QoS with the standard time limit of 20 minutes. If something different is required, one can edit the settings. Alternatively, one can copy the archer2.qtf file (to ${HOME/home/work}/.forge) and make the relevant changes. This new template file can then be specified in the dialog window.
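You can copy the template from a login node with a few commands, as suggested above. This sketch assumes the forge module is loaded, so FORGE_DIR is set. copy_forge_template and the file name archer2-custom.qtf are hypothetical choices. The function prints the path of the copy, which you would then edit and enter as the submission template file in the dialog.

```shell
# Copy the ARCHER2 Forge queue template into your work .forge directory.
# Assumes FORGE_DIR is set by the forge module.
# archer2-custom.qtf is a hypothetical name for the editable copy.
copy_forge_template() {
    local dest=${HOME/home/work}/.forge
    mkdir -p ${dest}
    cp ${FORGE_DIR}/templates/archer2.qtf ${dest}/archer2-custom.qtf
    # Print the new path so it can be entered in the Forge dialog
    echo ${dest}/archer2-custom.qtf
}
```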

There may be a short delay while the sbatch job starts. Debugging should then proceed as described in the Linaro Forge documentation.

"},{"location":"data-tools/forge/#using-map","title":"Using MAP","text":"

Load the forge module:

module load forge\n
"},{"location":"data-tools/forge/#linking","title":"Linking","text":"

MAP uses two small libraries to collect data from your program. These are called map-sampler and map-sampler-pmpi. On ARCHER2, the linking of these libraries is usually done automatically via the LD_PRELOAD mechanism, but only if your program is dynamically linked. Otherwise, you will need to link the MAP libraries manually by providing explicit link options.

The library paths specified in the link options depend on the programming environment you are using as well as on the Cray Programming Environment (CPE) release. The paths given here are consistent with CPE 22.12, using the default OFI as the low-level comms protocol.

For example, for PrgEnv-gnu the additional options required at link time are given below.

-L${FORGE_DIR}/map/libs/default/gnu/ofi \\\n-lmap-sampler-pmpi -lmap-sampler \\\n-Wl,--eh-frame-hdr -Wl,-rpath=${FORGE_DIR}/map/libs/default/gnu/ofi\n

The MAP libraries for other Cray programming releases can be found under ${FORGE_DIR}/map/libs. If you require MAP libraries built for the UCX comms protocol, simply replace ofi with ucx in the library path.
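Only the compiler directory and the comms protocol vary in these paths, so a small function can generate the link options. This sketch assumes that other environments follow the same ${FORGE_DIR}/map/libs/default/<compiler>/<protocol> layout as the gnu example. Check the actual directory names under ${FORGE_DIR}/map/libs before relying on it. map_link_flags is a hypothetical helper.

```shell
# Print the extra link options for the MAP sampler libraries.
# Usage: map_link_flags <compiler directory, e.g. gnu> <ofi|ucx>
# Assumes the ${FORGE_DIR}/map/libs/default/<compiler>/<protocol> layout.
map_link_flags() {
    local libdir=${FORGE_DIR}/map/libs/default/$1/$2
    echo -L${libdir} -lmap-sampler-pmpi -lmap-sampler -Wl,--eh-frame-hdr -Wl,-rpath=${libdir}
}
```

For example, ftn -o my_executable my_code.o $(map_link_flags gnu ofi) would link the OFI versions of the libraries into a Fortran program built with PrgEnv-gnu (my_code.o is a placeholder object file).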

"},{"location":"data-tools/forge/#generating-a-profile","title":"Generating a profile","text":"

Submit a batch job in the usual way, and include the lines:

# ... Slurm batch commands as usual ...\n\nmodule load forge\n\n# Ensure the cpus-per-task option is propagated to srun commands\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\nmap -n <number of MPI processes> --mpi=slurm --mpiargs=\"--hint=nomultithread --distribution=block:block\" --profile ./my_executable\n

Successful execution will generate a file with a .map extension.

This .map file may be viewed via the GUI (start with either map or forge) by selecting the Load a profile data file from a previous run option. The resulting file selection dialog box can then be used to locate the .map file.
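Rather than hard-coding the number of MPI processes passed to map -n, a job script could derive it from the Slurm environment. The function below is a sketch that uses SLURM_NTASKS when it is set and otherwise multiplies SLURM_JOB_NUM_NODES by SLURM_NTASKS_PER_NODE. It assumes the job was submitted with --nodes and --ntasks-per-node. total_mpi_tasks is a hypothetical name.

```shell
# Total number of MPI processes for the current Slurm job.
# Prefers SLURM_NTASKS; otherwise nodes x tasks per node.
# Assumes --nodes and --ntasks-per-node were given to sbatch.
total_mpi_tasks() {
    if [ ${SLURM_NTASKS:-0} -gt 0 ]; then
        echo ${SLURM_NTASKS}
    else
        echo $(( SLURM_JOB_NUM_NODES * SLURM_NTASKS_PER_NODE ))
    fi
}
```

In the map command above, you would replace <number of MPI processes> with $(total_mpi_tasks).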

"},{"location":"data-tools/forge/#connecting-with-the-remote-client","title":"Connecting with the remote client","text":"

If one starts the Forge client on e.g., a laptop, one should see the main window as shown above. Select Remote Launch and then Configure from the drop-down menu. In the Configure Remote Connections dialog box click Add. The following window should be displayed. Fill in the fields as shown. The Connection Name is just a tag for convenience (useful if a number of different accounts are in use). The Host Name should be as shown with the appropriate username. The Remote Installation Directory should be exactly as shown. The Remote Script is needed to execute additional environment commands on connection. A default script is provided in the location shown.

/work/y07/shared/utils/core/forge/latest/remote-init\n

Other settings can be as shown. Remember to click OK when done.

From the Remote Launch menu you should now see the new Connection Name. Select this, and enter the relevant ssh passphrase and machine password to connect. A remote connection will allow you to debug, or view a profile, as discussed above.

If different commands are required on connection, a copy of the remote-init script can be placed in, e.g., ${HOME/home/work}/.forge and edited as necessary. The full path of the new script should then be specified in the remote launch settings dialog box. Note that the script changes the directory to the /work/ file system so that batch submissions via sbatch will not be rejected.

Finally, note that ssh may need to be configured so that it picks up the correct local public key file. This may be done, e.g., via the local .ssh/config configuration file.
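One way to do this is with a host entry in ~/.ssh/config on your own computer. The function below prints such an entry. It is a sketch in which the alias archer2 and the key file path are illustrative, while Host, HostName, User and IdentityFile are standard ssh_config keywords. archer2_ssh_stanza is a hypothetical helper.

```shell
# Print an ~/.ssh/config entry that makes ssh use a specific key for ARCHER2.
# Usage: archer2_ssh_stanza <ARCHER2 username> <private key file>
# The alias archer2 is an illustrative choice.
archer2_ssh_stanza() {
    echo Host archer2
    echo '    HostName login.archer2.ac.uk'
    echo '    User' $1
    echo '    IdentityFile' $2
}
```

For example, archer2_ssh_stanza auser ~/.ssh/id_ed25519_archer2 >> ~/.ssh/config appends the entry. Check the file afterwards, as the function does not detect an existing archer2 entry.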

"},{"location":"data-tools/forge/#useful-links","title":"Useful links","text":""},{"location":"data-tools/globus/","title":"Using Globus to transfer data to/from ARCHER2 filesystems","text":""},{"location":"data-tools/globus/#setting-up-archer2-filesystems","title":"Setting up ARCHER2 filesystems","text":"

Navigate to https://app.globus.org

Log in with your Globus identity (this could be a globusid.org or other identity)

In File Manager, use the search tool to search for \u201cArcher2 file systems\u201d. Select it.

In the transfer pane, you are told that Authentication/Consent is required. Click Continue.

Click on the ARCHER2 Safe (safe.epcc.ed.ac.uk) link

Select the correct User account (if you have more than one)

Click Accept

Now confirm your Globus credentials \u2013 click Continue

Click on the SAFE id you selected previously

Make sure the correct User account is selected and Accept again

Your ARCHER2 /home directory will be shown

You can switch to viewing e.g. your /work directory by editing the path, or using the \"up one folder\" and selecting folders to move down the tree, as required

"},{"location":"data-tools/globus/#setting-up-the-other-end-of-the-transfer","title":"Setting up the other end of the transfer","text":"

Make sure you select two-panel view mode

"},{"location":"data-tools/globus/#laptop","title":"Laptop","text":"

If you wish to transfer data to/from your personal laptop or other device, click on the Collection Search in the right-hand panel

Use the link to \u201cGet Globus Connect Personal\u201d to create a Collection for your local drive.

"},{"location":"data-tools/globus/#other-server-eg-jasmin","title":"Other server e.g. JASMIN","text":"

If you wish to connect to another server, you will need to search for the Collection e.g. JASMIN Default Collection and authenticate

Please see the JASMIN Globus page for more information

"},{"location":"data-tools/globus/#setting-up-and-initiating-the-transfer","title":"Setting up and initiating the transfer","text":"

Once you are connected to both the Source and Destination Collections, you can use the File Manager to select the files to be transferred, and then click the Start button to initiate the transfer

A pop-up will appear once the Transfer request has been submitted successfully

Clicking on the \u201cView Details\u201d will show the progress and final status of the transfer

"},{"location":"data-tools/globus/#using-a-different-archer2-account","title":"Using a different ARCHER2 account","text":"

If you want to use Globus with a different ARCHER2 account, go to Settings, then Manage Identities, and Unlink the current ARCHER2 SAFE identity. Then repeat the link process with the other ARCHER2 account.

"},{"location":"data-tools/julia/","title":"Julia","text":"

Julia is a general-purpose programming language used widely in data science and for data visualisation.

Important

This documentation is provided by an external party (i.e. not by the ARCHER2 service itself). Julia is not part of the officially supported software on ARCHER2. While the ARCHER2 service desk is able to provide support for basic use of this software (e.g. access to software, writing job submission scripts) it does not generally provide detailed technical support for the software and you may be directed to seek support from other places if the service desk cannot answer the questions.

"},{"location":"data-tools/julia/#first-time-installation","title":"First time installation","text":"

Note

There is no centrally installed version of Julia, so you will have to manually install it and any packages you may need. The following guide was tested on julia-1.6.6.

You will first need to download Julia into your work directory and untar the folder. You should then add the folder to your system path so you can use the julia executable. Finally, you need to tell Julia to install packages in the work directory rather than in the default home directory, which can only be accessed from the login nodes. The following commands do this:

export WORK=/work/t01/t01/auser\ncd $WORK\n\nwget https://julialang-s3.julialang.org/bin/linux/x64/1.6/julia-1.6.6-linux-x86_64.tar.gz\ntar zxvf julia-1.6.6-linux-x86_64.tar.gz\nrm ./julia-1.6.6-linux-x86_64.tar.gz\n\nexport PATH=\"$PATH:$WORK/julia-1.6.6/bin\"\n\nmkdir ./.julia\nexport JULIA_DEPOT_PATH=\"$WORK/.julia\"\nexport PATH=\"$PATH:$JULIA_DEPOT_PATH/bin\"\n

At this point you should have a working installation of Julia. However, these environment variables are cleared when you log out. To define them automatically every time you log in, add the following lines to the end of your ~/.bashrc file:

export WORK=\"/work/t01/t01/auser\"\nexport JULIA_DEPOT_PATH=\"$WORK/.julia\"\nexport PATH=\"$PATH:$WORK/julia-1.6.6/bin\"\nexport PATH=\"$PATH:$JULIA_DEPOT_PATH/bin\"\n
"},{"location":"data-tools/julia/#installing-packages-and-using-environments","title":"Installing packages and using environments","text":"

Julia has a built-in package manager which can be used to install registered packages quickly and easily. As with many other high-level programming languages, you can use environments to control dependencies.

To make an environment, first navigate to where you want your environment to be (ideally a subfolder of your /work/ directory) and create an empty folder to store the environment in. Then launch Julia with the --project flag.

cd $WORK\nmkdir ./MyTestEnv\njulia --project=$WORK/MyTestEnv\n

This launches Julia in the MyTestEnv environment. You can then install packages as usual using the normal commands in the Julia terminal. E.g.

using Pkg\nPkg.add(\"Oceananigans\")\n
"},{"location":"data-tools/julia/#configuring-mpijl","title":"Configuring MPI.jl","text":"

The MPI.jl package doesn't use the system MPICH implementation by default. You can set it up to do this by following the steps below. First you will need to load the cray-mpich module and define some environment variables (see here for further details). Then you can launch Julia in an environment of your choice, ready to build.

module load cray-mpich/8.1.23\nexport JULIA_MPI_BINARY=\"system\"\nexport JULIA_MPI_PATH=\"\"\nexport JULIA_MPI_LIBRARY=\"/opt/cray/pe/mpich/8.1.23/ofi/gnu/9.1/lib/libmpi.so\"\nexport JULIA_MPIEXEC=\"srun\"\n\njulia --project=<<path to environment>>\n

Once in the Julia terminal, you can build the MPI.jl package using the following code. The final line installs the mpiexecjl command, which should be used instead of srun to launch MPI processes.

using Pkg\nPkg.build(\"MPI\"; verbose=true)\nusing MPI\nMPI.install_mpiexecjl(command = \"mpiexecjl\", force = false, verbose = true)\n
The mpiexecjl command will be installed in the bin subdirectory of the directory that JULIA_DEPOT_PATH points to.

Note

You only need to do this once per environment.

"},{"location":"data-tools/julia/#running-julia-on-the-compute-nodes","title":"Running Julia on the compute nodes","text":"

Below is an example script for running Julia with MPI on the compute nodes:

#!/bin/bash\n# Slurm job options (job-name, compute nodes, job time)\n#SBATCH --job-name=<<job-name>>\n#SBATCH --time=00:19:00\n\n#SBATCH --nodes=1\n#SBATCH --ntasks-per-node=24\n#SBATCH --cpus-per-task=1\n\n#SBATCH --qos=short\n#SBATCH --reservation=shortqos\n\n#SBATCH --account=<<your account>>\n#SBATCH --partition=standard\n\n# Set up the job environment (this module needs to be loaded before any other modules)\nmodule load PrgEnv-cray\nmodule load cray-mpich/8.1.23\n\n# Set the number of threads to 1\n#   This prevents any threaded system libraries from automatically\n#   using threading.\nexport OMP_NUM_THREADS=1\nexport JULIA_NUM_THREADS=1\n\n# Ensure the cpus-per-task option is propagated to srun commands\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\n# Define some paths\nexport WORK=/work/t01/t01/auser\n\nexport JULIA=\"$WORK/julia-1.6.6/bin/julia\"  # The julia executable\nexport PATH=\"$PATH:$WORK/julia-1.6.6/bin\"  # The folder of the julia executable\nexport JULIA_DEPOT_PATH=\"$WORK/.julia\"\nexport MPIEXECJL=\"$JULIA_DEPOT_PATH/bin/mpiexecjl\"  # The path to the mpiexecjl executable\n\n$MPIEXECJL --project=$WORK/MyTestEnv -n 24 $JULIA ./MyMpiJuliaScript.jl\n

The above script uses MPI, but you can also use multithreading instead by setting the JULIA_NUM_THREADS environment variable.
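For a multithreaded run, the number of Julia threads would normally match the number of cores Slurm allocates to each task. This sketch sets JULIA_NUM_THREADS from SLURM_CPUS_PER_TASK and falls back to one thread outside Slurm. set_julia_threads is a hypothetical helper. The sketch assumes the job requests, for example, --ntasks-per-node=1 and --cpus-per-task=16. In that case a call to this function would replace the export JULIA_NUM_THREADS=1 line in the script above.

```shell
# Match Julia's thread count to the cores Slurm allocated per task,
# falling back to a single thread when SLURM_CPUS_PER_TASK is unset.
set_julia_threads() {
    export JULIA_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
}
```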

"},{"location":"data-tools/papi-mpi-lib/","title":"PAPI MPI Library","text":"

The Performance Application Programming Interface (PAPI) is an API that facilitates the reading of performance counter data without needing to know the details of the underlying hardware.

For convenience, we have developed an MPI-based wrapper for PAPI, called papi_mpi_lib, which can be found via the link below.

https://github.com/cresta-eu/papi_mpi_lib

The PAPI MPI Library makes it possible to monitor a user-defined set of hardware performance counters during the execution of an MPI code running across multiple compute nodes. The library is lightweight, containing just four functions, and is intended to be straightforward to use. Once you've decided where in your code you wish to record counter values, you can control which counters are read at runtime by setting the PAT_RT_PERFCTR environment variable in the job submission script. As your code executes, the specified counters will be read at various points. After each reading, the counter values are summed by rank 0 (via an MPI reduction) before being output to a log file.

You can discover which counters are available on ARCHER2 compute nodes by submitting the following single node job.

#!/bin/bash --login\n\n#SBATCH -J papi\n#SBATCH --time=00:20:00\n#SBATCH --exclusive\n#SBATCH --nodes=1\n#SBATCH --tasks-per-node=1\n#SBATCH --cpus-per-task=1\n#SBATCH --account=<budget code>\n#SBATCH --partition=standard\n#SBATCH --qos=short\n#SBATCH --export=none\n\nfunction papi_query() {\n  export LD_LIBRARY_PATH=/opt/cray/pe/papi/$2/lib64:/opt/cray/libfabric/$3/lib64\n  module -q restore\n\n  module -q load cpe/$1\n  module -q load papi/$2\n\n  mkdir -p $1\n  papi_component_avail -d &> $1/papi_component_avail.txt\n  papi_native_avail -c &> $1/papi_native_avail.txt\n  papi_avail -c -d &> $1/papi_avail.txt\n}\n\npapi_query 22.12 6.0.0.17 1.12.1.2.2.0.0\n

The job runs various papi commands with the output being directed to specific text files. Please consult the text files to see which counters are available. Note, counters that are not available may still be listed in the file, but with a label such as <NA>.

As of July 2023, the Cray Programming Environment (CPE), PAPI and libfabric versions on ARCHER2 were 22.12, 6.0.0.17 and 1.12.1.2.2.0.0 respectively; these versions may change in the future.

Alternatively, you can run pat_help counters rome from a login node to check the availability of individual counters.

Further information on papi_mpi_lib along with test harnesses and example scripts can be found by reading the PAPI MPI Library readme file.
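The counters to read are chosen through PAT_RT_PERFCTR. The helper below is a sketch that builds this variable's value from its arguments. It assumes that papi_mpi_lib accepts a comma-separated list of event names in PAT_RT_PERFCTR, so check the readme for the exact format. set_perfctr is a hypothetical name. The event names in the usage line are illustrative only, so confirm their availability using the output of the job above.

```shell
# Join counter names into a comma-separated PAT_RT_PERFCTR value.
# Assumption: papi_mpi_lib reads a comma-separated list of event names.
# Usage: set_perfctr <event> [event ...]
set_perfctr() {
    local list= ev
    for ev; do
        # Add a comma only once the list is non-empty
        list=${list:+${list},}${ev}
    done
    export PAT_RT_PERFCTR=${list}
}
```

For example, set_perfctr PAPI_TOT_INS PAPI_TOT_CYC in a job script sets PAT_RT_PERFCTR to PAPI_TOT_INS,PAPI_TOT_CYC before the program is launched.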

"},{"location":"data-tools/paraview/","title":"ParaView","text":"

ParaView is a data visualisation and analysis package. Whilst ARCHER2 compute or login nodes do not have graphics cards installed in them, ParaView is installed so the visualisation libraries and applications can be used to post-process simulation data. The ParaView server (pvserver), batch application (pvbatch), and the Python interface (pvpython) are all available. Users are able to run the server on the compute nodes and connect to a local ParaView client running on their own computer.

"},{"location":"data-tools/paraview/#useful-links","title":"Useful links","text":""},{"location":"data-tools/paraview/#using-paraview-on-archer2","title":"Using ParaView on ARCHER2","text":"

ParaView is available through the paraview module.

module load paraview\n

Once the module has been added, the ParaView executables, tools, and libraries will be available.

"},{"location":"data-tools/paraview/#connecting-to-pvserver-on-archer2","title":"Connecting to pvserver on ARCHER2","text":"

For visualisation, you should connect to pvserver from a local ParaView client running on your own computer.

Note

You should make sure the version of ParaView you have installed locally is the same as the one on ARCHER2 (version 5.10.1).

The following instructions are for running pvserver in an interactive job. Start an interactive job using:

srun --nodes=1 --exclusive --time=00:20:00 \\\n               --partition=standard --qos=short --pty /bin/bash\n

Once the job starts the command prompt will change to show you are now on the compute node, e.g.:

auser@nid001023:/work/t01/t01/auser> \n

Then load the ParaView module and start pvserver with the srun command:

auser@nid001023:/work/t01/t01/auser> module load paraview\nauser@nid001023:/work/t01/t01/auser> srun --overlap --oversubscribe -n 4 \\\n> pvserver --mpi --force-offscreen-rendering\nWaiting for client...\nConnection URL: cs://nid001023:11111\nAccepting connection(s): nid001023:11111\n

Note

The previous example uses 4 compute cores to run pvserver. You can increase the number of cores if the visualisation does not run smoothly. Bear in mind that, depending on the test case, a large number of compute cores can lead to an out-of-memory runtime error.

In a separate terminal you can now set up an SSH tunnel with the node ID and port number which the pvserver is using, e.g.:

ssh -L 11111:nid001023:11111 auser@login.archer2.ac.uk \n

Enter your password and passphrase as usual.
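The node name and port in the tunnel command are the ones shown in the Connection URL that pvserver prints. The sketch below builds the tunnel command from that URL using bash parameter expansion, so you do not have to copy the values by hand. pvserver_tunnel is a hypothetical helper, and you would run it on your own computer.

```shell
# Build the SSH tunnel command from a pvserver Connection URL.
# Usage: pvserver_tunnel <connection URL, e.g. cs://nid001023:11111> <ARCHER2 username>
pvserver_tunnel() {
    local hostport=${1#cs://}     # strip the cs:// prefix
    local node=${hostport%:*}     # text before the last colon
    local port=${hostport##*:}    # text after the last colon
    echo ssh -L ${port}:${node}:${port} ${2}@login.archer2.ac.uk
}
```

For the session above, pvserver_tunnel cs://nid001023:11111 auser prints ssh -L 11111:nid001023:11111 auser@login.archer2.ac.uk, which you can then run.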

You can then connect from your local client using the following connection settings:

Name:           archer2 \nServer Type:    Client/Server \nHost:           localhost \nPort:           11111\n

Note

The Host from the local client should be set to \"localhost\" when using the SSH tunnel. The \"Name\" field can be set to a name of your choosing. 11111 is the default port for pvserver.

If it has connected correctly, you should see the following:

Waiting for client...\nConnection URL: cs://nid001023:11111\nAccepting connection(s): nid001023:11111\nClient connected.\n
"},{"location":"data-tools/paraview/#using-batch-mode-pvbatch","title":"Using batch-mode (pvbatch)","text":"

A pvbatch script can be run in a standard job script. For example, the following will run on a single node:

#!/bin/bash\n\n# Slurm job options (job-name, compute nodes, job time)\n#SBATCH --job-name=example_paraview_job\n#SBATCH --time=0:20:00\n#SBATCH --nodes=1\n#SBATCH --ntasks-per-node=128\n#SBATCH --cpus-per-task=1\n\n# Replace [budget code] below with your budget code (e.g. t01)\n#SBATCH --account=[budget code]             \n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\nmodule load paraview\n\n# Ensure the cpus-per-task option is propagated to srun commands\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\nsrun --distribution=block:block --hint=nomultithread pvbatch pvbatchscript.py\n
"},{"location":"data-tools/paraview/#compiling-paraview","title":"Compiling ParaView","text":"

The latest instructions for building ParaView on ARCHER2 may be found in the GitHub repository of build instructions:

"},{"location":"data-tools/pm-mpi-lib/","title":"Power Management MPI Library","text":"

The ARCHER2 compute nodes each have a set of so-called Power Management (PM) counters. These cover point-in-time power readings for the whole node, and for the CPU and memory domains. The accumulated energy use is also recorded at the same level of detail. Further, there are two temperature counters, one for each socket/processor on the node. The counters are read ten times per second and the data written to a set of files stored within node memory (located at /sys/cray/pm_counters/).
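You can also read the counter files directly, for example from a job script running on a compute node. The sketch below assumes that each file holds a single line made up of a value, a unit and a timestamp (for example, an energy file reporting joules). Check that format and the file name energy against /sys/cray/pm_counters/ on a compute node. pm_counter and PM_COUNTER_DIR are hypothetical names.

```shell
# Print the numeric value from a Cray PM counter file.
# Assumption: each file holds one line of the form <value> <unit> <timestamp> ...
# PM_COUNTER_DIR is a hypothetical override for the counter directory.
# Usage: pm_counter <file name, e.g. energy>
pm_counter() {
    local dir=${PM_COUNTER_DIR:-/sys/cray/pm_counters}
    local value rest
    # The first field is the value; the rest of the line is ignored
    read -r value rest < ${dir}/$1
    echo ${value}
}
```

Under the format assumption above, reading pm_counter energy before and after a section of work and subtracting the two values would give the node's energy use over that interval.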

For convenience, we have developed an MPI-based wrapper, called pm_mpi_lib, that facilitates the reading of the PM counter files; see the link below.

https://github.com/cresta-eu/pm_mpi_lib

The PM MPI Library makes it possible to monitor the Power Management counters during the execution of an MPI code running across multiple compute nodes. The library is lightweight, containing just three functions, and is intended to be straightforward to use. You simply decide which parts of your code you wish to profile as regards energy usage and/or power consumption.

As your code executes, the PM counters will be read at various points by a single designated monitor rank on each node assigned to the job. These readings are then written to a log file, which, after the job completes, will contain one set of time-stamped readings per node for every call to the pm_mpi_record function made from within your code. The readings can then be aggregated according to preference.

Further information along with test harnesses and example scripts can be found by reading the PM MPI Library readme file.

"},{"location":"data-tools/spack/","title":"Spack","text":"

Spack is a package manager, a tool to assist with building and installing software as well as determining what dependencies are required and installing those. It was originally designed for use on HPC clusters, where several variations of a given package may be installed alongside one another for different use cases -- for example different versions, built with different compilers, using MPI or hybrid MPI+OpenMP. Spack is principally written in Python but has a component written in Answer Set Programming (ASP) which is used to determine the required dependencies for a given package installation.

Users are welcome to install Spack themselves in their own directories, but we are making an experimental installation tailored for ARCHER2 available centrally. This page provides documentation on how to activate and install packages using the central installation on ARCHER2. For more in-depth information on using Spack itself please see the developers' documentation.

Important

As ARCHER2's central Spack installation is still in an experimental stage please be aware that we cannot guarantee that it will work with full functionality and we may not be able to provide support.

"},{"location":"data-tools/spack/#activating-spack","title":"Activating Spack","text":"

As it is still in an experimental stage, the Spack module is not made available to users by default. You must first load the other-software module:

auser@ln01:~> module load other-software\n

Several modules with spack in their name will become visible to you. You should load the spack module:

auser@ln01:~> module load spack\n

This configures Spack to place its cache in, and install software to, a directory called .spack in your base work directory, e.g. /work/t01/t01/auser/.spack.

At this point Spack is available to you via the spack command. You can get started with spack help, reading the Spack documentation, or by testing a package's installation.

"},{"location":"data-tools/spack/#using-spack-on-archer2","title":"Using Spack on ARCHER2","text":""},{"location":"data-tools/spack/#installing-software","title":"Installing software","text":"

At its simplest, Spack installs software with the spack install command:

auser@ln01:~> spack install gromacs\n

This very simple gromacs installation specification, or spec, would install GROMACS using the default options given by the Spack gromacs package. The spec can be extended to include the options you want. For example, the command

auser@ln01:~> spack install gromacs@2024.2%gcc+mpi\n

would use the GCC compiler to install an MPI-enabled version of GROMACS version 2024.2.
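To make the parts of a spec explicit, the Python sketch below splits a simple spec such as gromacs@2024.2%gcc+mpi into its name, version (after @), compiler (after %) and variants (+ to enable, ~ to disable). The function parse_simple_spec is hypothetical and handles only this small subset of the syntax. It is not how Spack itself parses specs, and spack spec remains the authoritative way to check what a spec means.

```python
def parse_simple_spec(spec):
    '''Split a simple Spack-style spec into its parts.

    Hypothetical helper: it only understands name@version%compiler
    followed by +variant / ~variant flags, a small subset of real spec
    syntax, and it is not Spack's own parser.
    '''
    result = {'name': None, 'version': None, 'compiler': None, 'variants': {}}
    # Each sigil starts a new field; the text up to the next sigil is its value.
    sigils = {'@': 'version', '%': 'compiler', '+': 'enable', '~': 'disable'}

    def store(field, token):
        if field == 'enable':
            result['variants'][token] = True
        elif field == 'disable':
            result['variants'][token] = False
        else:
            result[field] = token

    field, token = 'name', ''
    for ch in spec:
        if ch in sigils:
            store(field, token)
            field, token = sigils[ch], ''
        else:
            token += ch
    store(field, token)
    return result


print(parse_simple_spec('gromacs@2024.2%gcc+mpi'))
```

Parts that are left out of the spec stay as None, which corresponds to Spack choosing its defaults for them during concretisation.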

Tip

Spack needs to bootstrap the installation of some extra software in order to function, principally clingo which is used to solve the dependencies required for an installation. The first time you ask Spack to concretise a spec into a precise set of requirements, it will take extra time as it downloads this software and extracts it into a local directory for Spack's use.

You can find information about any Spack package and the options available to use with the spack info command:

auser@ln01:~> spack info gromacs\n

Tip

The Spack developers also provide a website at https://packages.spack.io/ where you can search for and examine packages, including all information on options, versions and dependencies.

When installing a package, Spack determines which dependencies are required to support it. If they are not already available to Spack, either as packages it has previously installed or as external dependencies, Spack will install those too, marking them as implicitly installed, as opposed to the explicit installation of the package you requested. If you want to see the dependencies of a package before you install it, you can use spack spec to see the full concretised set of packages:

auser@ln01:~> spack spec gromacs@2024.2%gcc+mpi\n

Tip

Spack on ARCHER2 has been configured to use as much of the HPE Cray Programming Environment as possible. For example, this means that Cray LibSci will be used to provide the BLAS, LAPACK and ScaLAPACK dependencies and Cray MPICH will provide MPI. It is also configured to re-use, as dependencies, any packages that the ARCHER2 CSE team has installed centrally with Spack, potentially saving you build time and storage quota.

"},{"location":"data-tools/spack/#using-spack-packages","title":"Using Spack packages","text":"

Spack provides a module-like way of making software that you have installed available to use. If you have a GROMACS installation, you can make it available to use with spack load:

auser@ln01:~> spack load gromacs\n

At this point you should be able to use the software as normal. You can then remove it once again from the environment with spack unload:

auser@ln01:~> spack unload gromacs\n

If you have multiple variants of the same package installed, you can use the spec to distinguish between them. You can always check what packages have been installed using the spack find command. If no other arguments are given it will simply list all installed packages, or you can give a package name to narrow it down:

auser@ln01:~> spack find gromacs\n

You can see your packages' install locations using spack find --paths or spack find -p.

"},{"location":"data-tools/spack/#maintaining-your-spack-installations","title":"Maintaining your Spack installations","text":"

In any Spack command that requires as an argument a reference to an installed package, you can provide a hash reference to it rather than its spec. You can see the first part of the hash by running spack find -l, or the full hash with spack find -L. Then use the hash in a command by prefixing it with a forward slash, e.g. wjy5dus becomes /wjy5dus.
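A hash reference only needs enough leading characters to identify a single installed package. The Python sketch below models that lookup with a hypothetical resolve_hash function over an invented table of hashes. It illustrates the idea, including the failure when a prefix matches more than one package, and is not Spack's implementation.

```python
# Invented hashes for illustration only; real Spack hashes are longer.
INSTALLED = {
    'wjy5dusq3hx': 'gromacs@2024.2 (first build)',
    'bleelvsk2pa': 'gromacs@2024.2 (second build)',
    'wjyk7bnm4tz': 'fftw@3.3.10',
}


def resolve_hash(ref, installed):
    '''Return the installed spec whose hash starts with the given /prefix.'''
    if not ref.startswith('/'):
        raise ValueError('hash references must start with /')
    prefix = ref[1:]
    matches = [h for h in installed if h.startswith(prefix)]
    if len(matches) != 1:
        # Zero matches means nothing is installed; several means the prefix is ambiguous.
        raise LookupError('{} matched {} packages'.format(ref, len(matches)))
    return installed[matches[0]]


print(resolve_hash('/wjy5dus', INSTALLED))
```

In this sketch, /wjy would be rejected because two of the invented hashes start with wjy. Using the seven characters shown by spack find -l avoids that kind of ambiguity in practice.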

If you have two packages installed which appear identical in spack find apart from their hash, you can differentiate them with spack diff:

auser@ln01:~> spack diff /wjy5dus /bleelvs\n

You can uninstall your packages with spack uninstall:

auser@ln01:~> spack uninstall gromacs@2024.2\n

and of course, to be absolutely certain that you are uninstalling the correct package, you can provide the hash:

auser@ln01:~> spack uninstall /wjy5dus\n

Uninstalling a package will leave behind any implicitly installed packages that were installed to support it. Spack may have also installed build-time dependencies that aren't actually needed any more -- these are often packages like autoconf, cmake and m4. You can run the garbage collection command to uninstall any build dependencies and implicit dependencies that are no longer required:

auser@ln01:~> spack gc\n

If you commonly use a set of Spack packages together you may want to consider using a Spack environment to assist you in their installation and management. Please see the Spack documentation for more information.

"},{"location":"data-tools/spack/#custom-configuration","title":"Custom configuration","text":"

Spack is configured using YAML files. The central installation on ARCHER2 made available to users is configured to use the HPE Cray Programming Environment and to allow you to start installing software to your /work directories right away, but if you wish to make any changes you can provide your own overriding userspace configuration.

Your own configuration belongs in the user scope. On ARCHER2, Spack is configured by default to place and look for your configuration files in your work directory, e.g. /work/t01/t01/auser/.spack. You can, however, make Spack use any directory you choose by setting the SPACK_USER_CONFIG_PATH environment variable, for example:

auser@ln01:~> export SPACK_USER_CONFIG_PATH=/work/t01/t01/auser/spack-config\n

This must be a directory where you have write permission, such as your home or work directory, or one of your project's shared directories.

You can edit the configuration files directly in a text editor or by running, for example:

auser@ln01:~> spack config edit repos\n

which would open your repos.yaml in vim.

Tip

If you would rather not use vim, you can change which editor is used by Spack by setting the SPACK_EDITOR environment variable.

The final configuration used by Spack is a combination of several scopes: the Spack defaults are overridden by the ARCHER2 system configuration files, which can in turn be overridden by your own configuration. You can see which options are in use at any point by running, for example:

auser@ln01:~> spack config get config\n

which reads every config.yaml file known to Spack and sets the options according to each file's level of precedence. You can also see which files are responsible for which lines in the final active configuration by running, for example to check packages.yaml:

auser@ln01:~> spack config blame packages\n

Unless you have already written a packages.yaml of your own, this will show a mix of options originating from the Spack defaults and also from an archer2-user directory which is where we have told Spack how to use packages from the HPE Cray Programming Environment.

If there is some behaviour in Spack that you want to change, looking at the output of spack config get and spack config blame may help to show what you would need to do. You can then write your own user scope configuration file to set the behaviour you want, which will override the option as set by the lower-level scopes.
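The layering of scopes can be pictured as merging dictionaries from the lowest-precedence scope to the highest. The Python sketch below assumes that sections merge key by key and that a value from a higher scope replaces a lower one. The option values are invented, and the sketch does not reproduce Spack's handling of lists or its :: override syntax, so use spack config blame to see what actually happens.

```python
import copy


def merge_into(target, source):
    '''Merge one scope into the accumulated configuration, in place.'''
    for key, value in source.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            # Both sides are sections: merge them key by key.
            merge_into(target[key], value)
        else:
            # A value from the higher-precedence scope replaces the lower one.
            target[key] = copy.deepcopy(value)


def merge_scopes(*scopes):
    '''Merge scopes given lowest precedence first; later scopes win.'''
    merged = {}
    for scope in scopes:
        merge_into(merged, scope)
    return merged


# Invented values for illustration: Spack defaults, ARCHER2 system scope, user scope.
defaults = {'config': {'build_jobs': 16, 'debug': False}}
system = {'config': {'build_jobs': 8}}
user = {'config': {'debug': True}}
print(merge_scopes(defaults, system, user))
```

Here the user scope only changes debug, so build_jobs keeps the value set by the system scope. This is why a user configuration file only needs to contain the options you want to change.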

Please see the Spack documentation to find out more about writing configuration files.

"},{"location":"data-tools/spack/#writing-new-packages","title":"Writing new packages","text":"

A Spack package is, at its core, a Python package.py file which tells Spack how to obtain source code and compile it. A very simple package may only allow Spack to build one version with one compiler and one set of options. A more fully featured package will list more versions, include logic to build them with different compilers and options, and pick its dependencies correctly according to what is chosen.

Spack provides several thousand packages in its builtin repository. You may be able to use these with no issues on ARCHER2 by simply running spack install as described above, but if you do run into problems in the interaction between Spack and the CPE compilers and libraries then you may wish to write your own. Where the ARCHER2 CSE service has encountered problems with packages we have provided our own in a repository located at $SPACK_ROOT/var/spack/repos/archer2.

"},{"location":"data-tools/spack/#creating-your-own-package-repository","title":"Creating your own package repository","text":"

A package repository is a directory containing a repo.yaml configuration file and another directory called packages. Directories within the latter are named for the package they provide, for example cp2k, and contain in turn a package.py. You can create a repository from scratch with the command

auser@ln01:~> spack repo create dirname\n

where dirname is the name of the directory holding the repository. This command will create the directory in your current working directory, but you can choose to instead provide a path to its location. You can then make the new repository available to Spack by running:

auser@ln01:~> spack repo add dirname\n

This adds the path to dirname to the repos.yaml file in your user scope configuration directory as described above. If your repos.yaml doesn't yet exist, it will be created.
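For reference, after spack repo add the user-scope repos.yaml should hold a list of repository paths, roughly as sketched below. The path is a hypothetical example that follows the /work/t01/t01/auser convention used on this page, and the exact layout of this file may differ between Spack versions.

```yaml
# Hypothetical user-scope repos.yaml after running spack repo add dirname
repos:
- /work/t01/t01/auser/dirname
```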

A Spack repository can similarly be removed from the config using:

auser@ln01:~> spack repo rm dirname\n
"},{"location":"data-tools/spack/#namespaces-and-repository-priority","title":"Namespaces and repository priority","text":"

A package can exist in several repositories. For example, the Quantum Espresso package is provided by both the builtin repository provided with Spack and also by the archer2 repository; the latter has been patched to work on ARCHER2.

To distinguish between these packages, each repository's packages exist within that repository's namespace. By default the namespace is the same as the name of the directory it was created in, but Spack does allow it to be different. Both builtin and archer2 use the same directory name and namespace.

Tip

If you want your repository namespace to be different from the name of the directory, you can change it either by editing the repository's repo.yaml or by providing an extra argument to spack repo create:

auser@ln01:~> spack repo create dirname namespace\n
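The namespace is recorded in the repository's repo.yaml. A minimal sketch of that file, assuming the hypothetical namespace mynamespace, looks something like this (newer Spack releases may add further keys):

```yaml
# Hypothetical repo.yaml for a repository whose namespace is mynamespace
repo:
  namespace: mynamespace
```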

Running spack find -N will return the list of installed packages with their namespace. You'll see that they are then prefixed with the repository namespace, for example builtin.bison@3.8.2 and archer2.quantum-espresso@7.2. In order to avoid ambiguity when managing package installation you can always prefix a spec with a repository namespace.

If you don't include the repository in a spec, Spack will search the repositories it has been configured to use, in order, until it finds a matching package, which it will then use. The earlier a repository appears in the list, the higher its priority. You can check the order with:

auser@ln01:~> spack repo list\n

If you run this without having added any repositories of your own, you will see that the two available repositories are archer2 and builtin, in this order. This means that archer2 has higher priority. Because of this, running spack install quantum-espresso would install archer2.quantum-espresso, but you could still choose to install from the other repository with spack install builtin.quantum-espresso.
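The lookup order described above can be modelled as a first-match search over an ordered list of repositories. In the Python sketch below, resolve_package is a hypothetical helper, the repository contents are cut down to just the packages needed for the example, and it assumes that package names contain no dots, so the last dot separates an explicit namespace from the name.

```python
def resolve_package(spec_name, repos):
    '''Return (namespace, package) for a bare package name.

    repos is an ordered list of (namespace, package_names) pairs, highest
    priority first, as reported by spack repo list. A name such as
    builtin.quantum-espresso pins the namespace explicitly.
    '''
    namespace, name = None, spec_name
    if '.' in spec_name:
        # Assumption: package names have no dots, so split on the last one.
        namespace, name = spec_name.rsplit('.', 1)
    for ns, packages in repos:
        if namespace is not None and ns != namespace:
            continue
        if name in packages:
            return ns, name
    raise KeyError(spec_name)


# Hypothetical, heavily trimmed repository contents, highest priority first.
REPOS = [
    ('archer2', {'quantum-espresso'}),
    ('builtin', {'quantum-espresso', 'bison'}),
]

print(resolve_package('quantum-espresso', REPOS))
print(resolve_package('builtin.quantum-espresso', REPOS))
```

A package found only in builtin, such as bison here, is still found, because the search simply falls through to the lower-priority repository.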

"},{"location":"data-tools/spack/#creating-a-package","title":"Creating a package","text":"

Once you have a repository of your own in place, you can create new packages to store within it. Spack has a spack create command which will do the initial setup and create a boilerplate package.py. To create an empty package called packagename you would run:

auser@ln01:~> spack create --name packagename\n

However, it will very often be more efficient if you instead provide a download URL for your software as the argument. For example, the Code_Saturne 8.0.3 source is obtained from https://www.code-saturne.org/releases/code_saturne-8.0.3.tar.gz, so you can run:

auser@ln01:~> spack create https://www.code-saturne.org/releases/code_saturne-8.0.3.tar.gz\n

From this, Spack will determine the package name and the download URLs for all versions X.Y.Z matching the pattern https://www.code-saturne.org/releases/code_saturne-X.Y.Z.tar.gz. It will then ask you interactively which of these you want to use. Finally, it will download the .tar.gz archives for those versions, calculate their checksums, and place all this information in the initial version of the package for you. This takes away a lot of the initial work!

At this point you can get to work on the package. You can edit an existing package by running

auser@ln01:~> spack edit packagename\n

or by directly opening packagename/package.py within the repository with a text editor.

The boilerplate code will note several sections for you to fill out. If you did provide a source code download URL, you'll also see listed the versions you chose and their checksums.

A package is implemented as a Python class. You'll see that by default it will inherit from the AutotoolsPackage class which defines how a package following the common configure > make > make install process should be built. You can change this to another build system, for example CMakePackage. If you want, you can have the class inherit from several different types of build system classes and choose between them at install time.

You will usually need to pass options to the build. For an AutotoolsPackage package, you can write a configure_args method which simply returns a list of the command-line arguments you would give to configure if you were building the code yourself. There is an equivalent cmake_args method for CMakePackage packages.

Finally, you will need to provide your package's dependencies. In the main body of your package class you should add calls to the depends_on() function. For example, if your package needs MPI, add depends_on(\"mpi\"). As the argument to the function is a full Spack spec, you can provide any necessary versioning or options, so, for example, if you need PETSc 3.18.0 or newer with Fortran support, you can call depends_on(\"petsc+fortran@3.18.0:\").
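In a spec, a version followed by a colon, as in @3.18.0:, is an open-ended range meaning that version or newer. The hypothetical helper below shows that comparison for plain numeric versions. Versions have to be compared component by component rather than as strings, otherwise 3.9.0 would wrongly sort after 3.18.0. Spack's real version handling also copes with non-numeric components and other range forms, so treat this only as an illustration.

```python
def parse_version(text):
    # Compare versions as tuples of integers, so 3.9 < 3.18 numerically.
    return tuple(int(part) for part in text.split('.'))


def meets_lower_bound(version, constraint):
    '''Check a version against an open-ended range such as 3.18.0: .'''
    if not constraint.endswith(':'):
        raise ValueError('this sketch only handles ranges of the form X.Y.Z:')
    return parse_version(version) >= parse_version(constraint[:-1])


for v in ('3.9.0', '3.17.5', '3.18.0', '3.20.1'):
    print(v, meets_lower_bound(v, '3.18.0:'))
```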

If you know that you will only ever want to build a package one way, then providing the build options and dependencies should be all that you need to do. However, if you want to allow for different options as part of the install spec, patch the source code or perform post-install fixes, or take more manual control of the build process, it can become much more complex. Thankfully the Spack developers have provided excellent documentation covering the whole process, and there are many existing packages you can look at to see how it's done.

"},{"location":"data-tools/spack/#tips-when-writing-packages-for-archer2","title":"Tips when writing packages for ARCHER2","text":"

Here are some useful pointers when writing packages for use with the HPE Cray Programming Environment on ARCHER2.

"},{"location":"data-tools/spack/#cray-compiler-wrappers","title":"Cray compiler wrappers","text":"

An important point of note is that Spack does not use the Cray compiler wrappers cc, CC and ftn when compiling code. Instead, it uses the underlying compilers themselves. Remember that the wrappers automate the use of Cray LibSci, Cray FFTW, Cray HDF5 and Cray NetCDF. Without this being done for you, you may need to take extra care to ensure that the options needed to use those libraries are correctly set.

"},{"location":"data-tools/spack/#using-cray-libsci","title":"Using Cray LibSci","text":"

Cray LibSci provides optimised implementations of BLAS, BLACS, LAPACK and ScaLAPACK on ARCHER2. These are bundled together into single libraries whose names are variants of libsci_cray.so. Although Spack itself knows about LibSci, many applications don't, and it can sometimes be tricky to get them to use these libraries when they are instead looking for libblas.so and the like.

The configure or cmake or equivalent step for your software will hopefully allow you to manually point it to the correct library. For example, Code_Saturne's configure can take the options --with-blas-lib and --with-blas-libs which respectively tell it the location to search and the libraries to use in order to build against BLAS.

Spack can provide the correct BLAS library search and link flags to be passed on to configure via self.spec[\"blas\"].libs, a LibraryList object. So, the Code_Saturne package uses the following configure_args() method:

def configure_args(self):\n    blas = self.spec[\"blas\"].libs\n    args = [\"--with-blas-lib={0}\".format(blas.search_flags),\n            \"--with-blas-libs={0}\".format(blas.link_flags)]\n    return args\n

Here the blas.search_flags attribute is resolved to a -L library search flag using the path to the correct LibSci directory, taking into account whether the libraries for the Cray, GCC or AOCC compilers should be used. blas.link_flags similarly gives a -l flag for the correct LibSci library. Depending on what you need, the LibraryList has other attributes which can help you pass the options needed to get configure to find and use the correct library.
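To show roughly what those two attributes contain, the sketch below derives equivalent -L and -l flags from a list of full library paths. The function search_and_link_flags and the paths are hypothetical and only mimic the idea. In a real package, use the LibraryList attributes described above rather than re-implementing them.

```python
import os


def search_and_link_flags(library_paths):
    '''Build -L and -l flag strings from full library paths, keeping first-seen order.'''
    dirs, names = [], []
    for path in library_paths:
        directory, filename = os.path.split(path)
        stem = filename.split('.')[0]  # libsci_gnu.so -> libsci_gnu
        if stem.startswith('lib'):
            stem = stem[3:]            # libsci_gnu -> sci_gnu
        if directory not in dirs:
            dirs.append(directory)
        if stem not in names:
            names.append(stem)
    search = ' '.join('-L' + d for d in dirs)
    link = ' '.join('-l' + n for n in names)
    return search, link


# Hypothetical paths; the real LibSci location depends on the CPE version and compiler.
print(search_and_link_flags(['/opt/example/libsci/lib/libsci_gnu.so',
                             '/opt/example/libsci/lib/libsci_gnu_mpi.so']))
```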

"},{"location":"data-tools/spack/#contributing","title":"Contributing","text":"

If you develop a package for use on ARCHER2 please do consider opening a pull request to the GitHub repository.

"},{"location":"data-tools/visidata/","title":"VisiData","text":"

VisiData is an interactive multitool for tabular data. It combines the clarity of a spreadsheet, the efficiency of the terminal, and the power of Python, into a lightweight utility which can handle millions of rows with ease.

"},{"location":"data-tools/visidata/#useful-links","title":"Useful links","text":""},{"location":"data-tools/visidata/#visidata-on-archer2","title":"VisiData on ARCHER2","text":"

You can access VisiData on ARCHER2 by loading the visidata module:

module load visidata\n

Once the module has been loaded, VisiData is available via the vd command.

VisiData can also be used in scripts by saving a command log and replaying it. See the VisiData documentation on saving and restoring VisiData sessions.

"},{"location":"data-tools/vmd/","title":"VMD","text":"

VMD is a visualisation program for displaying, animating, and analysing molecular systems using 3D graphics and built-in Tcl/Tk scripting.

"},{"location":"data-tools/vmd/#useful-links","title":"Useful links","text":""},{"location":"data-tools/vmd/#using-vmd-on-archer2","title":"Using VMD on ARCHER2","text":"

VMD is available through the vmd module.

module load vmd\n

Once the module has been loaded, the VMD executables, tools, and libraries will be available.

Without anything else, this allows you to run VMD in \"text-only\" mode with:

vmd -dispdev text\n

If you want to launch VMD with a GUI, see the requirements in the next section.

"},{"location":"data-tools/vmd/#launching-vmd-with-a-gui","title":"Launching VMD with a GUI","text":"

To launch VMD with its graphical interface, your machine needs to support the X11 X Window System. Most Linux and *NIX systems support this by default. If you're using Windows (through WSL, for example), you will need an X11 display server; we recommend Xming. For macOS, we recommend XQuartz, but please be aware that some extra configuration is needed, as described in the next section.

To launch VMD with a GUI, once you have an X11 display server running on your local machine, you'll need to connect to ARCHER2 with X11 forwarding enabled; please follow the instructions in the logging in section. Once you're connected to ARCHER2, load the VMD module with:

module load vmd\n

and launch VMD with:

vmd\n
"},{"location":"data-tools/vmd/#using-vmd-from-macos","title":"Using VMD from macOS","text":"

If you're using macOS and XQuartz, before you're able to launch VMD with a GUI, you will need to change the XQuartz configuration. On a local terminal (that is, not connected to ARCHER2), run the following command:

defaults write org.xquartz.X11 enable_iglx -bool true\n

then, restart XQuartz. You will now be able to launch VMD's GUI without a segmentation fault.

"},{"location":"data-tools/vmd/#compiling-vmd","title":"Compiling VMD","text":"

The latest instructions for building VMD on ARCHER2 may be found in the GitHub repository of build instructions.

"},{"location":"essentials/","title":"Essential Skills","text":"

This section provides information and links on essential skills required to use ARCHER2 efficiently: e.g. using Linux command line, accessing help and documentation.

"},{"location":"essentials/#terminal","title":"Terminal","text":"

In order to access HPC machines such as ARCHER2 you will need to use a Linux command line terminal window.

Options for Linux, macOS and Windows are described in our Connecting to ARCHER2 guide.

"},{"location":"essentials/#linux-command-line","title":"Linux Command Line","text":"

A guide to using the Unix Shell for complete novices

For those already familiar with the basics there is also a lesson on shell extras

"},{"location":"essentials/#basic-slurm-commands","title":"Basic Slurm commands","text":"

Slurm is the scheduler used on ARCHER2 and we provide a guide to using the basic Slurm commands including how to find out:

"},{"location":"essentials/#text-editors","title":"Text Editors","text":"

The following text editors are available on ARCHER2

| Name | Description | Examples |
| --- | --- | --- |
| emacs | A widely used editor with a focus on extensibility. | emacs -nw sharpen.pbs; CTRL+X CTRL+C quits; CTRL+X CTRL+S saves |
| nano | A small, free editor with a focus on user friendliness. | nano sharpen.pbs; CTRL+X quits; CTRL+O saves |
| vi | A mode-based editor with a focus on aiding code development. | vi cfd.f90; :q in command mode quits; :q! in command mode quits without saving; :w in command mode saves; i in command mode switches to insert mode; ESC in insert mode switches to command mode |

If you are using MobaXterm on Windows you can use the inbuilt MobaTextEditor text file editor.

You can edit files on your local machine using your preferred text editor and then upload them to ARCHER2. Make sure you save the files with Linux line endings. Notepad, for example, supports Unix/Linux line endings (LF), Macintosh line endings (CR), and Windows line endings (CRLF).

"},{"location":"essentials/#quick-reference-sheet","title":"Quick Reference Sheet","text":"

We have produced this Quick Reference Sheet which you may find useful.

"},{"location":"faq/","title":"ARCHER2 Frequently Asked Questions","text":"

This section documents some of the questions raised to the Service Desk on ARCHER2, and the advice and solutions.

"},{"location":"faq/#user-accounts","title":"User accounts","text":""},{"location":"faq/#username-already-in-use","title":"Username already in use","text":"

Q. I created a machine account on ARCHER2 for a training course, but now I want to use that machine username for my main ARCHER2 project, and the system will not let me, saying \"that name is already in use\". How can I re-use that username?

A. Send an email to the service desk, letting us know the username and project that you set up previously, and asking for that account and any associated data to be deleted. Once deleted, you can then re-use that username to request an account in your main ARCHER2 project.

"},{"location":"faq/#data","title":"Data","text":""},{"location":"faq/#undeleteable-file-nfsxxxxxxxxxxx","title":"Undeleteable file .nfsXXXXXXXXXXX","text":"

Q. I have a file called .nfsXXXXXXXXXXX (where XXXXXXXXXXX is a long hexadecimal string) in my /home folder but I can't delete it.

A. This file will have been created during a file copy which failed. Trying to delete it will give an error \"Device or resource busy\", even though the copy has ended and no active task is locking it.

echo -n >.nfsXXXXXXXXXXX

will remove it.

"},{"location":"faq/#running-on-archer2","title":"Running on ARCHER2","text":""},{"location":"faq/#oom-error-on-archer2","title":"OOM error on ARCHER2","text":"

Q. Why is my code failing on ARCHER2 with an out of memory (OOM) error?

A. Your job is using too much memory per process. We recommend that you try running the same job on underpopulated nodes, which you can do by reducing --ntasks-per-node in your Slurm submission script. Try halving the value used in the job that failed (for example, if you have --ntasks-per-node=128, reduce it to --ntasks-per-node=64).
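The reasoning behind halving --ntasks-per-node is simple arithmetic: if a node's memory is shared evenly between its processes, halving the number of processes doubles the memory each one can use. The sketch below assumes 256 GiB per standard compute node, shared evenly with nothing reserved. Please check the ARCHER2 hardware documentation for the actual figures, and remember that some memory is used by the operating system.

```python
def memory_per_process_gib(node_memory_gib, ntasks_per_node):
    '''Memory each process gets if node memory is divided evenly.'''
    return node_memory_gib / ntasks_per_node


# Assumption: 256 GiB per standard node (check the hardware documentation).
NODE_MEMORY_GIB = 256
for ntasks in (128, 64):
    per_process = memory_per_process_gib(NODE_MEMORY_GIB, ntasks)
    print('--ntasks-per-node={}: about {} GiB per process'.format(ntasks, per_process))
```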

"},{"location":"faq/#checking-budgets","title":"Checking budgets","text":"

Q. How can I check which budget code(s) I can use?

A. You can check in SAFE by selecting Login accounts from the menu and then selecting the login account you want to query.

Under Login account details you will see each of the budget codes you have access to listed e.g. e123 resources and then under Resource Pool to the right of this, a note of the remaining budget.

When logged in to the machine you can also use the command

sacctmgr show assoc where user=$LOGNAME format=user,Account%12,MaxTRESMins,QOS%40\n

This will list all the budget codes that you have access to (but not the amount of budget available) e.g.

    User      Account  MaxTRESMins                                 QOS\n-------- ------------ ------------ -----------------------------------\n   userx    e123-test                   largescale,long,short,standard\n   userx         e123        cpu=0      largescale,long,short,standard\n

This shows that userx is a member of budgets e123-test and e123. However, the cpu=0 indicates that the e123 budget is empty or disabled. This user can submit jobs using the e123-test budget.
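To check this from a script, the sketch below parses sacctmgr output in the format shown above and marks any budget whose MaxTRESMins column contains cpu=0 as unusable. The function budget_status is hypothetical and relies on the column layout produced by the format string in the command above. It treats only an exact cpu=0 entry as disabled, and it cannot report how much budget remains, which is only available in SAFE.

```python
SAMPLE = '''    User      Account  MaxTRESMins                                 QOS
-------- ------------ ------------ -----------------------------------
   userx    e123-test                   largescale,long,short,standard
   userx         e123        cpu=0      largescale,long,short,standard
'''


def budget_status(sacctmgr_output):
    '''Map each account to True if usable, False if MaxTRESMins shows cpu=0.'''
    status = {}
    for line in sacctmgr_output.splitlines()[2:]:  # skip the header and dashes
        fields = line.split()
        if len(fields) < 2:
            continue
        account = fields[1]
        # An empty MaxTRESMins column simply disappears after split().
        status[account] = 'cpu=0' not in fields[2:]
    return status


print(budget_status(SAMPLE))
```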

You can only check the amount of available budget via SAFE - see above.

"},{"location":"faq/#estimated-start-time-of-queued-jobs","title":"Estimated start time of queued jobs","text":"

Q. I\u2019ve checked the estimated start time for my queued jobs using \u201csqueue -u $USER --start\u201d. Why does the estimated start time keep changing?

A. ARCHER2 uses the Slurm scheduler to queue jobs for the compute nodes. Slurm attempts to find a better schedule as jobs complete and new jobs are added to the queue. This helps to maximise the use of resources by minimising the number of idle compute nodes, in turn reducing your wait time in the queue.

However, if you periodically check the estimated start time of your queued jobs, you may notice that the estimate changes or even disappears. This is because Slurm only assigns an estimated start time to the entries at the top of the queue. As the schedule changes, your jobs could move in and out of this top region and thus gain or lose an estimated start time.

"},{"location":"faq/network-upgrade-2023/","title":"ARCHER2 data centre network upgrade: 2023","text":"

During September 2023 the data centre that houses ARCHER2 will be undergoing a major network upgrade.

On this page we describe the impact this will have and provide links to further information.

If you have any questions or concerns, please contact the ARCHER2 Service Desk.

"},{"location":"faq/network-upgrade-2023/#when-will-the-upgrade-happen-and-how-long-will-it-take","title":"When will the upgrade happen and how long will it take?","text":""},{"location":"faq/network-upgrade-2023/#the-outage-dates-will-be","title":"The outage dates will be:","text":"

We will notify users if we are able to complete this work ahead of schedule and restore ARCHER2 access earlier than expected.

"},{"location":"faq/network-upgrade-2023/#what-are-the-impacts-on-users-from-the-upgrade","title":"What are the impacts on users from the upgrade?","text":""},{"location":"faq/network-upgrade-2023/#during-the-upgrade-process","title":"During the upgrade process","text":""},{"location":"faq/network-upgrade-2023/#submitting-new-work-and-running-work","title":"Submitting new work, and running work","text":"

We will therefore be encouraging users to submit jobs in the period prior to the work, so that your work can continue on the system during the upgrade process.

"},{"location":"faq/network-upgrade-2023/#relaxing-of-queue-limits","title":"Relaxing of queue limits","text":"

In preparation for the Data Centre Network (DCN) upgrade we have relaxed the queue limits on all the QoS\u2019s, so that users can submit a significantly larger number of jobs to ARCHER2. These changes are intended to allow users to submit jobs that they wish to run during the upgrade, in advance of the start of the upgrade. The changes will be in place until the end of the Data Centre Network upgrade.

For the low priority QoS, as well as relaxing the number of jobs you can submit, we have also increased the maximum job length to 48 hours and the maximum number of nodes per job to 5,860, so users can submit using their own allocation or using the low-priority QoS.

| QoS | Max Nodes Per Job | Max Walltime | Jobs Queued | Jobs Running | Partition(s) | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| standard | 1024 | 24 hrs | 320 | 16 | standard | Maximum of 1024 nodes in use by any one user at any time |
| highmem | 256 | 24 hrs | 80 | 16 | highmem | Maximum of 512 nodes in use by any one user at any time |
| taskfarm | 16 | 24 hrs | 640 | 32 | standard | Maximum of 256 nodes in use by any one user at any time |
| short | 32 | 20 mins | 80 | 4 | standard | |
| long | 64 | 48 hrs | 80 | 16 | standard | Minimum walltime of 24 hrs, maximum 512 nodes in use by any one user at any time, maximum of 2048 nodes in use by QoS |
| largescale | 5860 | 12 hrs | 160 | 1 | standard | Minimum job size of 1025 nodes |
| lowpriority | 5,860 | 48 hrs | 320 | 16 | standard | Jobs not charged but requires at least 1 CU in budget to use. |
| serial | disabled | - | - | - | - | - |
| reservation | not available | - | - | - | - | |

We encourage users to make use of these changes: this is a good opportunity to queue and run a greater number of jobs than usual. The relaxation of limits on the low-priority queue also offers an opportunity to run a wider range of jobs through this queue than is normally possible.

Due to the unavailability of the DCN, users will not be able to connect to ARCHER2 via the login nodes during the upgrade. The serial QoS will be disabled during the upgrade period. However, serial jobs can be submitted using the standard and low-priority queues.

"},{"location":"faq/upgrade-2023/","title":"ARCHER2 Upgrade: 2023","text":"

During the first half of 2023 ARCHER2 went through a major software upgrade.

On this page we describe the background to the changes, the impact the changes have had on users, any action you should expect to take following the upgrade, and information on the versions of updated software.

If you have any questions or concerns, please contact the ARCHER2 Service Desk.

"},{"location":"faq/upgrade-2023/#why-did-the-upgrade-happen","title":"Why did the upgrade happen?","text":"

There are a number of reasons why ARCHER2 needed to go through this major software upgrade. All of these reasons are related to the fact that the previous system software setup was out of date; due to this, maintenance of the service was very difficult and updating software within the current framework was not possible. Some specific issues were:

"},{"location":"faq/upgrade-2023/#when-did-the-upgrade-happen-and-how-long-did-it-take","title":"When did the upgrade happen and how long did it take?","text":"

This major software upgrade involved a complete re-install of system software followed by a reinstatement of local configurations (e.g. Slurm, authentication services, SAFE integration). Unfortunately, this major work required a long period of downtime but this was planned with all service partners to minimise the outage and give as much notice to users as possible so that they could plan accordingly.

The outage dates were:

"},{"location":"faq/upgrade-2023/#what-are-the-impacts-on-users-from-the-upgrade","title":"What are the impacts on users from the upgrade?","text":""},{"location":"faq/upgrade-2023/#during-the-upgrade-process","title":"During the upgrade process","text":""},{"location":"faq/upgrade-2023/#after-the-upgrade-process","title":"After the upgrade process","text":"

The allocation periods (where appropriate) were extended for the outage period. The changes were in place when the service was returned.

After the upgrade, there are a number of changes that may require action from users:

"},{"location":"faq/upgrade-2023/#updated-login-node-host-keys","title":"Updated login node host keys","text":"

If you logged into ARCHER2 before the upgrade, you may see an error from SSH that looks like:

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\n@       WARNING: POSSIBLE DNS SPOOFING DETECTED!          @\n@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\nThe ECDSA host key for login.archer2.ac.uk has changed,\nand the key for the corresponding IP address 193.62.216.43\nhas a different value. This could either mean that\nDNS SPOOFING is happening or the IP address for the host\nand its host key have changed at the same time.\nOffending key for IP in /Users/auser/.ssh/known_hosts:11\n@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\n@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @\n@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\nIT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!\nSomeone could be eavesdropping on you right now (man-in-the-middle attack)!\nIt is also possible that a host key has just been changed.\nThe fingerprint for the ECDSA key sent by the remote host is\nSHA256:UGS+LA8I46LqnD58WiWNlaUFY3uD1WFr+V8RCG09fUg.\nPlease contact your system administrator.\n

If you see this, you should delete the offending host key from your ~/.ssh/known_hosts file (in the example above the offending line is line #11).
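If you prefer to script the fix, the sketch below deletes one numbered line from a known_hosts file. The function name remove_known_hosts_line is hypothetical, and the line number must come from your own SSH warning. Alternatively, OpenSSH's `ssh-keygen -R login.archer2.ac.uk` removes all stored keys for that host name; the stale IP-address entry (193.62.216.43 in the example) can be removed the same way.

```shell
# Hypothetical helper: delete line number $1 from a known_hosts file
# ($2, default ~/.ssh/known_hosts).
remove_known_hosts_line() {
  local line="$1" file="${2:-$HOME/.ssh/known_hosts}" tmp status
  tmp=$(mktemp) || return 1
  # Keep every line except number $line, then overwrite the original in place
  # (writing through cat keeps the file's existing permissions).
  awk -v n="$line" 'NR != n' "$file" > "$tmp" && cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

For the example warning above, you would run `remove_known_hosts_line 11`.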

The current login node host keys are always documented in the User Guide.

"},{"location":"faq/upgrade-2023/#recompile-and-test-software","title":"Recompile and test software","text":"

As the new system is based on a new OS version and new versions of compilers and libraries, we strongly recommend that all users recompile and test all software on the service. The ARCHER2 CSE service recompiled all centrally installed software.

"},{"location":"faq/upgrade-2023/#no-python-2-installation","title":"No Python 2 installation","text":"

There is no Python 2 installation available as part of supported software following the upgrade. Python 3 continues to be fully supported.

"},{"location":"faq/upgrade-2023/#impact-on-data-on-the-service","title":"Impact on data on the service","text":""},{"location":"faq/upgrade-2023/#slurm-cpus-per-task-setting-no-longer-inherited-by-srun","title":"Slurm: cpus-per-task setting no longer inherited by srun","text":"

Slurm's behaviour has changed: the value of the --cpus-per-task option to sbatch/salloc is no longer propagated by default to srun commands in the job script.

This can lead to very poor performance due to oversubscription of cores with processes/threads if job submission scripts are not updated. The simplest workaround is to add the command:

export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n

before any srun commands in the script. You can also explicitly use the --cpus-per-task option to srun if you prefer.
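A minimal sketch of the relevant lines in a job script submitted with --cpus-per-task. The `:-1` fallback (used only when the lines run outside a Slurm job) and the OpenMP setting are illustrative assumptions, not required settings.

```shell
# Pass the sbatch/salloc --cpus-per-task value on to every later srun call.
# Falls back to 1 CPU per task if SLURM_CPUS_PER_TASK is not set.
export SRUN_CPUS_PER_TASK="${SLURM_CPUS_PER_TASK:-1}"

# For OpenMP codes, run the same number of threads per process so that
# processes and threads do not oversubscribe the cores.
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"

echo "srun will use ${SRUN_CPUS_PER_TASK} CPU(s) per task"
```

Any srun command that follows, for example `srun --hint=nomultithread --distribution=block:block ./my_app` (where my_app is a placeholder), then picks up the per-task CPU count.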

"},{"location":"faq/upgrade-2023/#change-of-slurm-socket-definition","title":"Change of Slurm \"socket\" definition","text":"

This change only affects users who use a placement scheme where placement of processes on sockets is cyclic (e.g. --distribution=block:cyclic). The Slurm definition of a \u201csocket\u201d has changed. Previously on ARCHER2, a socket was 16 cores (all sharing a DRAM memory controller). On the updated ARCHER2, a socket is 4 cores, corresponding to a CCX (Core CompleX). Each CCX shares a 16 MB L3 cache.
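To illustrate why cyclic placement changes, the sketch below uses a simplified model of --distribution=block:cyclic within a single 128-core node: one core per task, tasks dealt round-robin across sockets, and cores numbered contiguously within each socket. These are assumptions for illustration, not Slurm's exact placement algorithm. Under the old definition (8 sockets of 16 cores), consecutive tasks land 16 cores apart; under the new definition (32 sockets of 4 cores), they land 4 cores apart, on neighbouring CCXs.

```shell
# Simplified model: core used by a 0-based task index under cyclic placement
# across sockets, one core per task.
# Arguments: task index, sockets per node, cores per socket.
cyclic_core() {
  local task="$1" sockets="$2" cores_per_socket="$3"
  echo $(( (task % sockets) * cores_per_socket + task / sockets ))
}

# Old definition: 8 sockets x 16 cores. New definition: 32 sockets (CCXs) x 4 cores.
for t in 0 1 2 3; do
  echo "task $t: old core $(cyclic_core "$t" 8 16), new core $(cyclic_core "$t" 32 4)"
done
# task 1: old core 16, new core 4
```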

"},{"location":"faq/upgrade-2023/#changes-to-bind-paths-and-library-paths-for-singularity-with-mpi","title":"Changes to bind paths and library paths for Singularity with MPI","text":"

The paths you need to bind and the LD_LIBRARY_PATH settings required to use Cray MPICH in Singularity containers have changed. The updated settings are documented in the Containers section of the User and Best Practice Guide. This also includes updated information on building containers with MPI to use on ARCHER2.

"},{"location":"faq/upgrade-2023/#amd-prof-not-available","title":"AMD \u03bcProf not available","text":"

The AMD \u03bcProf tool is not available on the upgraded system yet. We are working to get this fixed as soon as possible.

"},{"location":"faq/upgrade-2023/#what-software-versions-will-be-available-after-the-upgrade","title":"What software versions will be available after the upgrade?","text":"

System software:

"},{"location":"faq/upgrade-2023/#programming-environment-2212","title":"Programming environment: 22.12","text":"

Compilers:

Communication libraries:

Numerical libraries:

IO Libraries:

Tools:

"},{"location":"faq/upgrade-2023/#summary-of-user-and-application-impact-of-pe-software","title":"Summary of user and application impact of PE software","text":"

For full information, see the CPE 22.12 Release Notes.

CCE 15

C++ applications built using CCE 13 or earlier should be recompiled due to the significant changes that were necessary to implement C++17. This is expected to be a one-time requirement.

Some non-standard Cray Fortran extensions supporting shorthand notation for logical operations will be removed in a future release. CCE 15 will issue warning messages when these are encountered, providing time to adapt the application to use standard Fortran.

HPE Cray MPICH 8.1.23

Cray MPICH 8.1.23 can support only ~2040 simultaneous MPI communicators.

"},{"location":"faq/upgrade-2023/#cse-supported-software","title":"CSE supported software","text":"

Default version in italics

Software: Versions
CASTEP: 22.11, 23.11
Code_Saturne: 7.0.1
ChemShell/PyChemShell: 3.7.1/21.0.3
CP2K: 2023.1
FHI-aims: 221103
GROMACS: 2022.4
LAMMPS: 17_FEB_2023
NAMD: 2.14
Nektar++: 5.2.0
NWChem: 7.0.2
ONETEP: 6.9.1.0
OpenFOAM: v10.20230119 (.org), v2212 (.com)
Quantum Espresso: 6.8, 7.1
VASP: 5.4.4.pl2, 6.3.2, 6.4.1-vtst, 6.4.1

Software: Versions
AOCL: 3.1, 4.0
Boost: 1.81.0
GSL: 2.7
HYPRE: 2.18.0, 2.25.0
METIS/ParMETIS: 5.1.0/4.0.3
MUMPS: 5.3.5, 5.5.1
PETSc: 13.14.2, 13.18.5
PT/Scotch: 6.1.0, 07.0.3
SLEPC: 13.14.1, 13.18.3
SuperLU/SuperLU_Dist: 5.2.2 / 6.4.0, 8.1.2
Trilinos: 12.18.1"},{"location":"known-issues/","title":"ARCHER2 Known Issues","text":"

This section highlights known issues on ARCHER2, their potential impacts and any known workarounds. Many of these issues are under active investigation by HPE Cray and the wider service.

Info

This page was last reviewed on 9 November 2023

"},{"location":"known-issues/#open-issues","title":"Open Issues","text":""},{"location":"known-issues/#atp-module-tries-to-write-to-home-from-compute-nodes-added-2024-04-29","title":"ATP Module tries to write to /home from compute nodes (Added: 2024-04-29)","text":"

The ATP module tries to execute a mkdir command in the /home file system. When the ATP module runs on the compute nodes, this leads to an error, because the compute nodes cannot access the /home file system.

To circumvent the error, add the line:

export HOME=${HOME/home/work}\n

in your Slurm job submission script, so that the ATP module writes to /work instead.
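The expression ${HOME/home/work} uses bash pattern substitution: it replaces the first occurrence of `home` in the path with `work`. The sketch below applies it to a hypothetical home directory path so you can see the result before relying on it in a job script.

```shell
#!/bin/bash
# Hypothetical example path; inside a real job, $HOME is already set for you.
example_home=/home/t01/t01/auser

# Replace the first "home" in the path with "work" (bash-only syntax).
example_work=${example_home/home/work}

echo "$example_work"    # prints /work/t01/t01/auser
```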

"},{"location":"known-issues/#when-close-to-storage-quota-jobs-may-slow-down-or-produce-corrupted-files-added-2024-02-27","title":"When close to storage quota, jobs may slow down or produce corrupted files (Added: 2024-02-27)","text":"

When users are close to their user or project quotas on the work (Lustre) file systems, we have seen cases of the following behaviour:

If you see slower than expected performance or data corruption, check whether you are close to your storage quota (either user or project quota). If you are, you may be experiencing this issue: either remove data to free up space or request more storage quota.
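Once you have your usage and limit figures (for example from SAFE, or from the Lustre command `lfs quota -u $USER /work`, where the mount point is an assumption), a check like the hypothetical sketch below can flag when you are getting close. The 90% threshold and the sample numbers are arbitrary example values.

```shell
# Integer percentage of a quota in use; both arguments in the same unit (e.g. KiB).
# A limit of 0 (no quota set) is reported as 0% here, an illustrative choice.
quota_pct() {
  local used="$1" limit="$2"
  if [ "$limit" -le 0 ]; then
    echo 0
    return
  fi
  echo $(( used * 100 / limit ))
}

# Exit status 0 if usage is at or above the threshold percentage (default 90).
near_quota() {
  local threshold="${3:-90}"
  [ "$(quota_pct "$1" "$2")" -ge "$threshold" ]
}

if near_quota 950 1000; then
  echo "Warning: $(quota_pct 950 1000)% of quota used"
fi
```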

"},{"location":"known-issues/#e-mail-alerts-from-slurm-do-not-work-added-2023-11-09","title":"e-mail alerts from Slurm do not work (Added: 2023-11-09)","text":"

Email alerts from Slurm (--mail-type and --mail-user options) do not produce emails to users. We are investigating with University of Edinburgh Information Services to enable this Slurm feature in the future.

"},{"location":"known-issues/#excessive-memory-use-when-using-ucx-communications-protocol-added-2023-07-20","title":"Excessive memory use when using UCX communications protocol (Added: 2023-07-20)","text":"

We have seen cases when using the (non-default) UCX communications protocol where the peak in memory use is much higher than would be expected. This leads to jobs failing unexpectedly with an OOM (Out Of Memory) error. The workaround is to use Open Fabrics (OFI) communication protocol instead. OFI is the default protocol on ARCHER2 and so does not usually need to be explicitly loaded; but if you have UCX loaded, you can switch to OFI by adding the following lines to your submission script before you run your application:

module load craype-network-ofi\nmodule load cray-mpich\n

It can be very useful to track the memory usage of your job as it runs, for example to see whether usage is high on all nodes or on a single node, and whether it increases gradually or rapidly.

Here are instructions on how to do this using a couple of small scripts.
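The sketch below is not the scripts referred to above but a minimal, hypothetical alternative, assuming the standard Linux /proc/meminfo format. It reports memory in use as MemTotal minus MemAvailable and can log a timestamped sample at a fixed interval.

```shell
# Memory in use, in MiB, computed as MemTotal - MemAvailable from a file in
# /proc/meminfo format (default: the current node's /proc/meminfo).
mem_used_mib() {
  awk '/^MemTotal:/ { t = $2 }
       /^MemAvailable:/ { a = $2 }
       END { printf "%d\n", (t - a) / 1024 }' "${1:-/proc/meminfo}"
}

# Hypothetical monitor: append "<epoch seconds> <node name> <MiB used>" to the
# log file $1 every $2 seconds (default 60) until the process is killed.
monitor_memory() {
  local log="$1" interval="${2:-60}"
  while true; do
    echo "$(date +%s) $(uname -n) $(mem_used_mib)" >> "$log"
    sleep "$interval"
  done
}
```

In a job script you could start `monitor_memory mem.log 30 &` before the main srun command and stop it afterwards with `kill $!`. Because the batch script itself runs on the first node of the job, this samples that node only.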

"},{"location":"known-issues/#slurm-cpu-freqx-option-is-not-respected-when-used-with-sbatch-added-2023-01-18","title":"Slurm --cpu-freq=X option is not respected when used with sbatch (Added: 2023-01-18)","text":"

If you specify the CPU frequency using the --cpu-freq option with the sbatch command (either with an #SBATCH --cpu-freq=X line in the script or by passing --cpu-freq=X on the command line), the option will not be respected, because the ARCHER2 default setting (2.0 GHz) overrides it. Instead, pass the --cpu-freq option directly to srun within the job submission script, e.g.:

srun --cpu-freq=2250000 ...\n

You can find more information on setting the CPU frequency in the User Guide.
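Slurm reads a numeric --cpu-freq value as a frequency in kilohertz, so 2250000 corresponds to 2.25 GHz. The hypothetical helper below does that conversion, which may help avoid unit slips when editing job scripts.

```shell
# Convert a frequency in GHz to the integer kHz value that srun --cpu-freq expects.
# Adding 0.5 before truncating rounds to the nearest kHz.
ghz_to_khz() {
  awk -v f="$1" 'BEGIN { printf "%d\n", f * 1000000 + 0.5 }'
}

ghz_to_khz 2.25   # prints 2250000
```

Usage in a job script might then look like `srun --cpu-freq="$(ghz_to_khz 2.25)" ./my_app`, where my_app is a placeholder.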

"},{"location":"known-issues/#research-software","title":"Research Software","text":"

There are several outstanding issues for the centrally installed Research Software:

Users should also check the individual software pages for known limitations and caveats when using software on the Cray EX platform and Cray Linux Environment.

"},{"location":"known-issues/#issues-with-rpath-for-non-default-library-versions","title":"Issues with RPATH for non-default library versions","text":"

When you compile applications against non-default versions of libraries within the HPE Cray software stack and set the environment variable CRAY_ADD_RPATH=yes to encode the paths to these libraries within the binary, the encoded paths are not respected at runtime and the binaries use the default library versions instead.

The workaround for this issue is to ensure that you set:

export LD_LIBRARY_PATH=$CRAY_LD_LIBRARY_PATH:$LD_LIBRARY_PATH\n

at both compile time and runtime. For more details on using non-default versions of libraries, see the description in the User and Best Practice Guide.
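A slightly more defensive form of the same export is sketched below using a hypothetical helper name. It avoids a leading or trailing `:` when either variable is empty, since the dynamic linker treats an empty entry in LD_LIBRARY_PATH as the current directory.

```shell
# Prepend the directories in $1 to LD_LIBRARY_PATH, adding a ':' separator only
# when LD_LIBRARY_PATH is already non-empty, and doing nothing if $1 is empty.
prepend_ld_library_path() {
  [ -n "$1" ] || return 0
  export LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}

# Make the same call at compile time and in the job script at runtime.
prepend_ld_library_path "${CRAY_LD_LIBRARY_PATH:-}"
```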

"},{"location":"known-issues/#mpi-ucx-error-ivb_reg_mr","title":"MPI UCX ERROR: ivb_reg_mr","text":"

If you are using the UCX layer for MPI communication you may see an error such as:

[1613401128.440695] [nid001192:11838:0] ib_md.c:325 UCX ERROR ibv_reg_mr(address=0xabcf12c0, length=26400, access=0xf) failed: Cannot allocate memory\n[1613401128.440768] [nid001192:11838:0] ucp_mm.c:137 UCX ERROR failed to register address 0xabcf12c0 mem_type bit 0x1 length 26400 on md[4]=mlx5_0: Input/output error (md reg_mem_types 0x15)\n[1613401128.440773] [nid001192:11838:0] ucp_request.c:269 UCX ERROR failed to register user buffer datatype 0x8 address 0xabcf12c0 len 26400: Input/output error\nMPICH ERROR [Rank 1534] [job id 114930.0] [Mon Feb 15 14:58:48 2021] [unknown] [nid001192] - Abort(672797967) (rank 1534 in comm 0): Fatal error in PMPI_Isend: Other MPI error, error stack:\nPMPI_Isend(160)......: MPI_Isend(buf=0xabcf12c0, count=3300, MPI_DOUBLE_PRECISION, dest=1612, tag=4, comm=0x84000004, request=0x7fffb38fa0fc) failed\nMPID_Isend(416)......:\nMPID_isend_unsafe(92):\nMPIDI_UCX_send(95)...: returned failed request in UCX netmod(ucx_send.h 95 MPIDI_UCX_send Input/output error)\naborting job:\nFatal error in PMPI_Isend: Other MPI error, error stack:\nPMPI_Isend(160)......: MPI_Isend(buf=0xabcf12c0, count=3300, MPI_DOUBLE_PRECISION, dest=1612, tag=4, comm=0x84000004, request=0x7fffb38fa0fc) failed\nMPID_Isend(416)......:\nMPID_isend_unsafe(92):\nMPIDI_UCX_send(95)...: returned failed request in UCX netmod(ucx_send.h 95 MPIDI_UCX_send Input/output error)\n[1613401128.457254] [nid001192:11838:0] mm_xpmem.c:82 UCX WARN remote segment id 200002e09 apid 200002e3e is not released, refcount 1\n[1613401128.457261] [nid001192:11838:0] mm_xpmem.c:82 UCX WARN remote segment id 200002e08 apid 100002e3e is not released, refcount 1\n

You can add the following line to your job submission script before the srun command to try to work around this error:

export UCX_IB_REG_METHODS=direct\n

Note

Setting this flag may have an impact on code performance.

"},{"location":"known-issues/#aocc-compiler-fails-to-compile-with-netcdf-added-2021-11-18","title":"AOCC compiler fails to compile with NetCDF (Added: 2021-11-18)","text":"

There is currently a problem with the module file which means cray-netcdf-hdf5parallel will not operate correctly in PrgEnv-aocc. An example of the error seen is:

F90-F-0004-Corrupt or Old Module file /opt/cray/pe/netcdf-hdf5parallel/4.7.4.3/crayclang/9.1/include/netcdf.mod (netcdf.F90: 8)\n

If PrgEnv-aocc is required, the current workaround is to load the epcc-netcdf-hdf5parallel module instead.

"},{"location":"known-issues/#slurm-export-option-does-not-work-in-job-submission-script","title":"Slurm --export option does not work in job submission script","text":"

The option --export=ALL propagates all the environment variables from the login node to the compute node. If you include the option in the job submission script, Slurm wrongly ignores it. The current workaround is to pass the option on the sbatch command line when submitting the job script. For instance:

sbatch --export=ALL myjob.slurm\n
"},{"location":"known-issues/#recently-resolved-issues","title":"Recently Resolved Issues","text":""},{"location":"other-software/","title":"Software provided by external parties","text":"

This section describes software that has been installed on ARCHER2 by external parties (i.e. not by the ARCHER2 service itself) for general use by ARCHER2 users, and provides useful notes on software that is not installed centrally.

Important

While the ARCHER2 service desk is able to provide support for basic use of this software (e.g. access to software, writing job submission scripts) it does not generally provide detailed technical support for the software and you may be directed to seek support from other places if the service desk cannot answer the questions.

"},{"location":"other-software/#research-software","title":"Research Software","text":""},{"location":"other-software/casino/","title":"Casino","text":"

This page has moved

"},{"location":"other-software/cesm-further-examples/","title":"Cesm further examples","text":"

This page has moved

"},{"location":"other-software/cesm213/","title":"Cesm213","text":"

This page has moved

"},{"location":"other-software/cesm213_run/","title":"Cesm213 run","text":"

This page has moved

"},{"location":"other-software/cesm213_setup/","title":"Cesm213 setup","text":"

This page has moved

"},{"location":"other-software/crystal/","title":"Crystal","text":"

This page has moved

"},{"location":"publish/","title":"ARCHER2 and publications","text":"

This section provides information on how to acknowledge the use of ARCHER2 in your published work and how to register your work on ARCHER2 into the ARCHER2 publications database via SAFE.

"},{"location":"publish/#acknowledging-archer2","title":"Acknowledging ARCHER2","text":"

We will shortly be publishing a description of the ARCHER2 service with a DOI that you can cite in your published work that arises from the use of ARCHER2. Until that time, please add the following words to any work you publish that arises from your use of ARCHER2:

This work used the ARCHER2 UK National Supercomputing Service (https://www.archer2.ac.uk).

You should also tag outputs with the keyword \"ARCHER2\" whenever possible.

"},{"location":"publish/#archer2-publication-database","title":"ARCHER2 publication database","text":"

The ARCHER2 service maintains a publication database of works that have arisen from ARCHER2 and links them to project IDs that have ARCHER2 access. We ask all users of ARCHER2 to register any publications in the database - all you need is your publication's DOI.

Registering your publications in SAFE has a number of advantages:

"},{"location":"publish/#how-to-register-a-publication-in-the-database","title":"How to register a publication in the database","text":"

You will need a DOI for the publication you wish to register. A DOI has the form of a set of ID strings separated by slashes, for example 10.7488/ds/1505. You should not include the web host address that provides a link to the DOI.

Log in to SAFE. Then:

  1. Go to the Menu Your details and select Publications
  2. Select the project you wish to associate the publication with from the list and click View.
  3. The next page will list currently registered publications; to add one, click Add.
  4. Enter the DOI in the text field provided and click Add
"},{"location":"publish/#how-to-list-your-publications","title":"How to list your publications","text":"

Log in to SAFE. Then:

  1. Go to the Menu Your details and select Publications
  2. Select the project you wish to list the publications from using the dropdown menu and click View.
  3. The next page will list your currently registered publications.
"},{"location":"publish/#how-to-export-your-publications","title":"How to export your publications","text":"

At the moment we support exporting lists of DOIs to comma-separated values (CSV) files. This does not export all the metadata, just the DOIs themselves, with a maximum of 25 DOIs per line. This format is primarily useful for importing into ResearchFish (where you can paste in the comma-separated lists to import publications). We plan to add further export formats in the future.

Log in to SAFE. Then:

  1. Go to the Menu Your details and select Publications
  2. Select the project you wish to list the publications from using the dropdown menu and click View.
  3. The next page will list your currently registered publications.
  4. Click Export to generate a plain text comma-separated values (CSV) file that lists all DOIs.
  5. If required, you can save this file using the Save command in your web browser.
"},{"location":"quick-start/overview/","title":"Quickstart","text":"

The ARCHER2 quickstart guides provide the minimum information for new users or users transferring from ARCHER. There are two sections available which are meant to be followed in sequence.

"},{"location":"quick-start/quickstart-developers/","title":"Quickstart for developers","text":"

This guide aims to quickly enable developers to work on ARCHER2. It assumes that you are familiar with the material in the Quickstart for users section.

"},{"location":"quick-start/quickstart-developers/#compiler-wrappers","title":"Compiler wrappers","text":"

When compiling code on ARCHER2, you should make use of the HPE Cray compiler wrappers. These ensure that the correct libraries and headers (for example, MPI or HPE LibSci) will be used during the compilation and linking stages. These wrappers should be accessed by providing the following compiler names.

Language: Wrapper name
C: cc
C++: CC
Fortran: ftn

This means that you should use the wrapper names whether on the command line, in build scripts, or in configure options. It could be helpful to set some or all of the following environment variables before running a build to ensure that the build tool is aware of the wrappers.

export CC=cc\nexport CXX=CC\nexport FC=ftn\nexport F77=ftn\nexport F90=ftn\n

man pages are available for each wrapper. You can also see the full set of compiler and linker options being used by passing the -craype-verbose option to the wrapper.

Tip

The HPE Cray compiler wrappers should be used instead of the MPI compiler wrappers such as mpicc, mpicxx and mpif90 that you may have used on other HPC systems.

"},{"location":"quick-start/quickstart-developers/#programming-environments","title":"Programming environments","text":"

On login to ARCHER2, the PrgEnv-cray compiler environment will be loaded, as will a cce module. The latter makes available the Cray compilers from the Cray Compiling Environment (CCE), while the former provides the correct wrappers and support to use them. The GNU Compiler Collection (GCC) and the AMD compiler environment (AOCC) are also available.

To make use of any particular compiler environment, you load the correct PrgEnv module. After doing so the compiler wrappers (cc, CC and ftn) will correctly call the compilers from the new suite. The default version of the corresponding compiler suite will also be loaded, but you may swap to another available version if you wish.

The following table summarises the suites and associated compiler environments.

Suite name: Module, Programming environment collection
CCE: cce, PrgEnv-cray
GCC: gcc, PrgEnv-gnu
AOCC: aocc, PrgEnv-aocc

As an example, after logging in you may wish to use GCC as your compiler suite. Running module load PrgEnv-gnu will replace the default CCE (Cray) environment with the GNU environment. It will also unload the cce module and load the default version of the gcc module; at the time of writing, this is GCC 11.2.0. If you need to use a different version of GCC, for example 10.3.0, you would follow up with module load gcc/10.3.0. At this point you may invoke the compiler wrappers and they will correctly use the HPE libraries and tools in conjunction with GCC 10.3.0.

When choosing the compiler environment, a big factor will likely be which compilers you have previously used for your code's development. The Cray Fortran compiler is similar to the compiler you may be familiar with from ARCHER, while the Cray C and C++ compilers provided on ARCHER2 are new versions that are now derived from Clang. The GCC suite provides gcc/g++ and gfortran. The AOCC suite provides AMD Clang/Clang++ and AMD Flang.

Note

The Intel compilers are not available on ARCHER2.

"},{"location":"quick-start/quickstart-developers/#useful-compiler-options","title":"Useful compiler options","text":"

The compiler options you use will depend on both the software you are building and also on the current stage of development. The following flags should be a good starting point for reasonable performance.

Compilers: Optimisation flags
Cray C/C++: -O2 -funroll-loops -ffast-math
Cray Fortran: Default options
GCC: -O2 -ftree-vectorize -funroll-loops -ffast-math

Tip

If you want to use GCC version 10 or greater to compile MPI Fortran code, you must add the -fallow-argument-mismatch option when compiling; otherwise you will see compile errors associated with MPI functions.

When you are happy with your code's performance you may wish to enable more aggressive optimisations; in this case you could start using the following flags. Please note, however, that these optimisations may lead to deviations from IEEE/ISO specifications. If your code relies on strict adherence then using these flags may cause incorrect output.

Compilers: Optimisation flags
Cray C/C++: -Ofast -funroll-loops
Cray Fortran: -O3 -hfp3
GCC: -Ofast -funroll-loops

Vectorisation is enabled by the Cray Fortran compiler at -O1 and above, by Cray C and C++ at -O2 and above or when using -ftree-vectorize, and by the GCC compilers at -O3 and above or when using -ftree-vectorize.

You may wish to promote default real and integer types in Fortran codes from 4 to 8 bytes. In this case, the following flags may be used.

Compiler: Fortran real and integer promotion flags
Cray Fortran: -s real64 -s integer64
gfortran: -freal-4-real-8 -finteger-4-integer-8

More documentation on the compilers is available through man. The pages to read are accessed as follows.

Compiler suite: C, C++, Fortran
Cray: man craycc, man crayCC, man crayftn
GNU: man gcc, man g++, man gfortran

Tip

There are no man pages for the AOCC compilers at the moment.

"},{"location":"quick-start/quickstart-developers/#linking-on-archer2","title":"Linking on ARCHER2","text":"

Executables on ARCHER2 link dynamically, and the Cray Programming Environment does not currently support static linking. This is in contrast to ARCHER where the default was to build statically.

If you attempt to link statically, you will see errors similar to:

/usr/bin/ld: cannot find -lpmi\n/usr/bin/ld: cannot find -lpmi2\ncollect2: error: ld returned 1 exit status\n

The compiler wrapper scripts on ARCHER2 link runtime libraries using RUNPATH by default. This means that the paths to the runtime libraries are encoded into the executable, so you do not need to load the compiler environment in your job submission scripts.

"},{"location":"quick-start/quickstart-developers/#using-runpaths-to-link","title":"Using RUNPATHs to link","text":"

The default behaviour of a dynamically linked executable is to let the dynamic linker find the libraries it needs at runtime by searching the paths in the LD_LIBRARY_PATH environment variable and then the paths in the binary's RUNPATH setting. This is flexible in that it allows an executable to use newly installed library versions without rebuilding, but in some cases you may prefer to bake the paths to specific libraries into the executable's RUNPATH, keeping them constant. While the libraries are still dynamically loaded at runtime, from the end user's point of view the resulting behaviour will be similar to that of a statically compiled executable, in that they will not need to concern themselves with ensuring the linker can find the libraries.

This is achieved by passing the compiler additional paths to add to RUNPATH as options. To have the compiler wrappers do this, set the following environment variable:

export CRAY_ADD_RPATH=yes\n
"},{"location":"quick-start/quickstart-developers/#using-rpaths-to-link","title":"Using RPATHs to link","text":"

RPATH differs from RUNPATH in that the dynamic linker searches RPATH directories for libraries before searching the paths in LD_LIBRARY_PATH, so they cannot be overridden in the same way at runtime.

You can provide RPATHs directly to the compilers using the -Wl,-rpath=<path-to-directory> flag, where the provided path is to the directory containing the libraries which are themselves typically specified with flags of the type -l<library-name>.
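As a sketch of how these flags fit together, the hypothetical helper below prints the link flags for one library directory. It also adds -Wl,--disable-new-dtags, a GNU ld option that requests a DT_RPATH entry rather than DT_RUNPATH; whether your toolchain writes RUNPATH by default is an assumption worth checking with `readelf -d` on the resulting binary.

```shell
# Hypothetical helper: print linker flags for lib<name> in <dir> that also
# record <dir> in the executable as an RPATH (not a RUNPATH).
rpath_link_flags() {
  local dir="$1" name="$2"
  echo "-L${dir} -Wl,-rpath=${dir} -Wl,--disable-new-dtags -l${name}"
}
```

For example, `cc -o my_prog my_prog.c $(rpath_link_flags /path/to/mylib/lib mylib)`, where the program, path and library names are placeholders.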

"},{"location":"quick-start/quickstart-developers/#debugging-tools","title":"Debugging tools","text":"

The following debugging tools are available on ARCHER2:

To get started debugging on ARCHER2, you might like to use gdb4hpc. You should first of all compile your code using the -g flag to enable debugging symbols. Once compiled, load the gdb4hpc module and start it:

module load gdb4hpc\ngdb4hpc\n

Once inside gdb4hpc, you can start your program's execution with the launch command:

dbg all> launch $my_prog{128} ./prog\n

In this example, a job called my_prog will be launched to run the executable file prog over 128 cores on a compute node. If you run squeue in another terminal you will be able to see it running. Inside gdb4hpc you may then step through the code's execution, continue to breakpoints that you set with break, print the values of variables at these points, and perform a backtrace on the stack if the program crashes. Debugging jobs will end when you exit gdb4hpc, or you can end them yourself by running, in this example, release $my_prog.

For more information on debugging parallel codes, see the documentation in the Debugging section of the ARCHER2 User and Best Practice Guide.

"},{"location":"quick-start/quickstart-developers/#profiling-tools","title":"Profiling tools","text":"

Profiling on ARCHER2 is provided through the Cray Performance Measurement and Analysis Tools (CrayPAT). This has a number of different components:

The above tools are made available for use by firstly loading the perftools-base module followed by either perftools (for CrayPAT, Reveal and Apprentice2) or one of the perftools-lite modules.

The simplest way to get started profiling your code is with CrayPAT-lite. For example, to sample a run of a code you would load the perftools-base and perftools-lite modules, and then compile (you will receive a message that the executable is being instrumented). Performing a batch run as usual with this executable will produce a directory such as my_prog+74653-2s which can be passed to pat_report to view the results. In this example,

pat_report -O calltree+src my_prog+74653-2s\n

will produce a report containing the call tree. You can view available report keywords to be provided to the -O option by running pat_report -O -h. The available perftools-lite modules are:

Tip

For more information on profiling parallel codes, see the documentation in the Profiling section of the ARCHER2 User and Best Practice Guide.

"},{"location":"quick-start/quickstart-developers/#useful-links","title":"Useful Links","text":"

Links to other documentation you may find useful:

"},{"location":"quick-start/quickstart-next-steps/","title":"Next Steps","text":"

Once you have set up your machine account, logged on, run a job or two and possibly updated and compiled your code: what next?

There is still loads of support and advice available to you:

Getting Started on ARCHER2 gives an overview of some of this help.

We have advice on how to Get Access through different funding routes and, if your chosen route requires you to complete a Technical Assessment, on How to prepare a successful TA.

And we also have a comprehensive Training Programme for all levels of experience and a wide range of different uses. All our training is free for UK Academics and we have a list of upcoming training and also all the materials and resources from previous training events.

"},{"location":"quick-start/quickstart-users-totp/","title":"Quickstart for users","text":"

This guide aims to quickly enable new users to get up and running on ARCHER2. It covers the process of getting an ARCHER2 account, logging in and running your first job.

"},{"location":"quick-start/quickstart-users-totp/#request-an-account-on-archer2","title":"Request an account on ARCHER2","text":"

Important

You need to use both a password and a passphrase-protected SSH key pair to log into ARCHER2. You get the password from SAFE, but you will also need to set up your own SSH key pair and add the public part to your account via SAFE before you will be able to log in. We cover the authentication steps below.

"},{"location":"quick-start/quickstart-users-totp/#obtain-an-account-on-the-safe-website","title":"Obtain an account on the SAFE website","text":"

Warning

We have seen issues with Gmail blocking emails from SAFE so we recommend that users use their institutional/work email address rather than Gmail addresses to register for SAFE accounts.

The first step is to sign up for an account on the ARCHER2 SAFE website. The SAFE account is used to manage all of your login accounts, allowing you to report on your usage and quotas. To do this:

  1. Go to the SAFE New User Signup Form
  2. Fill in your personal details. You can come back later and change them if you wish
  3. Click Submit

You are now registered. Your SAFE password will be emailed to the email address you provided. You can then login with that email address and password. (You can change your initial SAFE password whenever you want by selecting the Change SAFE password option from the Your details menu.)

"},{"location":"quick-start/quickstart-users-totp/#request-an-archer2-login-account","title":"Request an ARCHER2 login account","text":"

Once you have a SAFE account and an SSH key, you will need to request a user account on ARCHER2 itself. To do this you will require a Project Code; you usually obtain this from the Principal Investigator (PI) or project manager for the project you will be working on. Once you have the Project Code:

  1. Log into SAFE
  2. Use the Login accounts - Request new account menu item
  3. Select the correct project from the drop down list
  4. Select the archer2 machine in the list of available machines
  5. Click Next
  6. Enter a username for the account and the public part of an SSH key pair
    1. More information on generating an SSH key pair can be found in the ARCHER2 User and Best Practice Guide
    2. You can add additional SSH keys using the process described below if you so wish.
  7. Click Request

The PI or project manager of the project will be asked to approve your request. Once it has been approved, the account will be created and you will receive an email. You can then come back to SAFE and pick up the initial single-use password for your new account.

Note

ARCHER2 account passwords are also sometimes referred to as LDAP passwords by the system.

"},{"location":"quick-start/quickstart-users-totp/#generating-and-adding-an-ssh-key-pair","title":"Generating and adding an SSH key pair","text":"

How you generate your SSH key pair depends on which operating system you use and which SSH client you use to connect to ARCHER2. We will not cover the details on generating an SSH key pair here, but detailed information on this topic is available in the ARCHER2 User and Best Practice Guide.

After generating your SSH key pair, add the public part to your login account using SAFE:

  1. Log into SAFE
  2. Use the menu Login accounts and select the ARCHER2 account to be associated with the SSH key
  3. On the subsequent Login account details page, click the Add Credential button
  4. Select SSH public key as the Credential Type and click Next
  5. Either copy and paste the public part of your SSH key into the SSH Public key box or use the button to select the public key file on your computer
  6. Click Add to associate the public SSH key part with your account

Once you have done this, your SSH key will be added to your ARCHER2 account.

Remember, you will need to use both an SSH key and password to log into ARCHER2 so you will also need to collect your initial password before you can log into ARCHER2 for the first time. We cover this next.

Note

If you want to connect to ARCHER2 from more than one machine, e.g. from your home laptop as well as your work laptop, you should generate an SSH key pair on each machine and add each of the public keys into SAFE.

"},{"location":"quick-start/quickstart-users-totp/#login-to-archer2","title":"Login to ARCHER2","text":"

To log into ARCHER2 you should use the address:


ssh [userID]@login.archer2.ac.uk

The order in which you are asked for credentials depends on the system you are accessing:


You will first be prompted for the passphrase associated with your SSH key pair. Once you have entered this passphrase successfully, you will then be prompted for your machine account password. You need to enter both credentials correctly to be able to access ARCHER2.

Tip

If you logged into ARCHER2 with your account before the major upgrade in May/June 2023, you may see an error from SSH that looks like this:

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\n@       WARNING: POSSIBLE DNS SPOOFING DETECTED!          @\n@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\nThe ECDSA host key for login.archer2.ac.uk has changed,\nand the key for the corresponding IP address 193.62.216.43\nhas a different value. This could either mean that\nDNS SPOOFING is happening or the IP address for the host\nand its host key have changed at the same time.\nOffending key for IP in /Users/auser/.ssh/known_hosts:11\n@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\n@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @\n@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\nIT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!\nSomeone could be eavesdropping on you right now (man-in-the-middle attack)!\nIt is also possible that a host key has just been changed.\nThe fingerprint for the ECDSA key sent by the remote host is\nSHA256:UGS+LA8I46LqnD58WiWNlaUFY3uD1WFr+V8RCG09fUg.\nPlease contact your system administrator.\n

If you see this, you should delete the offending host key from your ~/.ssh/known_hosts file (in the example above, the offending entry is on line 11).
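The following is a minimal sketch of deleting the stale entry by line number. It assumes you take the line number from the warning (11 in the example above), and it relies on `sed` accepting `-i` with an attached backup suffix. Both GNU sed and the BSD sed on macOS do. The function name `remove_known_hosts_line` is hypothetical. Alternatively, OpenSSH can remove entries by host name with `ssh-keygen -R login.archer2.ac.uk`. If the warning names an IP address, as in the example, you may need to run it for that address as well.

```shell
# Delete one line from a known_hosts file, keeping a backup copy as <file>.bak.
# Hypothetical helper; relies on sed accepting -i with an attached suffix,
# which both GNU sed and BSD/macOS sed do.
remove_known_hosts_line() {
    file="$1"   # e.g. ~/.ssh/known_hosts
    line="$2"   # the number after "known_hosts:" in the SSH warning
    case "$line" in
        ''|*[!0-9]*) echo "line number must contain only digits" >&2; return 1 ;;
    esac
    sed -i.bak "${line}d" "$file"
}

# Usage, matching the example warning above:
# remove_known_hosts_line ~/.ssh/known_hosts 11
```

The backup file lets you restore the original if you removed the wrong line.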

Tip

If your SSH key pair is not stored in the default location (usually ~/.ssh/id_rsa) on your local system, you may need to specify the path to the private part of the key with the -i option to ssh. For example, if your key is in a file called keys/id_rsa_archer2, you would use the command ssh -i keys/id_rsa_archer2 username@login.archer2.ac.uk to log in.
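You can avoid typing `-i` every time by recording the key in your SSH client configuration. Below is a minimal sketch that assumes an OpenSSH client. The alias `archer2`, the function name and the example arguments are hypothetical. `Host`, `HostName`, `User` and `IdentityFile` are standard OpenSSH `ssh_config` keywords. You will still be asked for your passphrase and your other credentials when you connect.

```shell
# Append a Host entry for ARCHER2 to an OpenSSH client config file.
# Hypothetical helper: the alias "archer2" and the arguments are examples only.
# Run it once; a second run would add a duplicate entry.
add_archer2_host() {
    config="$1"     # usually ~/.ssh/config
    user="$2"       # your ARCHER2 username
    keyfile="$3"    # path to the *private* key, e.g. ~/keys/id_rsa_archer2
    mkdir -p "$(dirname "$config")"
    cat >> "$config" <<EOF
Host archer2
    HostName login.archer2.ac.uk
    User $user
    IdentityFile $keyfile
EOF
    chmod 600 "$config"   # OpenSSH refuses config files writable by other users
}

# Usage:
# add_archer2_host ~/.ssh/config auser ~/keys/id_rsa_archer2
# ssh archer2
```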

"},{"location":"quick-start/quickstart-users-totp/#mfa-time-based-one-time-password","title":"MFA Time-based one-time password","text":"

Remember, you will need to use both an SSH key and a time-based one-time password (TOTP) to log into ARCHER2, so you will also need to set up your TOTP before you can log in.

Tip

When you first log into ARCHER2, you will be prompted to change your initial password. This is a three-step process:

  1. When prompted to enter your LDAP password: enter the password that you retrieved from SAFE
  2. When prompted to enter your new password: type in a new password
  3. When prompted to re-enter the new password: re-enter the new password

Your password has now been changed. You will not use your password when logging in to ARCHER2 after the initial login.

Hint

More information on connecting to ARCHER2 is available in the Connecting to ARCHER2 section of the User Guide.

"},{"location":"quick-start/quickstart-users-totp/#file-systems-and-manipulating-data","title":"File systems and manipulating data","text":"

ARCHER2 has a number of different file systems and understanding the difference between them is crucial to being able to use the system. In particular, transferring and moving data often requires a bit of thought in advance to ensure that the data is secure and in a useful form.

ARCHER2 file systems are:

All users have a directory on one of the home file systems and on one of the work file systems. The directories are located at:

Top tips for managing data on ARCHER2:

Hint

Information on the file systems and best practice in managing your data is available in the Data management and transfer section of the User and Best Practice Guide.

"},{"location":"quick-start/quickstart-users-totp/#accessing-software","title":"Accessing software","text":"

Software on ARCHER2 is principally accessed through modules. These load and unload the desired applications, compilers, tools and libraries through the module command and its subcommands. Some modules will be loaded by default on login, providing a default working environment; many more will be available for use but initially unloaded, allowing you to set up the environment to suit your needs.

At any stage you can check which modules have been loaded by running

module list\n

Running the following command will display all environment modules available on ARCHER2, whether loaded or unloaded:

module avail\n

You can narrow the search by providing the first few characters of the module name. For example, all available versions and variants of VASP can be listed by running

module avail vasp\n

You will see that different versions are available for many modules. For example, vasp/5/5.4.4.pl2 and vasp/6/6.3.2 are two available versions of VASP on the full system. Furthermore, a default version may be specified; this is used if no version is provided by the user.

Important

VASP is licensed software, as are other software packages on ARCHER2. You must have a valid licence to use licensed software on ARCHER2. Often you will need to request access through the SAFE. More on this below.

The module load command loads a module for use. Continuing the example above,

module load vasp/6\n

would load the default version of VASP 6, while

module load vasp/6/6.3.2\n

would specifically load version 6.3.2. A loaded module can be unloaded with the module unload command (or the identical module remove command), e.g.

module unload vasp\n

The above unloads whichever version of VASP is currently in the environment. Rather than issuing separate unload and load commands, versions of a module may be swapped as follows:

module swap vasp vasp/5/5.4.4.pl2\n

Other helpful commands are:

Tip

You should not use the module purge command on ARCHER2 as this will cause issues for the HPE Cray programming environment. If you wish to reset your modules, you should use the module restore command instead.

Points to be aware of include:

More information on modules and the software environment on ARCHER2 can be found in the Software environment section of the User and Best Practice Guide.

"},{"location":"quick-start/quickstart-users-totp/#requesting-access-to-licensed-software","title":"Requesting access to licensed software","text":"

Some of the software installed on ARCHER2 requires a user to have a valid licence agreed with the software owners/developers to be able to use it (for example, VASP). Although you will be able to load this software on ARCHER2, you will be barred from actually using it until your licence has been verified.

You request access to licensed software through the SAFE (the web administration tool you used to apply for your account and retrieve your initial password) by being added to the appropriate Package Group. To request access to licensed software:

  1. Log in to SAFE
  2. Go to the Menu Login accounts and select the login account which requires access to the software
  3. Click New Package Group Request
  4. Select the software from the list of available packages and click Select Package Group
  5. Fill in as much information as possible about your licence; at the very least provide the information requested at the top of the screen, such as the licence holder's name and contact details. If you are covered by the licence because the licence holder is your supervisor, for example, please state this.
  6. Click Submit

Your request will then be processed by the ARCHER2 Service Desk, who will confirm your licence with the software owners/developers before enabling your access to the software on ARCHER2. This can take several days (depending on how quickly the software owners/developers respond), but you will be advised once this has been done.

"},{"location":"quick-start/quickstart-users-totp/#create-a-job-submission-script","title":"Create a job submission script","text":"

To run a program on the ARCHER2 compute nodes you need to write a job submission script that tells the system how many compute nodes you want to reserve and for how long. You also need to use the srun command to launch your parallel executable.

Hint

For more details on the Slurm scheduler on ARCHER2 and writing job submission scripts, see the Running jobs on ARCHER2 section of the User and Best Practice Guide.

Important

Parallel jobs on ARCHER2 should be run from the work file systems as the home file systems are not available on the compute nodes - you will see a chdir or file not found error if you try to access data on the home file system within a parallel job running on the compute nodes.
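By default, Slurm starts a batch job in the directory you submitted it from. A wrapper script can therefore guard against submitting from the wrong place. This is a minimal sketch that assumes work directories live under /work, as in the /work/t01/t01/auser path used in this guide. The function name `on_work_fs` is hypothetical.

```shell
# Succeed only if the given directory (default: the current one) is under /work.
# Hypothetical helper; assumes ARCHER2 work directories are under /work/.
on_work_fs() {
    dir="${1:-$PWD}"
    case "$dir" in
        /work/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage before submitting a job:
# on_work_fs || { echo "Submit from your /work directory, not your home directory" >&2; exit 1; }
# sbatch submit.slurm
```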

Create a job submission script called submit.slurm in your space on the work file systems using your favourite text editor. For example, using vim:

auser@ln01:~> cd /work/t01/t01/auser\nauser@ln01:/work/t01/t01/auser> vim submit.slurm\n

Tip

You will need to use your project code and username to get to the correct directory: replace t01 above with your project code and auser with your ARCHER2 username.

Paste the following text into your job submission script, replacing [budget code] with your budget code (e.g. e99-ham). The script requests the standard partition and the standard quality of service (QoS); change the --partition and --qos lines if you need a different partition or QoS.

#!/bin/bash --login\n\n#SBATCH --job-name=test_job\n#SBATCH --nodes=1\n#SBATCH --ntasks-per-node=128\n#SBATCH --cpus-per-task=1\n#SBATCH --time=0:5:0\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Load the xthi module to get access to the xthi program\nmodule load xthi\n\n# Recommended environment settings\n# Stop unintentional multi-threading within software libraries\nexport OMP_NUM_THREADS=1\n# Ensure the cpus-per-task option is propagated to srun commands\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\n# srun launches the parallel program based on the SBATCH options\nsrun --distribution=block:block --hint=nomultithread xthi_mpi\n
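You can fill in the placeholder from the command line instead of editing it by hand. This sketch assumes the script was saved as submit.slurm with the literal [budget code] placeholder on its --account line, as shown above. The helper name is hypothetical, and e99-ham is only an example code.

```shell
# Replace the literal "[budget code]" placeholder on the --account line of a job script.
# Hypothetical helper; keeps the original as <script>.bak.
set_budget_code() {
    script="$1"   # e.g. submit.slurm
    code="$2"     # your budget code, e.g. e99-ham
    # Brackets are escaped so sed matches them literally rather than as a character class;
    # the /--account=/ address leaves the explanatory comment in the script untouched.
    sed -i.bak "/--account=/s/\[budget code\]/$code/" "$script"
}

# Usage:
# set_budget_code submit.slurm e99-ham
# grep -- '--account' submit.slurm
```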
"},{"location":"quick-start/quickstart-users-totp/#submit-your-job-to-the-queue","title":"Submit your job to the queue","text":"

You submit your job to the queue using the sbatch command:

auser@ln01:/work/t01/t01/auser> sbatch submit.slurm\nSubmitted batch job 23996\n

The value returned is your Job ID.
"},{"location":"quick-start/quickstart-users-totp/#monitoring-your-job","title":"Monitoring your job","text":"

You use the squeue command to examine jobs in the queue. To list all the jobs you have in the queue, use:

auser@ln01:/work/t01/t01/auser> squeue -u $USER\n

squeue on its own lists all jobs in the queue from all users.

"},{"location":"quick-start/quickstart-users-totp/#checking-the-output-from-the-job","title":"Checking the output from the job","text":"

The job submission script above should write its output to a file called slurm-<jobID>.out (i.e. if the Job ID was 23996, the file would be slurm-23996.out). You can check the contents of this file with the cat command. If the job was successful, you should see output that looks something like:

auser@ln01:/work/t01/t01/auser> cat slurm-23996.out\nNode    0, hostname nid001020\nNode    0, rank    0, thread   0, (affinity =    0)\nNode    0, rank    1, thread   0, (affinity =    1)\nNode    0, rank    2, thread   0, (affinity =    2)\nNode    0, rank    3, thread   0, (affinity =    3)\nNode    0, rank    4, thread   0, (affinity =    4)\nNode    0, rank    5, thread   0, (affinity =    5)\nNode    0, rank    6, thread   0, (affinity =    6)\nNode    0, rank    7, thread   0, (affinity =    7)\nNode    0, rank    8, thread   0, (affinity =    8)\nNode    0, rank    9, thread   0, (affinity =    9)\nNode    0, rank   10, thread   0, (affinity =   10)\nNode    0, rank   11, thread   0, (affinity =   11)\nNode    0, rank   12, thread   0, (affinity =   12)\nNode    0, rank   13, thread   0, (affinity =   13)\nNode    0, rank   14, thread   0, (affinity =   14)\nNode    0, rank   15, thread   0, (affinity =   15)\nNode    0, rank   16, thread   0, (affinity =   16)\nNode    0, rank   17, thread   0, (affinity =   17)\nNode    0, rank   18, thread   0, (affinity =   18)\nNode    0, rank   19, thread   0, (affinity =   19)\nNode    0, rank   20, thread   0, (affinity =   20)\nNode    0, rank   21, thread   0, (affinity =   21)\n... output trimmed ...\n

If something has gone wrong, you will find any error messages in the file instead of the expected output.
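You can derive the output file name from the sbatch message instead of retyping the Job ID. This minimal sketch makes two assumptions. First, the job script does not set --output, so Slurm uses its default slurm-<jobID>.out name. Second, sbatch prints "Submitted batch job <jobID>" as shown above. `slurm_outfile` is a hypothetical name. Alternatively, sbatch's `--parsable` option prints just the job ID.

```shell
# Turn "Submitted batch job 23996" into the default output file name.
# Hypothetical helper; assumes the job script does not use --output.
slurm_outfile() {
    msg="$1"
    jobid="${msg##* }"        # keep only the last word: the job ID
    case "$jobid" in
        ''|*[!0-9]*) echo "no job ID found in: $msg" >&2; return 1 ;;
    esac
    printf 'slurm-%s.out\n' "$jobid"
}

# Usage on an ARCHER2 login node:
# out=$(slurm_outfile "$(sbatch submit.slurm)")
# squeue -u $USER     # repeat until the job has left the queue
# cat "$out"
```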

"},{"location":"quick-start/quickstart-users-totp/#acknowledging-archer2","title":"Acknowledging ARCHER2","text":"

You should use the following phrase to acknowledge ARCHER2 for all research outputs that were generated using the ARCHER2 service:

This work used the ARCHER2 UK National Supercomputing Service (https://www.archer2.ac.uk).

You should also tag outputs with the keyword \"ARCHER2\" whenever possible.

"},{"location":"quick-start/quickstart-users-totp/#useful-links","title":"Useful Links","text":"

If you plan to compile your own programs on ARCHER2, you may also want to look at Quickstart for developers.

Other documentation you may find useful:

"},{"location":"quick-start/quickstart-users/","title":"Quickstart for users","text":"

This guide aims to quickly enable new users to get up and running on ARCHER2. It covers the process of getting an ARCHER2 account, logging in and running your first job.

"},{"location":"quick-start/quickstart-users/#request-an-account-on-archer2","title":"Request an account on ARCHER2","text":"

Important

To access ARCHER2, you need to use two sets of credentials: your SSH key pair protected by a passphrase and a time-based one-time password (TOTP). Additionally, the first time you ever log into an account on ARCHER2, you will need to use a single-use password that you retrieve from SAFE.

"},{"location":"quick-start/quickstart-users/#obtain-an-account-on-the-safe-website","title":"Obtain an account on the SAFE website","text":"

Warning

We have seen issues with Gmail blocking emails from SAFE so we recommend that users use their institutional/work email address rather than Gmail addresses to register for SAFE accounts.

The first step is to sign up for an account on the ARCHER2 SAFE website. The SAFE account is used to manage all of your login accounts, allowing you to report on your usage and quotas. To do this:

  1. Go to the SAFE New User Signup Form
  2. Fill in your personal details. You can come back later and change them if you wish
  3. Click Submit

You are now registered. Your SAFE password will be emailed to the email address you provided. You can then log in with that email address and password. (You can change your initial SAFE password whenever you want by selecting the Change SAFE password option from the Your details menu.)

"},{"location":"quick-start/quickstart-users/#request-an-archer2-login-account","title":"Request an ARCHER2 login account","text":"

Once you have a SAFE account and an SSH key, you will need to request a user account on ARCHER2 itself. To do this you will require a Project Code; you usually obtain this from the Principal Investigator (PI) or project manager for the project you will be working on. Once you have the Project Code:

  1. Log into SAFE
  2. Use the Login accounts - Request new account menu item
  3. Select the correct project from the drop down list
  4. Select the archer2 machine in the list of available machines
  5. Click Next
  6. Enter a username for the account and the public part of an SSH key pair
    1. More information on generating an SSH key pair can be found in the ARCHER2 User and Best Practice Guide
    2. You can add additional SSH keys using the process described below if you so wish.
  7. Click Request

The PI or project manager of the project will be asked to approve your request. Once it has been approved, the account will be created and you will receive an email. You can then come back to SAFE and pick up the initial single-use password for your new account.

Note

ARCHER2 account passwords are also sometimes referred to as LDAP passwords by the system.

"},{"location":"quick-start/quickstart-users/#generating-and-adding-an-ssh-key-pair","title":"Generating and adding an SSH key pair","text":"

How you generate your SSH key pair depends on which operating system you use and which SSH client you use to connect to ARCHER2. We will not cover the details on generating an SSH key pair here, but detailed information on this topic is available in the ARCHER2 User and Best Practice Guide.

After generating your SSH key pair, add the public part to your login account using SAFE:

  1. Log into SAFE
  2. Use the menu Login accounts and select the ARCHER2 account to be associated with the SSH key
  3. On the subsequent Login account details page, click the Add Credential button
  4. Select SSH public key as the Credential Type and click Next
  5. Either copy and paste the public part of your SSH key into the SSH Public key box or use the button to select the public key file on your computer
  6. Click Add to associate the public SSH key part with your account

Once you have done this, your SSH key will be added to your ARCHER2 account.

Remember, you will need to use both an SSH key and password to log into ARCHER2 so you will also need to collect your initial password before you can log into ARCHER2 for the first time. We cover this next.

Note

If you want to connect to ARCHER2 from more than one machine, e.g. from your home laptop as well as your work laptop, you should generate an SSH key pair on each machine and add each of the public keys into SAFE.

"},{"location":"quick-start/quickstart-users/#login-to-archer2","title":"Login to ARCHER2","text":"

To log into ARCHER2 you should use the address:

ssh [userID]@login.archer2.ac.uk

The order in which you are asked for credentials depends on the system you are accessing:

You will first be prompted for the passphrase associated with your SSH key pair. Once you have entered this passphrase successfully, you will then be prompted for your machine account password. You need to enter both credentials correctly to be able to access ARCHER2.

Tip

If you logged into ARCHER2 with your account before the major upgrade in May/June 2023, you may see an error from SSH that looks like this:

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\n@       WARNING: POSSIBLE DNS SPOOFING DETECTED!          @\n@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\nThe ECDSA host key for login.archer2.ac.uk has changed,\nand the key for the corresponding IP address 193.62.216.43\nhas a different value. This could either mean that\nDNS SPOOFING is happening or the IP address for the host\nand its host key have changed at the same time.\nOffending key for IP in /Users/auser/.ssh/known_hosts:11\n@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\n@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @\n@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\nIT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!\nSomeone could be eavesdropping on you right now (man-in-the-middle attack)!\nIt is also possible that a host key has just been changed.\nThe fingerprint for the ECDSA key sent by the remote host is\nSHA256:UGS+LA8I46LqnD58WiWNlaUFY3uD1WFr+V8RCG09fUg.\nPlease contact your system administrator.\n

If you see this, you should delete the offending host key from your ~/.ssh/known_hosts file (in the example above, the offending entry is on line 11).
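The following is a minimal sketch of deleting the stale entry by line number. It assumes you take the line number from the warning (11 in the example above), and it relies on `sed` accepting `-i` with an attached backup suffix. Both GNU sed and the BSD sed on macOS do. The function name `remove_known_hosts_line` is hypothetical. Alternatively, OpenSSH can remove entries by host name with `ssh-keygen -R login.archer2.ac.uk`. If the warning names an IP address, as in the example, you may need to run it for that address as well.

```shell
# Delete one line from a known_hosts file, keeping a backup copy as <file>.bak.
# Hypothetical helper; relies on sed accepting -i with an attached suffix,
# which both GNU sed and BSD/macOS sed do.
remove_known_hosts_line() {
    file="$1"   # e.g. ~/.ssh/known_hosts
    line="$2"   # the number after "known_hosts:" in the SSH warning
    case "$line" in
        ''|*[!0-9]*) echo "line number must contain only digits" >&2; return 1 ;;
    esac
    sed -i.bak "${line}d" "$file"
}

# Usage, matching the example warning above:
# remove_known_hosts_line ~/.ssh/known_hosts 11
```

The backup file lets you restore the original if you removed the wrong line.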

Tip

If your SSH key pair is not stored in the default location (usually ~/.ssh/id_rsa) on your local system, you may need to specify the path to the private part of the key with the -i option to ssh. For example, if your key is in a file called keys/id_rsa_archer2, you would use the command ssh -i keys/id_rsa_archer2 username@login.archer2.ac.uk to log in.
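You can avoid typing `-i` every time by recording the key in your SSH client configuration. Below is a minimal sketch that assumes an OpenSSH client. The alias `archer2`, the function name and the example arguments are hypothetical. `Host`, `HostName`, `User` and `IdentityFile` are standard OpenSSH `ssh_config` keywords. You will still be asked for your passphrase and TOTP when you connect.

```shell
# Append a Host entry for ARCHER2 to an OpenSSH client config file.
# Hypothetical helper: the alias "archer2" and the arguments are examples only.
# Run it once; a second run would add a duplicate entry.
add_archer2_host() {
    config="$1"     # usually ~/.ssh/config
    user="$2"       # your ARCHER2 username
    keyfile="$3"    # path to the *private* key, e.g. ~/keys/id_rsa_archer2
    mkdir -p "$(dirname "$config")"
    cat >> "$config" <<EOF
Host archer2
    HostName login.archer2.ac.uk
    User $user
    IdentityFile $keyfile
EOF
    chmod 600 "$config"   # OpenSSH refuses config files writable by other users
}

# Usage:
# add_archer2_host ~/.ssh/config auser ~/keys/id_rsa_archer2
# ssh archer2
```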

"},{"location":"quick-start/quickstart-users/#mfa-time-based-one-time-password","title":"MFA Time-based one-time password","text":"

Remember, you will need to use both an SSH key and a time-based one-time password (TOTP) to log into ARCHER2, so you will also need to set up your TOTP before you can log in.

Tip

When you first log into ARCHER2, you will be prompted to change your initial password. This is a three-step process:

  1. When prompted to enter your LDAP password: enter the password that you retrieved from SAFE
  2. When prompted to enter your new password: type in a new password
  3. When prompted to re-enter the new password: re-enter the new password

Your password has now been changed. You will not use your password when logging in to ARCHER2 after the initial login.

Hint

More information on connecting to ARCHER2 is available in the Connecting to ARCHER2 section of the User Guide.

"},{"location":"quick-start/quickstart-users/#file-systems-and-manipulating-data","title":"File systems and manipulating data","text":"

ARCHER2 has a number of different file systems and understanding the difference between them is crucial to being able to use the system. In particular, transferring and moving data often requires a bit of thought in advance to ensure that the data is secure and in a useful form.

ARCHER2 file systems are:

All users have a directory on one of the home file systems and on one of the work file systems. The directories are located at:

Top tips for managing data on ARCHER2:

Hint

Information on the file systems and best practice in managing your data is available in the Data management and transfer section of the User and Best Practice Guide.

"},{"location":"quick-start/quickstart-users/#accessing-software","title":"Accessing software","text":"

Software on ARCHER2 is principally accessed through modules. These load and unload the desired applications, compilers, tools and libraries through the module command and its subcommands. Some modules will be loaded by default on login, providing a default working environment; many more will be available for use but initially unloaded, allowing you to set up the environment to suit your needs.

At any stage you can check which modules have been loaded by running

module list\n

Running the following command will display all environment modules available on ARCHER2, whether loaded or unloaded:

module avail\n

You can narrow the search by providing the first few characters of the module name. For example, all available versions and variants of VASP can be listed by running

module avail vasp\n

You will see that different versions are available for many modules. For example, vasp/5/5.4.4.pl2 and vasp/6/6.3.2 are two available versions of VASP on the full system. Furthermore, a default version may be specified; this is used if no version is provided by the user.

Important

VASP is licensed software, as are other software packages on ARCHER2. You must have a valid licence to use licensed software on ARCHER2. Often you will need to request access through the SAFE. More on this below.

The module load command loads a module for use. Continuing the example above,

module load vasp/6\n

would load the default version of VASP 6, while

module load vasp/6/6.3.2\n

would specifically load version 6.3.2. A loaded module can be unloaded with the module unload command (or the identical module remove command), e.g.

module unload vasp\n

The above unloads whichever version of VASP is currently in the environment. Rather than issuing separate unload and load commands, versions of a module may be swapped as follows:

module swap vasp vasp/5/5.4.4.pl2\n

Other helpful commands are:

Tip

You should not use the module purge command on ARCHER2 as this will cause issues for the HPE Cray programming environment. If you wish to reset your modules, you should use the module restore command instead.

Points to be aware of include:

More information on modules and the software environment on ARCHER2 can be found in the Software environment section of the User and Best Practice Guide.

"},{"location":"quick-start/quickstart-users/#requesting-access-to-licensed-software","title":"Requesting access to licensed software","text":"

Some of the software installed on ARCHER2 requires a user to have a valid licence agreed with the software owners/developers to be able to use it (for example, VASP). Although you will be able to load this software on ARCHER2, you will be barred from actually using it until your licence has been verified.

You request access to licensed software through the SAFE (the web administration tool you used to apply for your account and retrieve your initial password) by being added to the appropriate Package Group. To request access to licensed software:

  1. Log in to SAFE
  2. Go to the Menu Login accounts and select the login account which requires access to the software
  3. Click New Package Group Request
  4. Select the software from the list of available packages and click Select Package Group
  5. Fill in as much information as possible about your licence; at the very least provide the information requested at the top of the screen, such as the licence holder's name and contact details. If you are covered by the licence because the licence holder is your supervisor, for example, please state this.
  6. Click Submit

Your request will then be processed by the ARCHER2 Service Desk, who will confirm your licence with the software owners/developers before enabling your access to the software on ARCHER2. This can take several days (depending on how quickly the software owners/developers respond), but you will be advised once this has been done.

"},{"location":"quick-start/quickstart-users/#create-a-job-submission-script","title":"Create a job submission script","text":"

To run a program on the ARCHER2 compute nodes you need to write a job submission script that tells the system how many compute nodes you want to reserve and for how long. You also need to use the srun command to launch your parallel executable.

Hint

For more details on the Slurm scheduler on ARCHER2 and writing job submission scripts, see the Running jobs on ARCHER2 section of the User and Best Practice Guide.

Important

Parallel jobs on ARCHER2 should be run from the work file systems as the home file systems are not available on the compute nodes - you will see a chdir or file not found error if you try to access data on the home file system within a parallel job running on the compute nodes.
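By default, Slurm starts a batch job in the directory you submitted it from. A wrapper script can therefore guard against submitting from the wrong place. This is a minimal sketch that assumes work directories live under /work, as in the /work/t01/t01/auser path used in this guide. The function name `on_work_fs` is hypothetical.

```shell
# Succeed only if the given directory (default: the current one) is under /work.
# Hypothetical helper; assumes ARCHER2 work directories are under /work/.
on_work_fs() {
    dir="${1:-$PWD}"
    case "$dir" in
        /work/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage before submitting a job:
# on_work_fs || { echo "Submit from your /work directory, not your home directory" >&2; exit 1; }
# sbatch submit.slurm
```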

Create a job submission script called submit.slurm in your space on the work file systems using your favourite text editor. For example, using vim:

auser@ln01:~> cd /work/t01/t01/auser\nauser@ln01:/work/t01/t01/auser> vim submit.slurm\n

Tip

You will need to use your project code and username to get to the correct directory: replace t01 above with your project code and auser with your ARCHER2 username.

Paste the following text into your job submission script, replacing [budget code] with your budget code (e.g. e99-ham). The script requests the standard partition and the standard quality of service (QoS); change the --partition and --qos lines if you need a different partition or QoS.

#!/bin/bash --login\n\n#SBATCH --job-name=test_job\n#SBATCH --nodes=1\n#SBATCH --ntasks-per-node=128\n#SBATCH --cpus-per-task=1\n#SBATCH --time=0:5:0\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Load the xthi module to get access to the xthi program\nmodule load xthi\n\n# Recommended environment settings\n# Stop unintentional multi-threading within software libraries\nexport OMP_NUM_THREADS=1\n# Ensure the cpus-per-task option is propagated to srun commands\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\n# srun launches the parallel program based on the SBATCH options\nsrun --distribution=block:block --hint=nomultithread xthi_mpi\n
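You can fill in the placeholder from the command line instead of editing it by hand. This sketch assumes the script was saved as submit.slurm with the literal [budget code] placeholder on its --account line, as shown above. The helper name is hypothetical, and e99-ham is only an example code.

```shell
# Replace the literal "[budget code]" placeholder on the --account line of a job script.
# Hypothetical helper; keeps the original as <script>.bak.
set_budget_code() {
    script="$1"   # e.g. submit.slurm
    code="$2"     # your budget code, e.g. e99-ham
    # Brackets are escaped so sed matches them literally rather than as a character class;
    # the /--account=/ address leaves the explanatory comment in the script untouched.
    sed -i.bak "/--account=/s/\[budget code\]/$code/" "$script"
}

# Usage:
# set_budget_code submit.slurm e99-ham
# grep -- '--account' submit.slurm
```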
"},{"location":"quick-start/quickstart-users/#submit-your-job-to-the-queue","title":"Submit your job to the queue","text":"

You submit your job to the queue using the sbatch command:

auser@ln01:/work/t01/t01/auser> sbatch submit.slurm\nSubmitted batch job 23996\n

The value returned is your Job ID.
"},{"location":"quick-start/quickstart-users/#monitoring-your-job","title":"Monitoring your job","text":"

You use the squeue command to examine jobs in the queue. To list all the jobs you have in the queue, use:

auser@ln01:/work/t01/t01/auser> squeue -u $USER\n

squeue on its own lists all jobs in the queue from all users.

"},{"location":"quick-start/quickstart-users/#checking-the-output-from-the-job","title":"Checking the output from the job","text":"

The job submission script above should write its output to a file called slurm-<jobID>.out (i.e. if the Job ID was 23996, the file would be slurm-23996.out). You can check the contents of this file with the cat command. If the job was successful, you should see output that looks something like:

auser@ln01:/work/t01/t01/auser> cat slurm-23996.out\nNode    0, hostname nid001020\nNode    0, rank    0, thread   0, (affinity =    0)\nNode    0, rank    1, thread   0, (affinity =    1)\nNode    0, rank    2, thread   0, (affinity =    2)\nNode    0, rank    3, thread   0, (affinity =    3)\nNode    0, rank    4, thread   0, (affinity =    4)\nNode    0, rank    5, thread   0, (affinity =    5)\nNode    0, rank    6, thread   0, (affinity =    6)\nNode    0, rank    7, thread   0, (affinity =    7)\nNode    0, rank    8, thread   0, (affinity =    8)\nNode    0, rank    9, thread   0, (affinity =    9)\nNode    0, rank   10, thread   0, (affinity =   10)\nNode    0, rank   11, thread   0, (affinity =   11)\nNode    0, rank   12, thread   0, (affinity =   12)\nNode    0, rank   13, thread   0, (affinity =   13)\nNode    0, rank   14, thread   0, (affinity =   14)\nNode    0, rank   15, thread   0, (affinity =   15)\nNode    0, rank   16, thread   0, (affinity =   16)\nNode    0, rank   17, thread   0, (affinity =   17)\nNode    0, rank   18, thread   0, (affinity =   18)\nNode    0, rank   19, thread   0, (affinity =   19)\nNode    0, rank   20, thread   0, (affinity =   20)\nNode    0, rank   21, thread   0, (affinity =   21)\n... output trimmed ...\n

If something has gone wrong, you will find any error messages in the file instead of the expected output.
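If you submit jobs from a script, you can capture the Job ID from the sbatch message and build the name of the default output file from it. A minimal bash sketch follows; jobid_from_sbatch and slurm_outfile are hypothetical helper names, and the file name assumes the default slurm-JOBID.out pattern used by the script above (i.e. no --output option in the job script).

```shell
#!/bin/bash
# Hypothetical helper: extract the job ID from the message printed by sbatch,
# e.g. 'Submitted batch job 23996' gives 23996
jobid_from_sbatch() {
    awk '/^Submitted batch job/ { print $NF }'
}

# Hypothetical helper: default Slurm output file name for a given job ID
slurm_outfile() {
    echo slurm-${1}.out
}
```

For example, jobid=$(sbatch submit.slurm | jobid_from_sbatch) followed by cat $(slurm_outfile $jobid) shows the output once the job has run. Alternatively, sbatch --parsable prints only the job ID (plus the cluster name, if present), which avoids parsing the message.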

"},{"location":"quick-start/quickstart-users/#acknowledging-archer2","title":"Acknowledging ARCHER2","text":"

You should use the following phrase to acknowledge ARCHER2 for all research outputs that were generated using the ARCHER2 service:

This work used the ARCHER2 UK National Supercomputing Service (https://www.archer2.ac.uk).

You should also tag outputs with the keyword \"ARCHER2\" whenever possible.

"},{"location":"quick-start/quickstart-users/#useful-links","title":"Useful Links","text":"

If you plan to compile your own programs on ARCHER2, you may also want to look at Quickstart for developers.

Other documentation you may find useful:

"},{"location":"research-software/","title":"Research Software","text":"

ARCHER2 provides a number of research software packages as centrally supported packages. Many of these packages are free to use, but others require a licence (which you, or your research group, need to supply).

This section also contains information on research software contributed and/or supported by third parties (marked with a * in the list below).

For centrally supported packages, the version available will usually be the current stable release, including major releases and significant updates. We will not usually maintain older versions, or versions that are no longer supported by the developers of the package.

The following sections provide details on access to each of the centrally installed packages (software that is not part of the fully supported software stack is marked with *):

"},{"location":"research-software/#not-on-the-list","title":"Not on the list?","text":"

If the software you are interested in is not in the above list, we may still be able to help you install your own version, either individually, or as a project. Please contact the Service Desk.

"},{"location":"research-software/casino/","title":"CASINO","text":"

Note

CASINO is not available as a central install/module on ARCHER2 at this time. This page provides tips on using CASINO on ARCHER2 for users who have obtained their own copy of the code.

Important

CASINO is not part of the officially supported software on ARCHER2. While the ARCHER2 service desk is able to provide support for basic use of this software (e.g. access to software, writing job submission scripts) it does not generally provide detailed technical support for the software and you may be directed to seek support from other places if the service desk cannot answer the questions.

CASINO is a computer program system for performing quantum Monte Carlo (QMC) electronic structure calculations. It has been developed over more than 20 years by a group of researchers initially working in the Theory of Condensed Matter group in the Cambridge University physics department, and their collaborators. It is capable of calculating highly accurate solutions to the Schrödinger equation of quantum mechanics for realistic systems built from atoms.

"},{"location":"research-software/casino/#useful-links","title":"Useful Links","text":""},{"location":"research-software/casino/#compiling-casino-on-archer2","title":"Compiling CASINO on ARCHER2","text":"

You should use the linuxpc-gcc-slurm-parallel.archer2 configuration that is supplied along with the CASINO source code to build on ARCHER2 and ensure that you build the \"Shm\" (System-V shared memory) version of the code.

Bug

The linuxpc-cray-slurm-parallel.archer2 configuration produces a binary that crashes with a segfault and should not be used.

"},{"location":"research-software/casino/#using-casino-on-archer2","title":"Using CASINO on ARCHER2","text":"

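The performance of CASINO on ARCHER2 is critically dependent on three things: the MPI transport layer (which should be UCX), the number of cores sharing each System-V shared memory segment, and the pinning of MPI processes sequentially to cores.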

Next, we show how to make sure that the MPI transport layer is set to UCX, how to set the number of cores sharing the System-V shared memory segments and how to pin MPI processes sequentially to cores.

Finally, we provide a job submission script that demonstrates all these options together.

"},{"location":"research-software/casino/#setting-the-mpi-transport-layer-to-ucx","title":"Setting the MPI transport layer to UCX","text":"

To use UCX as the MPI transport layer, include the following lines in the job submission script that runs CASINO, before you run CASINO (i.e. before the srun command that launches the CASINO executable):

module load PrgEnv-gnu\nmodule load craype-network-ucx\nmodule load cray-mpich-ucx\n
"},{"location":"research-software/casino/#setting-the-number-of-cores-sharing-memory","title":"Setting the number of cores sharing memory","text":"

In your job submission script, you set the number of cores sharing memory segments by setting the CASINO_NUMABLK environment variable before you run CASINO. For example, to share each memory segment between 16 cores, you would use:

export CASINO_NUMABLK=16\n

Tip

If you do not set CASINO_NUMABLK, CASINO will use the default of all cores on a node (the equivalent of setting it to 128), which gives very poor performance, so you should always set this environment variable. Setting CASINO_NUMABLK to 8 or 16 cores gives the best performance; 32 cores is acceptable if you want to maximise memory efficiency. Using 64 or 128 gives poor performance.
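To catch a forgotten setting early, a job script can check CASINO_NUMABLK before the srun line. The function below is a hypothetical bash helper (not part of CASINO); it encodes only the guidance above: 8 or 16 are recommended, 32 trades speed for memory efficiency, and an unset value falls back to sharing memory across all 128 cores of the node.

```shell
#!/bin/bash
# Hypothetical helper: report whether CASINO_NUMABLK follows the ARCHER2 guidance.
# Returns non-zero only when the variable is unset (or empty).
check_casino_numablk() {
    case ${CASINO_NUMABLK:-unset} in
        unset)
            echo 'WARNING: CASINO_NUMABLK is not set, so CASINO will share memory across all 128 cores of the node'
            return 1
            ;;
        8|16)
            echo OK: CASINO_NUMABLK=$CASINO_NUMABLK
            ;;
        32)
            echo 'NOTE: CASINO_NUMABLK=32 favours memory efficiency over speed'
            ;;
        *)
            echo WARNING: CASINO_NUMABLK=$CASINO_NUMABLK is not one of the recommended values, 8 or 16
            ;;
    esac
}
```

For example, placing check_casino_numablk || exit 1 after the export CASINO_NUMABLK line makes a job with the variable unset stop immediately rather than running with the poorly performing default.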

"},{"location":"research-software/casino/#pinning-mpi-processes-sequentially-to-cores","title":"Pinning MPI processes sequentially to cores","text":"

For shared memory segments to work efficiently MPI processes must be pinned sequentially to cores on compute nodes (so that cores sharing memory are close in the node memory hierarchy). To do this, you add the following options to the srun command in your job script that runs the CASINO executable:

--distribution=block:block --hint=nomultithread\n
"},{"location":"research-software/casino/#example-casino-job-submission-script","title":"Example CASINO job submission script","text":"

The following script will run a CASINO job using 16 nodes (2048 cores).

#!/bin/bash\n\n# Request 16 nodes with 128 MPI tasks per node for 20 minutes\n#SBATCH --job-name=CASINO\n#SBATCH --nodes=16\n#SBATCH --ntasks-per-node=128\n#SBATCH --cpus-per-task=1\n#SBATCH --time=00:20:00\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Ensure we are using UCX as the MPI transport layer\nmodule load PrgEnv-gnu\nmodule load craype-network-ucx\nmodule load cray-mpich-ucx\n\n# Set CASINO to share memory across 16 core blocks\nexport CASINO_NUMABLK=16\n\n# Ensure the cpus-per-task option is propagated to srun commands\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\n# Set the location of the CASINO executable - this must be on /work\n#   Replace this with the path to your compiled CASINO binary\nCASINO_EXE=/work/t01/t01/auser/CASINO/bin_qmc/linuxpc-gcc-slurm-parallel.archer2/Shm/opt/casino\n\n# Launch CASINO with MPI processes pinned to cores in a sequential order\nsrun --distribution=block:block --hint=nomultithread ${CASINO_EXE}\n
"},{"location":"research-software/casino/#casino-performance-on-archer2","title":"CASINO performance on ARCHER2","text":"

We have run the benzene_dimer benchmark on ARCHER2 with the following configuration:

Timings are reported as the time taken for 100 equilibration steps in a DMC calculation.

"},{"location":"research-software/casino/#casino_numablk8","title":"CASINO_NUMABLK=8","text":"
Nodes   Time taken (s)   Speedup
1       289.90           1.0
2       154.93           1.9
4       81.06            3.6
8       41.44            7.0
16      23.16            12.5
"},{"location":"research-software/castep/","title":"CASTEP","text":"

CASTEP is a leading code for calculating the properties of materials from first principles. Using density functional theory, it can simulate a wide range of materials properties, including energetics, structure at the atomic level, vibrational properties and electronic response properties. In particular, it has a wide range of spectroscopic features that link directly to experiment, such as infrared and Raman spectroscopies, NMR, and core-level spectra.

"},{"location":"research-software/castep/#useful-links","title":"Useful Links","text":""},{"location":"research-software/castep/#using-castep-on-archer2","title":"Using CASTEP on ARCHER2","text":"

CASTEP is only available to users who have a valid CASTEP licence.

If you have a CASTEP licence and wish to have access to CASTEP on ARCHER2, please make a request via the SAFE, see:

Please have your licence details to hand.

"},{"location":"research-software/castep/#note-on-using-relativistic-j-dependent-pseudopotentials","title":"Note on using Relativistic J-dependent pseudopotentials","text":"

These pseudopotentials cannot be generated on the fly by CASTEP and so are available in the following directory on ARCHER2:

/work/y07/shared/apps/core/castep/pseudopotentials\n
"},{"location":"research-software/castep/#running-parallel-castep-jobs","title":"Running parallel CASTEP jobs","text":"

The following script will run a CASTEP job using 2 nodes (256 cores). It assumes that the input files have the file stem test_calc.

#!/bin/bash\n\n# Request 2 nodes with 128 MPI tasks per node for 20 minutes\n#SBATCH --job-name=CASTEP\n#SBATCH --nodes=2\n#SBATCH --ntasks-per-node=128\n#SBATCH --cpus-per-task=1\n#SBATCH --time=00:20:00\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Ensure the cpus-per-task option is propagated to srun commands\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\n# Load the CASTEP module, avoid any unintentional OpenMP threading by\n# setting OMP_NUM_THREADS, and launch the code.\nmodule load castep\nexport OMP_NUM_THREADS=1\nsrun --distribution=block:block --hint=nomultithread castep.mpi test_calc\n
"},{"location":"research-software/castep/#using-serial-castep-tools","title":"Using serial CASTEP tools","text":"

Serial CASTEP tools are available in the standard CASTEP module.

"},{"location":"research-software/castep/#compiling-castep","title":"Compiling CASTEP","text":"

The latest instructions for building CASTEP on ARCHER2 may be found in the GitHub repository of build instructions:

"},{"location":"research-software/cesm-further-examples/","title":"Further Examples CESM 2.1.3","text":"

In the process of porting CESM 2.1.3 to ARCHER2, a set of four long runs was carried out. This page contains the four example cases that were validated with these longer runs. They vary in the numbers of cores or threads used; the PE layouts used in the validation runs are included here and can be used as a guide for other runs. While only these four compsets and grids have been validated, CESM2 is not restricted to these cases. Links to the UCAR/NCAR pages on configurations, compsets and grids are in the useful links section of the CESM2.1.3 on ARCHER2 page, and can be used to find many of the defined compsets for CESM2.1.3.

"},{"location":"research-software/cesm-further-examples/#atmosphere-only-f2000climo","title":"Atmosphere-only / F2000climo","text":"

This compset uses the F09 grid which is roughly equivalent to a 1 degree resolution. On ARCHER2 with four nodes this configuration should give a throughput of around 7.8 simulated years per (wallclock) day (SYPD). The commands to set up and run the case are as follows:

${CIMEROOT}/scripts/create_newcase --case [case name] --compset F2000climo --res f09_f09_mg17 --walltime [enough time] --project [project code]\ncd [case directory]\n./xmlchange NTASKS=512,NTASKS_ESP=1\n[Any other changes e.g. run length or resubmissions]\n./case.setup\n./case.build\n./case.submit\n
"},{"location":"research-software/cesm-further-examples/#slab-ocean-etest","title":"Slab Ocean / ETEST","text":"

The slab ocean case is similar to the atmosphere-only case in terms of resources needed, as the slab ocean is inexpensive to simulate compared with the atmosphere. The setup detailed below uses two OpenMP threads and more tasks than the F2000climo case, so a throughput of around 20 SYPD can be expected. Unlike F2000climo, but like most compsets, ETEST is unsupported (meaning it has not been scientifically verified by NCAR personnel), so an extra argument (--run-unsupported) is required when creating the case. The ROOTPE settings guard against poor processor layouts being chosen automatically.

${CIMEROOT}/scripts/create_newcase --case [case name] --compset ETEST --res f09_g17 --walltime [enough time] --project [project code] --run-unsupported\ncd [case directory]\n./xmlchange NTASKS=1024,NTASKS_ESP=1\n./xmlchange NTHRDS=2\n./xmlchange ROOTPE_ICE=0,ROOTPE_OCN=0\n[Any other changes e.g. run length or resubmissions]\n./case.setup\n./case.build\n./case.submit\n
"},{"location":"research-software/cesm-further-examples/#coupled-ocean-b1850","title":"Coupled Ocean / B1850","text":"

Compsets with the B prefix are fully coupled and actively simulate all components. As such, this case is more expensive to run, with the ocean component being especially expensive. This case can be set up to run on dedicated nodes by changing the $ROOTPE variables (run the ./pelayout command to check the resulting layout). This should give a throughput of just over 10 SYPD.

${CIMEROOT}/scripts/create_newcase --case [case name] --compset B1850 --res f09_g17 --walltime [enough time] --project [project name]\ncd [case directory]\n./xmlchange NTASKS_CPL=1024,NTASKS_ICE=256,NTASKS_LND=256,NTASKS_GLC=128,NTASKS_ROF=128,NTASKS_WAV=256,NTASKS_OCN=512,NTASKS_ATM=1024\n./xmlchange ROOTPE_CPL=0,ROOTPE_ICE=0,ROOTPE_LND=256,ROOTPE_GLC=512,ROOTPE_ROF=640,ROOTPE_WAV=768,ROOTPE_OCN=1024,ROOTPE_ATM=0\n[Any other changes e.g. run length or resubmissions]\n./case.setup\n./case.build\n./case.submit\n

You can also define the PE layout in terms of full nodes by using negative values. For example, with $MAX_MPITASKS_PER_NODE=128 and $MAX_TASKS_PER_NODE=128, the following is equivalent to the layout above:

${CIMEROOT}/scripts/create_newcase --case [case name] --compset B1850 --res f09_g17 --walltime [enough time] --project [project name]\ncd [case directory]\n./xmlchange NTASKS_CPL=-8,NTASKS_ICE=-2,NTASKS_LND=-2,NTASKS_GLC=-1,NTASKS_ROF=-1,NTASKS_WAV=-2,NTASKS_OCN=-4,NTASKS_ATM=-8\n./xmlchange ROOTPE_CPL=0,ROOTPE_ICE=0,ROOTPE_LND=-2,ROOTPE_GLC=-4,ROOTPE_ROF=-5,ROOTPE_WAV=-6,ROOTPE_OCN=-8,ROOTPE_ATM=0\n[Any other changes e.g. run length or resubmissions]\n./case.setup\n./case.build\n./case.submit\n
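The node-based values convert to task counts by multiplying by the number of tasks per node. The sketch below (a hypothetical bash helper, pe_to_tasks, not part of CIME) does this conversion, assuming one thread per MPI task and 128 tasks per node as in this example; with OpenMP threading the number of MPI tasks per node may be smaller. Running it over the node-based B1850 layout reproduces the task counts of the first version.

```shell
#!/bin/bash
# Hypothetical helper: convert a CIME PE-layout value (NTASKS_* or ROOTPE_*)
# into MPI tasks. Negative values count whole nodes; positive values are
# already task counts. Assumes one thread per task, 128 tasks per node by default.
pe_to_tasks() {
    local value=$1
    local tasks_per_node=${2:-128}
    if (( value < 0 )); then
        echo $(( -value * tasks_per_node ))
    else
        echo $value
    fi
}

# Expand the node-based NTASKS settings of the B1850 example
for setting in CPL=-8 ICE=-2 LND=-2 GLC=-1 ROF=-1 WAV=-2 OCN=-4 ATM=-8; do
    echo NTASKS_${setting%%=*}=$(pe_to_tasks ${setting#*=})
done
```

The same conversion applies to the ROOTPE values, e.g. ROOTPE_ROF=-5 corresponds to ROOTPE_ROF=640.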
"},{"location":"research-software/cesm-further-examples/#waccm-x-fxhist","title":"WACCM-X / FXHIST","text":"

The WACCM-X case needs care during setup and running for two reasons. Firstly, as mentioned in the known issues section on archiving errors, the short-term archiver can sometimes move too many files and thus create problems with resubmissions. Secondly, it can pick up other files in the cesm_inputdata directory, causing issues when running. WACCM-X is also comparatively very expensive, with an expected throughput of only a little over 1.5 SYPD, even on a coarser grid than the cases above. The setup for running a WACCM-X case with approximately 2 degree resolution and no short-term archiving is:

${CIMEROOT}/scripts/create_newcase --case [case name] --compset FXHIST --res f19_f19_mg16 --walltime [enough time] --project [project name] --run-unsupported\ncd [case directory]\n./xmlchange NTASKS=512,NTASKS_ESP=1\n./xmlchange NTHRDS=2\n./xmlchange DOUT_S=FALSE\n[Any other changes e.g. run length or resubmissions]\n./case.setup\n./case.build\n./case.submit\n
"},{"location":"research-software/cesm/","title":"Community Earth System Model (CESM2)","text":"

CESM2 is a fully-coupled, community, global climate model that provides state-of-the-art computer simulations of the Earth's past, present, and future climate states. It has seven different components: atmosphere, ocean, river run off, sea ice, land ice, waves and adaptive river transport.

Important

CESM is not part of the officially supported software on ARCHER2. While the ARCHER2 service desk is able to provide support for basic use of this software (e.g. access to software, writing job submission scripts) it does not generally provide detailed technical support for the software and you may be directed to seek support from other places if the service desk cannot answer the questions.

"},{"location":"research-software/cesm/#cesm-213","title":"CESM 2.1.3","text":"

At the time of writing, CESM 2.1.3 is the latest scientifically verified version of the model.

"},{"location":"research-software/cesm/#setting-up-cesm-213-on-archer2","title":"Setting up CESM 2.1.3 on ARCHER2","text":"

Due to the nature of CESM2, there is not a centrally installed version of the program available on ARCHER2. Instead, users download their own copy of the program and make use of ARCHER2-specific configurations that have been rigorously tested.

The setup process has been streamlined on ARCHER2 and can be carried out by following the instructions on the ARCHER2 CESM2.1.3 setup page.

"},{"location":"research-software/cesm/#using-cesm-213-on-archer2","title":"Using CESM 2.1.3 on ARCHER2","text":"

A quickstart guide for running a simple coupled case of CESM 2.1.3 on ARCHER2 can be found here. Note that this is only a quickstart guide, focused on how CESM 2.1.3 should be run specifically on ARCHER2; it is not intended to replace the more extensive CESM or CIME documentation linked below.

"},{"location":"research-software/cesm/#useful-links","title":"Useful Links","text":""},{"location":"research-software/cesm/#documentation","title":"Documentation","text":"

If this is your first time running CESM2, it is highly recommended that you consult both the CIME documentation and the NCAR CESM pages for the version used in CESM 2.1.3, paying particular attention to the pages on Basic Usage of CIME, which give a detailed description of the basic commands needed to get a model running.

"},{"location":"research-software/cesm/#compsets-and-configurations","title":"Compsets and Configurations","text":"

CESM2 allows simulations to be carried out using a very wide range of configurations. If you are new to CESM2 it is highly recommended that, unless you are running a case you are already familiar with, you consult the CESM2.1 Configurations page. You can also see a list of the defined compsets already available on the component set definitions page. More information about configurations, grids and compsets can be found on the CESM2 Configurations and Grids page, which includes links to the configuration settings of the different components.

"},{"location":"research-software/cesm213_run/","title":"Quick Start: CESM Model Workflow (CESM 2.1.3)","text":"

This is the procedure for quickly setting up and running a simple CESM2 case on ARCHER2. This document is based on the general quickstart guide for CESM 2.1, with modifications to give instructions specific to ARCHER2. For more comprehensive instructions on running CESM 2.1, please consult the NCAR CESM pages.

Before following these instructions, ensure you have completed the setup procedure (see Setting up CESM2 on ARCHER2).

For your target case, the first step is to select a component set, and a resolution for your case. For the purposes of this guide, we will be looking at a simple coupled case using the B1850 compset and the f19_g17 resolution.

The current configuration of CESM 2.1.3 on ARCHER2 has been validated with the F2000climo (atmosphere only), ETEST (slab ocean), B1850 (fully coupled) and FXHIST (WACCM-X) compsets. Instructions for these are here: CESM2.1.3 further examples.

Details of available component sets and resolutions are available from the query_config tool located in the my_cesm_sandbox/cime/scripts directory

cd my_cesm_sandbox/cime/scripts\n./query_config --help\n

See the supported component sets, supported model resolutions and supported machines for a complete list of CESM2 supported component sets, grids and computational platforms.

Note: Variables presented as $VAR in this guide typically refer to variables in XML files in a CESM case. From within a case directory, you can determine the value of such a variable with ./xmlquery VAR. In some instances, $VAR refers to a shell variable or some other variable; we try to make these exceptions clear.

"},{"location":"research-software/cesm213_run/#preparing-a-case","title":"Preparing a case","text":"

There are three stages to preparing the case: create, setup and build. Each of these steps is described below.

"},{"location":"research-software/cesm213_run/#1-create-a-case","title":"1. Create a case","text":"

The create_newcase command creates a case directory containing the scripts and XML files to configure a case (see below) for the requested resolution, component set, and machine. create_newcase has three required arguments: --case, --compset and --res (invoke create_newcase --help for help).

On machines where a project or account code is needed (including ARCHER2), you must either specify the --project argument to create_newcase or set the $PROJECT variable in your shell environment.

If running on a supported machine, that machine will normally be recognized automatically, so it is not necessary to specify the --machine argument to create_newcase. For CESM 2.1.3, ARCHER2 is classed as an unsupported machine; however, the configurations for ARCHER2 are included in the version of CIME downloaded in the setup process, so adding the --machine flag should not be necessary.

Invoke create_newcase as follows:

./create_newcase --case CASENAME --compset COMPSET --res GRID --project PROJECT\n

where:

Here is an example on ARCHER2 with the CESM2 module loaded:

$CIMEROOT/scripts/create_newcase --case $CESM_ROOT/runs/b.e20.B1850.f19_g17.test --compset B1850 --res f19_g17 --project n02\n
"},{"location":"research-software/cesm213_run/#2-setting-up-the-case-run-script","title":"2. Setting up the case run script","text":"

Issuing the case.setup command creates scripts needed to run the model along with namelist user_nl_xxx files, where xxx denotes the set of components for the given case configuration. Before invoking case.setup, modify the env_mach_pes.xml file in the case directory using the xmlchange command as needed for the experiment.

cd to the case directory. Following the example from above:

cd $CESM_ROOT/runs/b.e20.B1850.f19_g17.test\n

Invoke the case.setup command.

./case.setup\n

If any changes are made to the case, case.setup can be re-run using

./case.setup --reset\n
"},{"location":"research-software/cesm213_run/#3-build-the-executable-using-the-casebuild-command","title":"3. Build the executable using the case.build command","text":"

Run the build script.

./case.build\n

This build may take a while to run and may have periods where the build process does not appear to be doing anything. You should only cancel the build if the build script has shown no activity for 15 minutes.

The CESM executable will appear in the directory given by the XML variable $EXEROOT, which can be queried using:

./xmlquery EXEROOT\n

By default, this will be the bld directory in your case directory.

If any changes are made to xml parameters that would necessitate rebuilding (see the Making Changes section below), then you can apply these by running

./case.setup --reset\n./case.build --clean-all\n./case.build\n
"},{"location":"research-software/cesm213_run/#input-data","title":"Input Data","text":"

Each case of CESM will require input data, which is downloaded from UCAR servers. Input data from similar compsets is often reused, so running two similar cases may not require downloading any additional input data for the second case.

You can check to see if the required input data is already in your input data directory using

./check_input_data\n

If it is not present you can download the input data for the case prior to running the case using

./check_input_data --download\n

This can be useful for cases where a large amount of data is needed, as you can write a simple Slurm script to run this download on the serial queue. Information on creating job submission scripts can be found on the ARCHER2 page on Running Jobs.
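A minimal sketch of such a script follows. It assumes that the serial partition and QoS on ARCHER2 are both named serial (check the Running Jobs page for the current names and limits) and that the CESM2/2.1.3 module is loaded, as it is when building and running the case. The job name and the two-hour time limit are illustrative choices; [budget code] and [case directory] are placeholders to replace, as in the other scripts in this documentation.

```shell
#!/bin/bash
# Download the input data for an existing CESM case on the serial queue
#SBATCH --job-name=cesm_inputdata
#SBATCH --ntasks=1
#SBATCH --time=02:00:00

# Replace [budget code] below with your project code (e.g. t01)
#SBATCH --account=[budget code]
#SBATCH --partition=serial
#SBATCH --qos=serial

# Provide the same environment used when building and running the case
module load CESM2/2.1.3

# Replace [case directory] with the path to your case directory
cd [case directory]
./check_input_data --download
```

You would submit this with sbatch from a login node and then run ./check_input_data again from the case directory to confirm that all the required files are present.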

Downloading the case input data at this stage is optional. If you skip it, the data will be downloaded on the login node when you run the case.submit script, which may make case.submit take a long time to complete.

An important thing to note is that your input data will be stored in your /work area, and will contribute to your storage allocation. These input files can sometimes take up a large amount of space, and so it is recommended that you do not keep any input data that is no longer needed.

"},{"location":"research-software/cesm213_run/#making-changes-to-a-case","title":"Making changes to a case","text":"

After creating a new case, the CIME functions can be used to make changes to the case setup, such as changing the wallclock time, number of cores etc.

You can query settings using the xmlquery script from your case directory:

./xmlquery <name_of_setting>\n

Adding the -p flag allows you to look up partial names, for example

$ ./xmlquery -p JOB\n\nOutput:\nResults in group case.run\n        JOB_QUEUE: standard\n        JOB_WALLCLOCK_TIME: 01:30:00\n\nResults in group case.st_archive\n        JOB_QUEUE: short\n        JOB_WALLCLOCK_TIME: 0:20:00\n

Here all parameters that match the JOB pattern are returned. It is worth noting that the parameters JOB_QUEUE and JOB_WALLCLOCK_TIME are present for both the case.run job and the case.st_archive job. To view just one of these, you can use the --subgroup flag:

$ ./xmlquery -p JOB --subgroup case.run\n\nOutput:\nResults in group case.run\n        JOB_QUEUE: standard\n        JOB_WALLCLOCK_TIME: 01:30:00\n

When you know which setting you want to change, you can do so using the xmlchange command

./xmlchange <name_of_setting>=<new_value>\n

For example, to change the wallclock time for the case.run job to 30 minutes without knowing the exact name of the setting, you could do:

$ ./xmlquery -p WALLCLOCK\n\nOutput:\nResults in group case.run\n        JOB_WALLCLOCK_TIME: 24:00:00\n\nResults in group case.st_archive\n        JOB_WALLCLOCK_TIME: 0:20:00\n\n$ ./xmlchange JOB_WALLCLOCK_TIME=00:30:00 --subgroup case.run\n\n$ ./xmlquery JOB_WALLCLOCK_TIME\n\nOutput:\nResults in group case.run\n        JOB_WALLCLOCK_TIME: 00:30:00\n\nResults in group case.st_archive\n        JOB_WALLCLOCK_TIME: 0:20:00\n

Note: If you try to set a parameter equal to a value that is not known to the program, it might suggest using a --force flag. This may be useful, for example, in the case of using a queue that has not been configured yet, but use with care!

Some changes to the case must be done before calling ./case.setup or ./case.build, otherwise the case will need to be reset or cleaned, using ./case.setup --reset and ./case.build --clean-all. These are as follows:

Many of the namelist variables can be changed just before calling ./case.submit.

"},{"location":"research-software/cesm213_run/#run-the-case","title":"Run the case","text":"

Modify runtime settings in env_run.xml (optional). At this point you may want to change the running parameters of your case, such as run length. By default, the model is set to run for 5 days based on the $STOP_N and $STOP_OPTION variables:

./xmlquery STOP_OPTION,STOP_N\n

These default settings can be useful in troubleshooting runtime problems before submitting for a longer time, but will not allow the model to run long enough to produce monthly history climatology files. In order to produce history files, increase the run length to a month or longer:

./xmlchange STOP_OPTION=nmonths,STOP_N=1\n

If you want a longer run, for example 30 years, this cannot be done in a single job, as the wallclock time required would be considerably longer than the maximum allowed by the ARCHER2 queue system. Instead, you split the simulation into appropriate chunks, such as 6 chunks of 5 years (assuming a throughput of more than 5 simulated years per day (SYPD); some SYPD values for ARCHER2 are given on the further examples page). You can then chain these chunks by setting $STOP_OPTION and $STOP_N to the chunk length and $RESUBMIT to the number of jobs that follow the first one:

./xmlchange RESUBMIT=5,STOP_OPTION=nyears,STOP_N=5\n

This runs an initial job followed by 5 resubmissions (6 chunks of 5 years in total), with each new job picking up where the previous job stopped. For more information about this, see the user guide page on running a case.
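The chunking arithmetic can be scripted. The bash sketch below uses two hypothetical helpers (not part of CIME): plan_cesm_chunks prints the xmlchange command for a given total length and chunk length, using the fact that CIME runs one initial job plus RESUBMIT further jobs, and chunk_hours gives a rough wallclock estimate for one chunk from a throughput in SYPD. If the total is not a multiple of the chunk length, the chunks are rounded up and the run slightly overshoots the requested length.

```shell
#!/bin/bash
# Hypothetical helper: print the xmlchange command that splits a run of
# TOTAL_YEARS simulated years into chunks of CHUNK_YEARS each.
plan_cesm_chunks() {
    local total_years=$1
    local chunk_years=$2
    # Round up, so the chunks cover at least the whole simulation
    local chunks=$(( (total_years + chunk_years - 1) / chunk_years ))
    # The first job is not a resubmission, so RESUBMIT is one less than the number of chunks
    echo ./xmlchange RESUBMIT=$(( chunks - 1 )),STOP_OPTION=nyears,STOP_N=$chunk_years
}

# Hypothetical helper: rough wallclock hours for one chunk at a given SYPD
chunk_hours() {
    awk -v years=$1 -v sypd=$2 'BEGIN { print years / sypd * 24 }'
}

plan_cesm_chunks 30 5
chunk_hours 5 10
```

For example, at a throughput of 10 SYPD each 5-year chunk needs roughly 12 hours; leave some margin when setting JOB_WALLCLOCK_TIME, since throughput varies between runs.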

Once you have set your job to run for the correct length of time, it is a good idea to check that the job requests the correct amount of resources. You can quickly check the job submission parameters by running

./preview_run\n

which will show you at a glance the wallclock times, job queues and the list of jobs to be submitted, as well as other parameters such as the number of MPI tasks and the number of OpenMP threads.

Submit the job to the batch queue using the case.submit command.

./case.submit\n

The case.submit script will submit a job called .case.run, and if $DOUT_S is set to TRUE it will also submit a short-term archiving job. By default, the queue these jobs are submitted to is the standard queue. For information on the resources available on each queue, see the QOS guide.

Note: There is a small possibility that your job may initially fail with the error message ERROR: Undefined env var 'CESM_ROOT'. This could have two causes:
1. You do not have the CESM2/2.1.3 module loaded. This module needs to be loaded when running the case as well as when building it. Try running again after running module load CESM2/2.1.3.
2. A known issue with ARCHER2 where adding the SBATCH directive --export=ALL to a Slurm script does not work (see the ARCHER2 known issues entry on the subject). The ARCHER2 configuration included in the version of CIME downloaded during setup should apply a workaround for this, so you should not normally see the error for this reason, but it may still occur in some corner cases. To avoid it, ensure that the environment from which you submit your case has the CESM2/2.1.3 module loaded, and run the case.submit script with the following command:

./case.submit -a=--export=ALL\n

When the job is complete, most output will not be written under the case directory, but under other directories. Review the following directories and files, whose locations can be found with xmlquery (note: xmlquery can be given a comma-separated list of names with no spaces):

./xmlquery RUNDIR,CASE,CASEROOT,DOUT_S,DOUT_S_ROOT\n
"},{"location":"research-software/cesm213_run/#monitoring-jobs","title":"Monitoring Jobs","text":"

As CESM jobs are submitted to the ARCHER2 batch system, they can be monitored in the same way as other jobs, using the command

squeue -u $USER\n

You can get more details about the batch scheduler by consulting the ARCHER2 scheduling guide.

"},{"location":"research-software/cesm213_run/#archiving","title":"Archiving","text":"

The CIME framework allows for short-term and long-term archiving of model output. This is particularly useful when the model is configured to output to a small storage space and large files may need to be moved during larger simulations. On ARCHER2, the model is configured to use short-term archiving, but not yet configured for long-term archiving.

Short-term archiving is on by default for compsets and can be toggled on and off by using the xmlchange script to set the DOUT_S parameter to TRUE or FALSE:

./xmlchange DOUT_S=FALSE\n

When DOUT_S=TRUE, calling ./case.submit will automatically submit a \u201cst_archive\u201d job to the batch system that will be held in the queue until the main job is complete. This job can be configured in the same way as the main job, for example with a different queue or wallclock time. In particular, it may be advisable to change the queue your st_archive job is submitted to, as archiving does not require a large amount of resources and the short and serial queues on ARCHER2 do not use your project allowance. This is done with the xmlchange script in almost the same way as for the case.run job. Note that the main job and the archiving job share some parameter names, such as JOB_QUEUE, so you should use a flag (--subgroup) to specify which job you want to change, as below:

./xmlchange JOB_QUEUE=short --subgroup case.st_archive\n

If the --subgroup flag is not used, then the JOB_QUEUE value for both the case.run and case.st_archive jobs will be changed. You can verify that they are different by running

./xmlquery JOB_QUEUE\n

which will show the value of this parameter for both jobs.

The archive is set up to move .nc files and logs from $CESM_ROOT/runs/$CASE to $CESM_ROOT/archive/$CASE. As a result, your /work storage quota is used whether archiving is switched on or off, so we recommend moving any data you wish to retain to another service, such as a group workspace on JASMIN. See the Data Management and Transfer guide for more information on archiving data from ARCHER2. If you want to archive your files directly to a location other than the default, set this with the $DOUT_S_ROOT parameter.
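To check quickly how much of your /work quota a case is using, you can total its run and short-term archive directories. The sketch below assumes the default layout described above ($CESM_ROOT/runs/$CASE and $CESM_ROOT/archive/$CASE). The function name case_usage is hypothetical and is not part of CESM or CIME.

```bash
#!/bin/bash
# Hypothetical helper: print the disk usage (in KiB) of a case's run and
# short-term archive directories, assuming the default CESM_ROOT layout.
case_usage() {
    local root="$1" casename="$2" dir
    for dir in "$root/runs/$casename" "$root/archive/$casename"; do
        if [ -d "$dir" ]; then
            # -s gives one total per directory, -k reports it in KiB
            du -sk "$dir"
        else
            echo "missing: $dir" >&2
        fi
    done
}
```

For the example case in this guide, run case_usage $CESM_ROOT b.e20.B1850.f19_g17.test after defining the function. If you have set $DOUT_S_ROOT to a non-default location, check that directory instead.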

"},{"location":"research-software/cesm213_run/#troubleshooting","title":"Troubleshooting","text":"

If a run fails, the first place to check is the run submission output file, usually located at

$CASEROOT/run.$CASE\n

so, for the example job run in this guide, the output file will be at

$CESM_ROOT/runs/b.e20.B1850.f19_g17.test/run.b.e20.B1850.f19_g17.test\n

If any errors have occurred, the location of the relevant log in which you can examine this error will be printed towards the end of this output file. The log will usually be located at

$CASEROOT/run/cesm.log.*\n

so in this case, the path would be

$CESM_ROOT/runs/b.e20.B1850.f19_g17.test/run/cesm.log.*\n
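You can combine the two steps above, finding the newest log and then looking for errors in it, into a small helper. This is a sketch that rests on several assumptions. The function names latest_cesm_log and show_cesm_errors are hypothetical. The log is assumed to be uncompressed and located at $CASEROOT/run/cesm.log.*. A case-insensitive search for the word error is only a starting point, because not every failure message contains it.

```bash
#!/bin/bash
# Hypothetical helper: print the most recently modified cesm.log.* file in
# a case's run directory (assumes the default $CASEROOT/run location).
latest_cesm_log() {
    local rundir="$1/run" newest
    # ls -t sorts by modification time, newest first
    newest=$(ls -t "$rundir"/cesm.log.* 2>/dev/null | head -n 1)
    if [ -z "$newest" ]; then
        echo "no cesm.log.* files in $rundir" >&2
        return 1
    fi
    echo "$newest"
}

# Hypothetical helper: show lines mentioning an error in that newest log.
show_cesm_errors() {
    local log
    log=$(latest_cesm_log "$1") || return 1
    # -n prints line numbers; -i also matches Error, error, ...
    grep -n -i 'error' "$log"
}
```

For the example job in this guide, you would run show_cesm_errors $CESM_ROOT/runs/b.e20.B1850.f19_g17.test.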
"},{"location":"research-software/cesm213_run/#known-issues-and-common-problems","title":"Known Issues and Common Problems","text":""},{"location":"research-software/cesm213_run/#input-data-errors","title":"Input data errors","text":"

Occasionally, the input data for a case is not downloaded correctly. Unfortunately, in these cases the checksum test run by the check_input_data script will not catch the corrupted fields in the file. The error message displayed can vary somewhat, but a common error message is

ERROR timeaddmonths(): MM out of range\n

You can often spot these errors by examining the log as described above, as the error will occur shortly after a file has been read. If this happens, delete the file in question from your cesm_inputdata directory and rerun

./check_input_data --download\n
to ensure that the data is downloaded correctly.
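Because cesm_inputdata is a deep directory tree, it can be easier to remove a suspect file by name than by its full path. This is a minimal sketch, assuming you know the exact file name from the log. remove_inputdata_file is a hypothetical helper. The name is passed to find as a -name pattern, so avoid wildcards unless you really intend to delete every matching file.

```bash
#!/bin/bash
# Hypothetical helper: find a (suspected corrupt) input file by name anywhere
# under the inputdata tree, delete it, and print each path that was removed.
remove_inputdata_file() {
    local inputdata="$1" name="$2"
    if [ ! -d "$inputdata" ]; then
        echo "not a directory: $inputdata" >&2
        return 1
    fi
    # -type f matches regular files only; -print reports what is deleted
    find "$inputdata" -type f -name "$name" -print -delete
}
```

For example, run remove_inputdata_file $CESM_ROOT/cesm_inputdata NAME_OF_FILE.nc, then rerun ./check_input_data --download from the case directory.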

"},{"location":"research-software/cesm213_run/#sigfpe-errors","title":"SIGFPE errors","text":"

If running a case with the DEBUG flag enabled, you may see some SIGFPE errors. In this case, the traceback shown in the logs will show the error as originating in one of three places:

This problem is caused by 'short-circuit' logic in the affected files, where there may be a conditional of the form

if (A .and. B) then....\n
where B cannot be properly evaluated if A fails, for example

if ( x /= 0 .and. y/x > c ) then....\n
which would result in a divide-by-zero error if the second condition was evaluated after the first condition had already failed.

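Fortran does not guarantee this kind of short-circuit evaluation, so whether B is evaluated depends on the compiler and its options. In standard simulations, the second condition is skipped in these cases; however, if the user has set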

./xmlchange DEBUG=TRUE\n

then the second condition will not be skipped and a SIGFPE error will occur.

If encountering these errors, a user can do one of two things. The simplest solution is to turn off the DEBUG flag with

./xmlchange DEBUG=FALSE\n
If this is not possible, however, and your simulation must be run in DEBUG mode, the conditional can be modified in the program code. THIS IS DONE AT YOUR OWN RISK! The fix that has been applied for the WW3 component can be seen here. If you make any changes to the code for this reason, we recommend reverting them once you no longer need to run your case in DEBUG mode.

"},{"location":"research-software/cesm213_run/#sigsegv-errors","title":"SIGSEGV errors","text":"

Sometimes an error will occur where a run is ended prematurely and gives an error of the form

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.\n

This can often be solved by increasing the amount of memory available to each task, either by reducing the maximum number of MPI tasks per node, for example with

./xmlchange MAX_TASKS_PER_NODE=64\n

or by increasing the number of OpenMP threads per MPI task with

./xmlchange NTHRDS=2\n

Either change doubles the amount of memory available to each MPI task compared with running one single-threaded task on every core.
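The arithmetic behind this advice can be sketched as follows. The sketch assumes a standard ARCHER2 compute node with 128 cores and 256 GB of memory; pass other values for other node types. mem_per_task is a hypothetical helper, and the figures ignore the memory used by the operating system.

```bash
#!/bin/bash
# Assumption: a standard ARCHER2 compute node has 128 cores and 256 GB of
# memory. mem_per_task is a hypothetical helper, not part of CIME.
mem_per_task() {
    local node_mem_gb="$1" tasks_per_node="$2"
    # integer division: memory share of each MPI task on the node, in GB
    echo $(( node_mem_gb / tasks_per_node ))
}

cores=128
node_mem=256
echo "default, 1 thread per task: $(mem_per_task "$node_mem" "$cores") GB per task"
# NTHRDS=2: each MPI task occupies 2 cores, so half as many tasks fit per node
echo "NTHRDS=2: $(mem_per_task "$node_mem" $(( cores / 2 ))) GB per task"
# MAX_TASKS_PER_NODE=64: at most 64 tasks share the node's memory
echo "MAX_TASKS_PER_NODE=64: $(mem_per_task "$node_mem" 64) GB per task"
```

Under these assumptions, the three lines report 2 GB, 4 GB and 4 GB per task respectively.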

"},{"location":"research-software/cesm213_run/#archiving-errors","title":"Archiving Errors","text":"

When running WACCM-X cases (compsets starting FX*), there can sometimes be problems when running restart jobs. This is caused by the short-term archiving job mistakenly moving files needed for restarts to the archive. To ensure this does not happen, it can be a good idea when running WACCM-X simulations to turn off the short-term archiver using

./xmlchange DOUT_S=FALSE\n

While this behaviour has so far only been observed for WACCM-X jobs, it may also occur with other compsets.

"},{"location":"research-software/cesm213_run/#job-failing-instantly-with-undefined-environment-variable","title":"Job Failing instantly with undefined environment variable","text":"

There is a small possibility that your job may initially fail with the error message

ERROR: Undefined env var 'CESM_ROOT'\n
This could have two causes:

  1. You do not have the CESM2/2.1.3 module loaded. This module needs to be loaded when running the case as well as when building the case. Try running again after running module load CESM2/2.1.3.
  2. There is a known issue with ARCHER2 where adding the SBATCH directive export=ALL to a Slurm script does not work (see the ARCHER2 known issues entry on the subject). The ARCHER2 configuration included in the version of cime that was downloaded during setup should apply a workaround for this, so you should not normally see the error for this reason, although it may still occur in some corner cases. To avoid it, ensure that the environment from which you submit your case has the CESM2/2.1.3 module loaded, and run the case.submit script with the following command

./case.submit -a=--export=ALL\n

"},{"location":"research-software/cesm213_setup/","title":"First-Time setup of CESM 2.1.3","text":"

Important

These instructions are intended for users of the n02 project. Downloads may be incomplete if you are not a member of n02.

Due to the nature of the CESM program, a centrally installed version of the code is not provided on ARCHER2. Instead, a user needs to download and set up the program themselves in their /work area. The installation is done in three steps:

  1. Download the code and set up the directory structure
  2. Link and Download Components
  3. Build CPRNC

After setup, CESM is ready to run a simple case.

"},{"location":"research-software/cesm213_setup/#downloading-cesm-213-and-setting-up-the-directory-structure","title":"Downloading CESM 2.1.3 And Setting Up The Directory Structure","text":"

For ease of use, a setup script has been created which downloads CESM 2.1.3, creates the directory structure needed for running CESM2 cases and creates a hidden file in your home directory containing environment variables needed by CESM.

To execute this script, run the following in an ARCHER2 terminal:

module load cray-python\nsource /work/n02/shared/CESM2/setup_cesm213.sh\n

This script will create a directory, defaulting to /work/$GROUP/$GROUP/$USER/cesm/CESM2.1.3, where $GROUP is your default group (for example n02), and populate it with the following subdirectories:

  * archive - short-term archiving for completed runs
  * ccsm_baselines - baseline files
  * cesm_inputdata - input data downloaded and used when running cases
  * runs - location of the case files used when running a case
  * cesm directory - location of the cesm source code and the various components. Defaults to my_cesm_sandbox

The default locations for the CESM root directory and the CESM location can be overridden during installation either by entering new paths at runtime when prompted or by providing them as command line arguments, for example

source /work/n02/shared/CESM2/setup_cesm213.sh -p /work/n03/n03/$USER/CESM213 -l cesm_prog\n
"},{"location":"research-software/cesm213_setup/#manual-setup-instructions","title":"Manual setup instructions","text":"

If you have trouble with running the setup script, you can install manually by running the following commands:

PREFIX=\"path/to/your/desired/cesm/root/location\"\nCESM_DIR_LOC=\"name_of_install_directory_for_cesm\"\n\nmkdir -p $PREFIX\ncd $PREFIX\nmkdir -p archive\nmkdir -p ccsm_baselines\nmkdir -p cesm_inputdata\nmkdir -p runs\n\nCESM_LOC=$PREFIX/$CESM_DIR_LOC\n\ngit clone -b release-cesm2.1.3  https://github.com/ESCOMP/CESM.git $CESM_LOC\ncd $CESM_LOC\ngit checkout release-cesm2.1.3\n\ntee ${HOME}/.cesm213 <<EOF > /dev/null\n### CESM 2.1.3 on ARCHER2 Path File\n### Do Not Edit This File Unless You Know What You Are Doing\nCIME_MODEL=cesm\nCESM_ROOT=$PREFIX\nCESM_LOC=$PREFIX/$CESM_DIR_LOC\nCIMEROOT=$PREFIX/$CESM_DIR_LOC/cime\nEOF\n\necho \"module use /work/n02/shared/CESM2/module\" >> ~/.bashrc\nmodule use /work/n02/shared/CESM2/module\nmodule load CESM2/2.1.3\n
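After the manual setup, you can check that the path file was written as expected. This sketch is an optional convenience and not part of the official setup; check_cesm_pathfile is hypothetical. It only confirms that the four variables are defined and that the CESM_ROOT and CESM_LOC directories exist. CIMEROOT is not checked, because the cime directory is only populated once the external components have been downloaded.

```bash
#!/bin/bash
# Hypothetical check: confirm a CESM path file (such as ~/.cesm213) defines
# each variable, and that the CESM_ROOT and CESM_LOC directories exist.
check_cesm_pathfile() {
    local file="$1" var value status=0
    for var in CIME_MODEL CESM_ROOT CESM_LOC CIMEROOT; do
        # value after the first VAR= prefix; empty if the line is missing
        value=$(sed -n "s/^${var}=//p" "$file" | head -n 1)
        if [ -z "$value" ]; then
            echo "missing: $var" >&2
            status=1
        elif [ "$var" = CESM_ROOT ] || [ "$var" = CESM_LOC ]; then
            # these two directories should exist once the git clone has run
            if [ ! -d "$value" ]; then
                echo "not a directory: $var=$value" >&2
                status=1
            fi
        fi
    done
    return "$status"
}
```

Run it as check_cesm_pathfile ~/.cesm213. A non-zero return status means at least one problem was reported.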
"},{"location":"research-software/cesm213_setup/#linking-and-downloading-components","title":"Linking And Downloading Components","text":"

CESM utilises multiple components, including CAM (atmosphere), CICE (sea ice), CISM (ice sheets), CTSM (land), MOSART (adaptive river transport), POP2 (ocean), RTM (river transport) and WW3 (waves), all of which are connected using the Common Infrastructure for Modelling the Earth (CIME). These components are hosted on GitHub and are downloaded during the setup process.

Before downloading the external components, you must first modify the file $CESM_LOC/Externals.cfg. This will change the version of CIME from the default cime 5.6.32 to the maintained cime 5.6 branch. This is done by modifying the file so that the cime section goes from

[cime]\ntag = cime5.6.32\nprotocol = git\nrepo_url = https://github.com/ESMCI/cime\nlocal_path = cime\nrequired = True\n

to

[cime]\nbranch = maint-5.6\nprotocol = git\nrepo_url = https://github.com/ESMCI/cime\nlocal_path = cime\nexternals = Externals_cime.cfg\nrequired = True\n

In the same $CESM_LOC/Externals.cfg file, also update the version of CAM:

[cam]\ntag = cam_cesm2_1_rel_41\nprotocol = git\nrepo_url = https://github.com/ESCOMP/CAM\nlocal_path = components/cam\nexternals = Externals_CAM.cfg\nrequired = True\n

to

[cam]\ntag = cam_cesm2_1_rel\nprotocol = git\nrepo_url = https://github.com/ESCOMP/CAM\nlocal_path = components/cam\nexternals = Externals_CAM.cfg\nrequired = True\n

These changes bring in the configurations for ARCHER2, along with some bug fixes.
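If you prefer to script the Externals.cfg edits rather than make them in a text editor, the following sketch applies both with GNU sed. It assumes the cime and cam entries are laid out exactly as shown above, with one key = value pair per line. patch_externals_cfg is a hypothetical helper. A copy of the untouched file is kept as Externals.cfg.orig so that you can compare the result.

```bash
#!/bin/bash
# Hypothetical helper: switch cime to the maint-5.6 branch and CAM to the
# cam_cesm2_1_rel tag in an Externals.cfg laid out as in this guide.
patch_externals_cfg() {
    local cfg="$1"
    # keep a backup of the original file (only on the first run)
    [ -e "$cfg.orig" ] || cp "$cfg" "$cfg.orig"
    # Only edit the cime section while it still uses the cime5.6.32 tag,
    # so a second run does not append a duplicate externals line.
    if grep -q '^tag = cime5[.]6[.]32$' "$cfg"; then
        sed -i 's/^tag = cime5[.]6[.]32$/branch = maint-5.6/' "$cfg"
        # GNU sed one-line append: add the entry after the cime local_path
        sed -i '/^local_path = cime$/a externals = Externals_cime.cfg' "$cfg"
    fi
    sed -i 's/^tag = cam_cesm2_1_rel_41$/tag = cam_cesm2_1_rel/' "$cfg"
}
```

For example, run patch_externals_cfg $CESM_LOC/Externals.cfg, then diff $CESM_LOC/Externals.cfg.orig $CESM_LOC/Externals.cfg to review the changes before running checkout_externals.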

Once this has been done you are free to download the external components by executing the commands

cd $CESM_LOC\n./manage_externals/checkout_externals\n

The first time you run the checkout_externals script, you may be asked to accept a certificate, and you may also get an error of the form

    svn: E120108: Error running context: The server unexpectedly closed the connection.\n
If this happens, rerun the checkout_externals script and it should download the external components correctly.
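Since the usual fix for this error is to rerun the script, a small retry wrapper can save some typing. This is only a sketch, and retry is a hypothetical helper. Because the first run may ask you to accept a certificate, run it interactively on a login node rather than in a batch job.

```bash
#!/bin/bash
# Hypothetical helper: rerun a command until it succeeds, up to a fixed
# number of attempts, sleeping for a given number of seconds in between.
retry() {
    local attempts="$1" delay="$2" n
    shift 2
    for (( n = 1; n <= attempts; n++ )); do
        if "$@"; then
            return 0
        fi
        echo "attempt $n of $attempts failed: $*" >&2
        if (( n < attempts )); then
            sleep "$delay"
        fi
    done
    return 1
}
```

For example, cd $CESM_LOC && retry 3 10 ./manage_externals/checkout_externals tries the checkout up to three times, waiting 10 seconds between attempts.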

"},{"location":"research-software/cesm213_setup/#building-cprnc","title":"Building cprnc","text":"

cprnc is a generic tool for analyzing a netCDF file or comparing two netCDF files. It is used in various places by CESM and the source is included with cime.

To build, execute the following commands

module load CESM2/2.1.3\ncd $CIMEROOT/tools/cprnc\n../configure --macros-format=Makefile --mpilib=mpi-serial\nsed -i '/}}/d' .env_mach_specific.sh\nsource ./.env_mach_specific.sh \nmake\n

It is likely you will see a warning message of the form

The following dependent module(s) are not currently loaded: cray-hdf5-parallel (required by: CESM2/2.1.3), cray-netcdf-hdf5parallel (required by: CESM2/2.1.3), cray-parallel-netcdf (required by: CESM2/2.1.3)\n

This is due to serial netCDF and hdf5 libraries being loaded as a result of the --mpilib=mpi-serial flag. This warning message is safe to ignore.

In a small number of cases you may also see a warning of the form

-bash: export: '}}': not a valid identifier\n

This warning should also be safe to ignore, but can be solved by opening the file ./.env_mach_specific.sh in a text editor and commenting out or deleting the line

export OMP_NUM_THREADS={{ thread_count }}\n

Then rerunning the command

source ./.env_mach_specific.sh && make\n
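Before sourcing the environment file, you can check whether any template placeholders of this kind remain. check_placeholders is a hypothetical helper. It searches for the {{ characters that mark an unrendered placeholder.

```bash
#!/bin/bash
# Hypothetical check: list lines in a generated environment file that still
# contain an unrendered {{ ... }} template placeholder, with line numbers.
# Returns 0 only if no placeholders remain.
check_placeholders() {
    local file="$1"
    # in a basic regular expression, { is matched literally
    if grep -n '{{' "$file"; then
        echo "unrendered placeholders found in $file" >&2
        return 1
    fi
    return 0
}
```

For example: check_placeholders ./.env_mach_specific.sh && source ./.env_mach_specific.sh && make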

Once this step has been completed, you are ready to run a simple test case.

"},{"location":"research-software/chemshell/","title":"ChemShell","text":"

ChemShell is a script-based chemistry code focusing on hybrid QM/MM calculations with support for standard quantum chemical or force field calculations. There are two versions: an older Tcl-based version Tcl-ChemShell and a more recent python-based version Py-ChemShell.

The advice from https://www.chemshell.org/licence on the difference is:

We consider Py-ChemShell 23.0 to be suitable for production calculations on both materials systems and biomolecules, and recommend that new ChemShell users should use the Python-based version.

We continue to maintain the original Tcl-based version of ChemShell and distribute it on request. Tcl-ChemShell currently contains some features that are not yet available in Py-ChemShell (but will be soon!) including a QM/MM MD driver and multiple electronic state calculations. At the present time if you need this functionality you will need to obtain a licence for Tcl-Chemshell.

"},{"location":"research-software/chemshell/#useful-links","title":"Useful Links","text":""},{"location":"research-software/chemshell/#using-py-chemshell-on-archer2","title":"Using Py-ChemShell on ARCHER2","text":"

The python-based version of ChemShell is open-source and is freely available to all users on ARCHER2. The version of Py-ChemShell pre-installed on ARCHER2 is compiled with NWChem and GULP as libraries.

Warning

Py-ChemShell on ARCHER2 is compiled with GULP 6.0. This is a licenced software that is free to use for academics. If you are not an academic user (or if you are using Py-ChemShell for non-academic work), please ensure that you have the correct GULP licence before using GULP functionalities in py-ChemShell or make sure that you are not using any of the GULP functionalities in your code (i.e., do not set theory=GULP in your calculations).

"},{"location":"research-software/chemshell/#running-parallel-py-chemshell-jobs","title":"Running parallel Py-ChemShell jobs","text":"

Unlike most other ARCHER2 software packages, the Py-ChemShell module is built in such a way as to enable users to create and submit jobs to the compute nodes by running a chemsh script from the login node rather than by creating and submitting a Slurm submission script. Below is an example command for submitting a pure MPI Py-ChemShell job running on 8 nodes (128x8 cores) with the chemsh command:

    # Run this from the login node\n    module load py-chemshell\n\n    # Replace [budget code] below with your project code (e.g. t01)\n    chemsh --submit               \\\n           --jobname pychmsh      \\\n           --account [budget code] \\\n           --partition standard   \\\n           --qos standard         \\\n           --walltime 0:10:0      \\\n           --nnodes 8             \\\n           --nprocs 1024          \\\n           py-chemshell-job.py\n
"},{"location":"research-software/chemshell/#using-tcl-chemshell-on-archer2","title":"Using Tcl-ChemShell on ARCHER2","text":"

The older version of Tcl-based ChemShell requires a license. Users with a valid license should request access via the ARCHER2 SAFE.

"},{"location":"research-software/chemshell/#running-parallel-tcl-chemshell-jobs","title":"Running parallel Tcl-ChemShell jobs","text":"

The following script will run a pure MPI Tcl-based ChemShell job using 8 nodes (128x8 cores).

#!/bin/bash\n\n#SBATCH --job-name=chemshell_test\n#SBATCH --nodes=8\n#SBATCH --ntasks-per-node=128\n#SBATCH --cpus-per-task=1\n#SBATCH --time=00:20:00\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code] \n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\nmodule load tcl-chemshell/3.7.1\n\n# Ensure the cpus-per-task option is propagated to srun commands\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\nsrun --distribution=block:block --hint=nomultithread chemsh.x input.chm\n
"},{"location":"research-software/code-saturne/","title":"Code_Saturne","text":"

Code_Saturne solves the Navier-Stokes equations for 2D, 2D-axisymmetric and 3D flows, steady or unsteady, laminar or turbulent, incompressible or weakly dilatable, isothermal or not, with scalar transport if required. Several turbulence models are available, from Reynolds-averaged models to large-eddy simulation (LES) models. In addition, a number of specific physical models are also available as \"modules\": gas, coal and heavy-fuel oil combustion, semi-transparent radiative transfer, particle-tracking with Lagrangian modeling, Joule effect, electric arcs, weakly compressible flows, atmospheric flows and rotor/stator interaction for hydraulic machines.

"},{"location":"research-software/code-saturne/#useful-links","title":"Useful Links","text":""},{"location":"research-software/code-saturne/#using-code_saturne-on-archer2","title":"Using Code_Saturne on ARCHER2","text":"

Code_Saturne is released under the GNU General Public Licence v2 and so is freely available to all users on ARCHER2.

You can load the default GCC build of Code_Saturne for use by running the following command:

module load code_saturne\n

This will load the default code_saturne/7.0.1-gcc11 module. A build using the CCE compilers, code_saturne/7.0.1-cce12, has also been made optionally available to users on the full ARCHER2 system as testing indicates that this may provide improved performance over the GCC build.

"},{"location":"research-software/code-saturne/#running-parallel-code_saturne-jobs","title":"Running parallel Code_Saturne jobs","text":"

After setting up a case it should be initialized by running the following command from the case directory, where setup.xml is the input file:

code_saturne run --initialize --param setup.xml\n

This will create a directory named for the current date and time (e.g. 20201019-1636) inside the RESU directory. Inside the new directory will be a script named run_solver. You may alter this to resemble the script below, or you may wish to simply create a new one with the contents shown.

If you wish to alter the existing run_solver script you will need to add all the #SBATCH options shown to set the job name, size and so on. You should also add the two module commands, and srun --distribution=block:block --hint=nomultithread as well as the --mpi option to the line executing ./cs_solver to ensure parallel execution on the compute nodes. The export LD_LIBRARY_PATH=... and cd commands are redundant and may be retained or removed.

This script will run an MPI-only Code_Saturne job using the default GCC build and UCX over 4 nodes (128 x 4 = 512 cores) for a maximum of 20 minutes.

#!/bin/bash\n#SBATCH --export=none\n#SBATCH --job-name=CSExample\n#SBATCH --time=0:20:0\n#SBATCH --nodes=4\n#SBATCH --ntasks-per-node=128\n#SBATCH --cpus-per-task=1\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Load the GCC build of Code_Saturne 7.0.1\nmodule load cpe/21.09\nmodule load PrgEnv-gnu\nmodule load code_saturne\n\n# Switch to mpich-ucx implementation (see info note below)\nmodule swap craype-network-ofi craype-network-ucx\nmodule swap cray-mpich cray-mpich-ucx\n\n# Prevent threading.\nexport OMP_NUM_THREADS=1\n\n# Ensure the cpus-per-task option is propagated to srun commands\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\n# Run solver.\nsrun --distribution=block:block --hint=nomultithread ./cs_solver --mpi $@\n

The script can then be submitted to the batch system with sbatch.

Info

There is a known issue with the default MPI collectives which is causing performance issues on Code_Saturne. The suggested workaround is to switch to the mpich-ucx implementation. For this to link correctly on the full system, the extra cpe/21.09 and PrgEnv-gnu modules also have to be explicitly loaded.

"},{"location":"research-software/code-saturne/#compiling-code_saturne","title":"Compiling Code_Saturne","text":"

The latest instructions for building Code_Saturne on ARCHER2 may be found in the GitHub repository of build instructions:

"},{"location":"research-software/cp2k/","title":"CP2K","text":"

CP2K is a quantum chemistry and solid state physics software package that can perform atomistic simulations of solid state, liquid, molecular, periodic, material, crystal, and biological systems. CP2K provides a general framework for different modelling methods such as DFT using the mixed Gaussian and plane waves approaches GPW and GAPW. Supported theory levels include DFTB, LDA, GGA, MP2, RPA, semi-empirical methods (AM1, PM3, PM6, RM1, MNDO), and classical force fields (AMBER, CHARMM). CP2K can do simulations of molecular dynamics, metadynamics, Monte Carlo, Ehrenfest dynamics, vibrational analysis, core level spectroscopy, energy minimisation, and transition state optimisation using NEB or dimer method.

"},{"location":"research-software/cp2k/#useful-links","title":"Useful links","text":""},{"location":"research-software/cp2k/#using-cp2k-on-archer2","title":"Using CP2K on ARCHER2","text":"

CP2K is available through the cp2k module. MPI only cp2k.popt and MPI/OpenMP Hybrid cp2k.psmp binaries are available.

For ARCHER2, CP2K has been compiled with the following optional features: FFTW for fast Fourier transforms, libint to enable methods including Hartree-Fock exchange, libxc to provide a wider choice of exchange-correlation functionals, ELPA for improved performance of matrix diagonalisation, PLUMED to allow enhanced sampling methods.

See CP2K compile instructions for a full list of optional features.

If there is an optional feature not available, and which you would like, please contact the Service Desk. Experts may also wish to compile their own versions of the code (see below for instructions).

"},{"location":"research-software/cp2k/#running-parallel-cp2k-jobs","title":"Running parallel CP2K jobs","text":""},{"location":"research-software/cp2k/#mpi-only-jobs","title":"MPI only jobs","text":"

To run CP2K using MPI only, load the cp2k module and use the cp2k.psmp executable.

For example, the following script will run a CP2K job using 4 nodes (128x4 cores):

#!/bin/bash\n\n# Request 4 nodes using 128 cores per node for 128 MPI tasks per node.\n\n#SBATCH --job-name=CP2K_test\n#SBATCH --nodes=4\n#SBATCH --ntasks-per-node=128\n#SBATCH --cpus-per-task=1\n#SBATCH --time=00:20:00\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Load the relevant CP2K module\nmodule load cp2k\n\nexport OMP_NUM_THREADS=1\n\n# Ensure the cpus-per-task option is propagated to srun commands\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\nsrun --hint=nomultithread --distribution=block:block cp2k.psmp -i MYINPUT.inp\n
"},{"location":"research-software/cp2k/#mpiopenmp-hybrid-jobs","title":"MPI/OpenMP hybrid jobs","text":"

To run CP2K using MPI and OpenMP, load the cp2k module and use the cp2k.psmp executable.

#!/bin/bash\n\n# Request 4 nodes with 16 MPI tasks per node each using 8 threads;\n# note this means 128 MPI tasks in total.\n# Remember to replace [budget code] below with your account code,\n# e.g. '--account=t01'.\n\n#SBATCH --job-name=CP2K_test\n#SBATCH --nodes=4\n#SBATCH --ntasks-per-node=16\n#SBATCH --cpus-per-task=8\n#SBATCH --time=00:20:00\n\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Load the relevant CP2K module\nmodule load cp2k\n\n# Ensure OMP_NUM_THREADS is consistent with cpus-per-task above\nexport OMP_NUM_THREADS=8\nexport OMP_PLACES=cores\n\n# Ensure the cpus-per-task option is propagated to srun commands\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\nsrun --hint=nomultithread --distribution=block:block cp2k.psmp -i MYINPUT.inp\n
"},{"location":"research-software/cp2k/#compiling-cp2k","title":"Compiling CP2K","text":"

The latest instructions for building CP2K on ARCHER2 may be found in the GitHub repository of build instructions:

"},{"location":"research-software/crystal/","title":"CRYSTAL","text":"

CRYSTAL is a general-purpose program for the study of crystalline solids. The CRYSTAL program computes the electronic structure of periodic systems within Hartree Fock, density functional or various hybrid approximations (global, range-separated and double-hybrids). The Bloch functions of the periodic systems are expanded as linear combinations of atom centred Gaussian functions. Powerful screening techniques are used to exploit real space locality. Restricted (Closed Shell) and Unrestricted (Spin-polarized) calculations can be performed with all-electron and valence-only basis sets with effective core pseudo-potentials. The current release is CRYSTAL23.

Important

CRYSTAL is not part of the officially supported software on ARCHER2. While the ARCHER2 service desk is able to provide support for basic use of this software (e.g. access to software, writing job submission scripts) it does not generally provide detailed technical support for the software and you may be directed to seek support from other places if the service desk cannot answer the questions.

"},{"location":"research-software/crystal/#useful-links","title":"Useful Links","text":""},{"location":"research-software/crystal/#using-crystal-on-archer2","title":"Using CRYSTAL on ARCHER2","text":"

CRYSTAL is only available to users who have a valid CRYSTAL license. You can request access through SAFE:

Please have your license details to hand.

"},{"location":"research-software/crystal/#running-parallel-crystal-jobs","title":"Running parallel CRYSTAL jobs","text":"

The following script will run CRYSTAL using pure MPI for parallelisation, with 256 MPI processes, 1 per core, across 2 nodes. It assumes that the input file is tio2.d12.

#!/bin/bash\n#SBATCH --nodes=2\n#SBATCH --time=0:20:00\n#SBATCH --ntasks-per-node=128\n#SBATCH --cpus-per-task=1\n\n# Replace [budget code] below with your project code (e.g. e05)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\nmodule load other-software\nmodule load crystal/23-1.0.1-2\n\n# Ensure the cpus-per-task option is propagated to srun commands\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\n# Change this to the name of your input file\ncp tio2.d12 INPUT\n\nsrun --hint=nomultithread --distribution=block:block MPPcrystal\n

An equivalent 2 node job using MPI+OpenMP parallelism with 4 threads per MPI process, 64 MPI processes, 1 thread per core across 2 nodes would be:

#!/bin/bash\n#SBATCH --nodes=2\n#SBATCH --time=0:20:00\n#SBATCH --ntasks-per-node=32\n#SBATCH --cpus-per-task=4\n\n# Replace [budget code] below with your project code (e.g. e05)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\nmodule load other-software\nmodule load crystal/23-1.0.1-2\n\n# Ensure the cpus-per-task option is propagated to srun commands\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\n# Change this to the name of your input file\ncp tio2.d12 INPUT\n\nexport OMP_NUM_THREADS=4\nexport OMP_PLACES=cores\nexport OMP_STACKSIZE=16M\n\nsrun --hint=nomultithread --distribution=block:block MPPcrystalOMP\n
"},{"location":"research-software/crystal/#tips-and-known-issues","title":"Tips and known issues","text":""},{"location":"research-software/crystal/#cpu-frequency","title":"CPU frequency","text":"

You should run some short (1 or 2 SCF cycles) jobs to test the scaling of your job so you can decide on the balance between cost to your budget and the time it takes to get a result. You should now also include a few tests at different clock rates as part of this process.

Based on a few simple tests we have run, it is likely that jobs dominated by building the Kohn-Sham matrix (SHELLX+MONMO3+NUMDFT in the output) will see minimal energy savings and better performance at 2.25GHz. Jobs dominated by the ScaLAPACK calls (MPP_DIAG in the output) may show useful energy savings at 2.0GHz.
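Slurm's srun --cpu-freq option takes the frequency in kHz, so 2.25GHz is written as 2250000. The sketch below is a hypothetical convenience for the clock-rate tests suggested above. It assumes your job is launched with srun and that the requested frequency is one the compute nodes support; check the ARCHER2 guidance on CPU frequency for the permitted values.

```bash
#!/bin/bash
# Hypothetical helper: convert a clock rate in GHz to the kHz value that
# srun --cpu-freq expects, e.g. 2.25 GHz -> 2250000.
cpu_freq_khz() {
    # awk does the decimal arithmetic; %.0f rounds to a whole number of kHz
    awk -v ghz="$1" 'BEGIN { printf "%.0f", ghz * 1000000 }'
}
```

In the job scripts above, the launch line for a 2.25GHz test would then be srun --cpu-freq=$(cpu_freq_khz 2.25) --hint=nomultithread --distribution=block:block MPPcrystal.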

"},{"location":"research-software/crystal/#out-of-memory-errors","title":"Out-of-memory errors","text":"

Long-running jobs may encounter unexpected errors of the form

slurmstepd: error: Detected 1 oom-kill event(s) in step 411502.0 cgroup.\n
These are related to a memory leak in the underlying libfabric communication layer, which will be fixed in a future release. In the meantime, it should be possible to work around the problem by adding
export FI_MR_CACHE_MAX_COUNT=0 \n
to the Slurm submission script.

"},{"location":"research-software/fhi-aims/","title":"FHI-aims","text":"

FHI-aims is an all-electron electronic structure code based on numeric atom-centered orbitals. It enables first-principles simulations with very high numerical accuracy for production calculations, with excellent scalability up to very large system sizes (thousands of atoms) and up to very large, massively parallel supercomputers (ten thousand CPU cores).

"},{"location":"research-software/fhi-aims/#useful-links","title":"Useful Links","text":""},{"location":"research-software/fhi-aims/#using-fhi-aims-on-archer2","title":"Using FHI-aims on ARCHER2","text":"

FHI-aims is only available to users who have a valid FHI-aims licence.

If you have a FHI-aims licence and wish to have access to FHI-aims on ARCHER2, please make a request via the SAFE, see:

Please have your license details to hand.

"},{"location":"research-software/fhi-aims/#running-parallel-fhi-aims-jobs","title":"Running parallel FHI-aims jobs","text":"

The following script will run an FHI-aims job using 8 nodes (1024 cores). The script assumes that the input files have the default names control.in and geometry.in.

#!/bin/bash\n\n# Request 8 nodes with 128 MPI tasks per node for 20 minutes\n#SBATCH --job-name=FHI-aims\n#SBATCH --nodes=8\n#SBATCH --ntasks-per-node=128\n#SBATCH --cpus-per-task=1\n#SBATCH --time=00:20:00\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Load the FHI-aims module, avoid any unintentional OpenMP threading by\n# setting OMP_NUM_THREADS, and launch the code.\nmodule load fhiaims\n\n# Ensure the cpus-per-task option is propagated to srun commands\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\nexport OMP_NUM_THREADS=1\nsrun --distribution=block:block --hint=nomultithread aims.mpi.x\n
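The script relies on the default input file names. A short pre-flight check before the srun line can therefore stop the job early, and save budget, if an input file is missing. The bash sketch below illustrates this. It is not part of the FHI-aims module, and the function name check_aims_inputs is hypothetical.

```shell
#!/bin/bash
# Hypothetical pre-flight check: verify that the default FHI-aims inputs
# exist in the given directory (default: current directory).
check_aims_inputs() {
    local dir=${1:-.}
    local f missing=0
    for f in control.in geometry.in; do
        if [[ ! -f $dir/$f ]]; then
            echo Missing input file: $dir/$f >&2
            missing=1
        fi
    done
    return $missing
}

# In the job script, before the srun line:
#   check_aims_inputs || exit 1
```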
"},{"location":"research-software/fhi-aims/#compiling-fhi-aims","title":"Compiling FHI-aims","text":"

The latest instructions for building FHI-aims on ARCHER2 may be found in the GitHub repository of build instructions:

"},{"location":"research-software/gromacs/","title":"GROMACS","text":"

GROMACS is a versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles. It is primarily designed for biochemical molecules like proteins, lipids and nucleic acids that have a lot of complicated bonded interactions, but since GROMACS is extremely fast at calculating the nonbonded interactions (that usually dominate simulations) many groups are also using it for research on non-biological systems, e.g. polymers.

"},{"location":"research-software/gromacs/#useful-links","title":"Useful Links","text":""},{"location":"research-software/gromacs/#using-gromacs-on-archer2","title":"Using GROMACS on ARCHER2","text":"

GROMACS is Open Source software and is freely available to all users. Three executable versions are available on the normal (CPU-only) modules:

We also provide a GPU version of GROMACS that runs on the MI210 GPU nodes. It is named gromacs/2022.4-GPU and can be loaded with

module load gromacs/2022.4-GPU\n

Important

The gromacs modules reset the CPU frequency to the highest possible value (2.25 GHz) as this generally achieves the best balance of performance to energy use. You can change this setting by following the instructions in the Energy use section of the User Guide.

"},{"location":"research-software/gromacs/#running-parallel-gromacs-jobs","title":"Running parallel GROMACS jobs","text":""},{"location":"research-software/gromacs/#running-mpi-only-jobs","title":"Running MPI only jobs","text":"

The following script will run a GROMACS MD job using 4 nodes (128x4 cores) with pure MPI.

#!/bin/bash\n\n#SBATCH --job-name=mdrun_test\n#SBATCH --nodes=4\n#SBATCH --ntasks-per-node=128\n#SBATCH --cpus-per-task=1\n#SBATCH --time=00:20:00\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Setup the environment\nmodule load gromacs\n\n# Ensure the cpus-per-task option is propagated to srun commands\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\nexport OMP_NUM_THREADS=1 \nsrun --distribution=block:block --hint=nomultithread gmx_mpi mdrun -s test_calc.tpr\n
"},{"location":"research-software/gromacs/#running-hybrid-mpiopenmp-jobs","title":"Running hybrid MPI/OpenMP jobs","text":"

The following script will run a GROMACS MD job using 4 nodes (128x4 cores) with 16 MPI processes per node (64 MPI processes in total) and 8 OpenMP threads per MPI process.

#!/bin/bash\n#SBATCH --job-name=mdrun_test\n#SBATCH --nodes=4\n#SBATCH --ntasks-per-node=16\n#SBATCH --cpus-per-task=8\n#SBATCH --time=00:20:00\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Setup the environment\nmodule load gromacs\n\n# Ensure the cpus-per-task option is propagated to srun commands\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\nexport OMP_NUM_THREADS=8\nsrun --distribution=block:block --hint=nomultithread gmx_mpi mdrun -s test_calc.tpr\n
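When choosing a hybrid layout, check that ntasks-per-node multiplied by cpus-per-task does not exceed the 128 cores of a standard ARCHER2 compute node. To fully populate the node, the product should equal 128. The bash helper below, layout_summary, is a hypothetical sketch of that arithmetic for a given number of nodes, tasks per node and threads per task.

```shell
#!/bin/bash
# Hypothetical helper: summarise a hybrid MPI/OpenMP layout.
# Arguments: nodes, MPI tasks per node, OpenMP threads (cpus) per task.
CORES_PER_NODE=128  # standard ARCHER2 compute node

layout_summary() {
    local nodes=$1 tasks=$2 threads=$3
    local used=$(( tasks * threads ))
    echo MPI processes: $(( nodes * tasks )), cores used per node: $used of $CORES_PER_NODE
    if (( used > CORES_PER_NODE )); then
        echo Warning: node is oversubscribed >&2
        return 1
    fi
}

# The hybrid GROMACS example above: 4 nodes, 16 tasks, 8 threads each
layout_summary 4 16 8
```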
"},{"location":"research-software/gromacs/#running-gromacs-on-the-amd-mi210-gpus","title":"Running GROMACS on the AMD MI210 GPUs","text":"

The following script will run a GROMACS MD job using 1 GPU with 1 MPI process and 8 OpenMP threads per MPI process.

#!/bin/bash\n#SBATCH --job-name=mdrun_gpu\n#SBATCH --gpus=1\n#SBATCH --time=00:20:00\n#SBATCH --hint=nomultithread\n#SBATCH --distribution=block:block\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=gpu\n#SBATCH --qos=gpu-shd  # or gpu-exc\n\n# Setup the environment\nmodule load gromacs/2022.4-GPU\n\nexport OMP_NUM_THREADS=8\nsrun --ntasks=1 --cpus-per-task=8 gmx_mpi mdrun -ntomp 8 --noconfout -s calc.tpr\n
"},{"location":"research-software/gromacs/#compiling-gromacs","title":"Compiling Gromacs","text":"

The latest instructions for building GROMACS on ARCHER2 may be found in the GitHub repository of build instructions:

"},{"location":"research-software/lammps/","title":"LAMMPS","text":"

LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator) is a classical molecular dynamics code. LAMMPS has potentials for solid-state materials (metals, semiconductors) and soft matter (biomolecules, polymers), and coarse-grained or mesoscopic systems. It can be used to model atoms or, more generically, as a parallel particle simulator at the atomic, mesoscopic, or continuum scale.

"},{"location":"research-software/lammps/#useful-links","title":"Useful Links","text":""},{"location":"research-software/lammps/#using-lammps-on-archer2","title":"Using LAMMPS on ARCHER2","text":"

LAMMPS is freely available to all ARCHER2 users.

The centrally installed version of LAMMPS is compiled with all the standard packages included: ASPHERE, BODY, CLASS2, COLLOID, COMPRESS, CORESHELL, DIPOLE, GRANULAR, KSPACE, MANYBODY, MC, MISC, MOLECULE, OPT, PERI, QEQ, REPLICA, RIGID, SHOCK, SNAP, SRD.

We do not install any USER packages. If you are interested in a USER package, we would encourage you to try to compile your own version and we can help out if necessary (see below).

Important

The lammps modules reset the CPU frequency to the highest possible value (2.25 GHz) as this generally achieves the best balance of performance to energy use. You can change this setting by following the instructions in the Energy use section of the User Guide.

"},{"location":"research-software/lammps/#running-parallel-lammps-jobs","title":"Running parallel LAMMPS jobs","text":"

LAMMPS can exploit multiple nodes on ARCHER2 and will generally be run in exclusive mode using more than one node.

For example, the following script will run a LAMMPS MD job using 4 nodes (128x4 cores) with MPI only.

#!/bin/bash\n\n#SBATCH --job-name=lammps_test\n#SBATCH --nodes=4\n#SBATCH --ntasks-per-node=128\n#SBATCH --cpus-per-task=1\n#SBATCH --time=00:20:00\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code] \n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\nmodule load lammps\n\n# Ensure the cpus-per-task option is propagated to srun commands\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\nsrun --distribution=block:block --hint=nomultithread lmp -i in.test -l out.test\n
"},{"location":"research-software/lammps/#compiling-lammps","title":"Compiling LAMMPS","text":"

The large range of optional packages available for LAMMPS, and the opportunity for extensibility, may make it convenient for users to compile their own copy. In practice, LAMMPS is relatively easy to compile, so we encourage users to have a go.

Compilation instructions for LAMMPS on ARCHER2 can be found on GitHub:

"},{"location":"research-software/mitgcm/","title":"MITgcm","text":"

The Massachusetts Institute of Technology General Circulation Model (MITgcm) is a numerical model designed for the study of the atmosphere, ocean, and climate. MITgcm's flexible non-hydrostatic formulation enables it to simulate fluid phenomena over a wide range of scales; its adjoint capabilities enable it to be applied to sensitivity questions and to parameter and state estimation problems. By employing fluid equation isomorphisms, a single dynamical kernel can be used to simulate flow of both the atmosphere and ocean.

"},{"location":"research-software/mitgcm/#useful-links","title":"Useful Links","text":""},{"location":"research-software/mitgcm/#building-mitgcm-on-archer2","title":"Building MITgcm on ARCHER2","text":"

MITgcm is not available via a module on ARCHER2 as users will build their own executables specific to the problem they are working on.

You can obtain the MITgcm source code from the developers by cloning from the GitHub repository with the command

git clone https://github.com/MITgcm/MITgcm.git\n

You should then copy the ARCHER2 optfile into the MITgcm directories.

Warning

A current ARCHER2 optfile is not available at the present time. Please contact support@archer2.ac.uk for help.

You should also set the following environment variables. MITGCM_ROOTDIR is used to locate the source code and should point to the top MITgcm directory. Optionally, adding the MITgcm tools directory to your PATH environment variable makes it easier to use tools such as genmake2, and the MITGCM_OPT environment variable makes it easier to pass the optfile to genmake2.

export MITGCM_ROOTDIR=/path/to/MITgcm\nexport PATH=$MITGCM_ROOTDIR/tools:$PATH\nexport MITGCM_OPT=$MITGCM_ROOTDIR/tools/build_options/dev_linux_amd64_cray_archer2\n
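Before running genmake2, a quick check that these variables point at real files can save a confusing failure later. The bash function below, check_mitgcm_env, is a hypothetical sketch. It only tests that MITGCM_ROOTDIR is a directory, that genmake2 can be found on PATH, and that the optfile named by MITGCM_OPT exists.

```shell
#!/bin/bash
# Hypothetical sanity check for the MITgcm build environment variables.
# Returns non-zero and prints a message for each problem found.
check_mitgcm_env() {
    local ok=0
    if [[ ! -d $MITGCM_ROOTDIR ]]; then
        echo MITGCM_ROOTDIR is not a directory: $MITGCM_ROOTDIR >&2
        ok=1
    fi
    if ! command -v genmake2 > /dev/null; then
        echo genmake2 not found on PATH >&2
        ok=1
    fi
    if [[ ! -f $MITGCM_OPT ]]; then
        echo optfile not found: $MITGCM_OPT >&2
        ok=1
    fi
    return $ok
}

# Usage, after the exports above:
#   check_mitgcm_env && genmake2 -mpi -optfile $MITGCM_OPT
```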

When using genmake2 to create the Makefile, you will need to specify the optfile to use. Other commonly used options might be to use extra source code with the -mods option, to enable MPI with -mpi, and to enable OpenMP with -omp. You might then run a command that resembles the following:

genmake2 -mods /path/to/additional/source -mpi -optfile $MITGCM_OPT\n

You can read about the full set of options available to genmake2 by running

genmake2 -help\n

Finally, you may then build your executable by running

make depend\nmake\n
"},{"location":"research-software/mitgcm/#running-mitgcm-on-archer2","title":"Running MITgcm on ARCHER2","text":""},{"location":"research-software/mitgcm/#pure-mpi","title":"Pure MPI","text":"

Once you have built your executable you can write a script like the following which will allow it to run on the ARCHER2 compute nodes. This example would run a pure MPI MITgcm simulation over 2 nodes of 128 cores each for up to one hour.

#!/bin/bash\n\n# Slurm job options (job-name, compute nodes, job time)\n#SBATCH --job-name=MITgcm-simulation\n#SBATCH --time=1:0:0\n#SBATCH --nodes=2\n#SBATCH --ntasks-per-node=128\n#SBATCH --cpus-per-task=1\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Set the number of threads to 1\n#   This prevents any threaded system libraries from automatically\n#   using threading.\nexport OMP_NUM_THREADS=1\n\n# Ensure the cpus-per-task option is propagated to srun commands\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\n# Launch the parallel job\n#   Using 256 MPI processes and 128 MPI processes per node\nsrun --distribution=block:block --hint=nomultithread ./mitgcmuv\n
"},{"location":"research-software/mitgcm/#hybrid-openmp-mpi","title":"Hybrid OpenMP & MPI","text":"

Warning

Running the model in hybrid mode may lead to performance decreases as well as increases. You should be sure to profile your code both as a pure MPI application and as a hybrid OpenMP-MPI application to ensure you are making efficient use of resources. Be sure to read both the ARCHER2 advice on OpenMP and the MITgcm documentation first.

Note

Early versions of the ARCHER2 MITgcm optfile do not contain an OMPFLAG. Please ensure you have an up to date copy of the optfile before attempting to compile OpenMP enabled codes.

Depending upon your model setup, you may wish to run the MITgcm code as a hybrid OpenMP-MPI application. In terms of compiling the model, this is as simple as using the flag -omp when calling genmake2, and updating your SIZE.h file to have multiple tiles per process.

The model can be run using a Slurm job submission script similar to the one shown below. This example will run MITgcm across 2 nodes, with each node using 16 MPI processes and each process using 4 threads. Note that this underpopulates the nodes: we will only be using 128 of the 256 available cores. This can also sometimes lead to performance increases.

#!/bin/bash\n\n# Slurm job options (job-name, compute nodes, job time)\n#SBATCH --job-name=MITgcm-hybrid-simulation\n#SBATCH --time=1:0:0\n#SBATCH --nodes=2\n#SBATCH --ntasks-per-node=16\n#SBATCH --cpus-per-task=4\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Set the OpenMP threading options\nexport OMP_NUM_THREADS=4  # Set to number of threads per process\nexport OMP_PLACES=\"cores(128)\"  # Set to total number of threads\nexport OMP_PROC_BIND=true  # Required if we want to underpopulate nodes\n\n# Ensure the cpus-per-task option is propagated to srun commands\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\n# Launch the parallel job\n#   Using 32 MPI processes and 16 MPI processes per node,\n#   each process running 4 OpenMP threads\nsrun --distribution=block:block --hint=nomultithread ./mitgcmuv\n

Finally, remember to update the eedata file in the model's run directory. The number of threads requested there must match the number requested in the job submission script.
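In eedata, the thread counts are set by nTx and nTy in the EEPARMS namelist. These are the threads per process in the X and Y directions, and their product should equal the OMP_NUM_THREADS value in the job script. The bash sketch below writes such a file from OMP_NUM_THREADS. The helper name write_eedata is hypothetical. Putting all threads in X (nTx=OMP_NUM_THREADS, nTy=1) is an assumption chosen for illustration, and the tiles per process in SIZE.h (nSx, nSy) must be divisible by nTx and nTy respectively. The helper overwrites any existing eedata, so merge by hand if yours sets other EEPARMS options.

```shell
#!/bin/bash
# Hypothetical helper: write an eedata file whose thread count matches
# OMP_NUM_THREADS. Assumption: all threads in X (nTx), one in Y (nTy).
# Warning: overwrites any existing eedata in the target directory.
write_eedata() {
    local dir=${1:-.}
    local ntx=${OMP_NUM_THREADS:-1}
    cat > $dir/eedata <<EOF
# nTx - No. threads per process in X
# nTy - No. threads per process in Y
 &EEPARMS
 nTx=$ntx,
 nTy=1,
 &
EOF
}

# Usage in the run directory, after OMP_NUM_THREADS has been exported:
#   write_eedata .
```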

"},{"location":"research-software/mitgcm/#reproducing-the-ecco-version-4-release-4-state-estimate-on-archer2","title":"Reproducing the ECCO version 4 (release 4) state estimate on ARCHER2","text":"

The ECCO version 4 state estimate (ECCOv4-r4) is an observationally-constrained numerical solution produced by the ECCO group at JPL. If you would like to reproduce the state estimate on ARCHER2 in order to create customised runs and experiments, follow the instructions below. They are a lightly modified version of the JPL instructions, adapted for ARCHER2.

For more information, see the ECCOv4-r4 website https://ecco-group.org/products-ECCO-V4r4.htm

"},{"location":"research-software/mitgcm/#get-the-eccov4-r4-source-code","title":"Get the ECCOv4-r4 source code","text":"

First, navigate to your directory on the /work filesystem, because this is the filesystem that is accessible from the compute nodes. Next, create a working directory, perhaps MYECCO, and navigate into this working directory:

mkdir MYECCO\ncd MYECCO\n

In order to reproduce ECCOv4-r4, we need a specific checkpoint of the MITgcm source code.

git clone https://github.com/MITgcm/MITgcm.git -b checkpoint66g\n

Next, get the ECCOv4-r4 specific code from GitHub:

cd MITgcm\nmkdir -p ECCOV4/release4\ncd ECCOV4/release4\ngit clone https://github.com/ECCO-GROUP/ECCO-v4-Configurations.git\nmv ECCO-v4-Configurations/ECCOv4\\ Release\\ 4/code .\nrm -rf ECCO-v4-Configurations\n
"},{"location":"research-software/mitgcm/#get-the-eccov4-r4-forcing-files","title":"Get the ECCOv4-r4 forcing files","text":"

The surface forcing and other input files that are too large to be stored on GitHub are available via NASA data servers. In total, these files are about 200 GB in size. You must register for an Earthdata account and connect to a WebDAV server in order to access these files. For more detailed instructions, read the help page https://ecco.jpl.nasa.gov/drive/help.

First, apply for an Earthdata account: https://urs.earthdata.nasa.gov/users/new

Next, acquire your WebDAV credentials: https://ecco.jpl.nasa.gov/drive (second box from the top)

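Now you can use wget to download the required forcing and input files. Run these commands from the MITgcm/ECCOV4/release4 directory, because the run directory set up later links to ../input_* from there: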

wget -r --no-parent --user YOURUSERNAME --ask-password https://ecco.jpl.nasa.gov/drive/files/Version4/Release4/input_forcing\nwget -r --no-parent --user YOURUSERNAME --ask-password https://ecco.jpl.nasa.gov/drive/files/Version4/Release4/input_init\nwget -r --no-parent --user YOURUSERNAME --ask-password https://ecco.jpl.nasa.gov/drive/files/Version4/Release4/input_ecco\n

After using wget, you will notice that the input* directories are, by default, several levels deep in the directory structure. Use the mv command to move the input* directories to the directory where you executed the wget command. Specifically,

mv ecco.jpl.nasa.gov/drive/files/Version4/Release4/input_forcing/ .\nmv ecco.jpl.nasa.gov/drive/files/Version4/Release4/input_init/ .\nmv ecco.jpl.nasa.gov/drive/files/Version4/Release4/input_ecco/ .\nrm -rf ecco.jpl.nasa.gov\n
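As an alternative to moving the directories afterwards, the wget options -nH (no host directory) and --cut-dirs=4 strip the leading ecco.jpl.nasa.gov/drive/files/Version4/Release4/ part of the path at download time. The bash sketch below is a variant of the commands above that has not been tested on ARCHER2. The function name fetch_ecco_inputs is hypothetical, YOURUSERNAME remains a placeholder, and setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/bash
# Sketch: download the ECCOv4-r4 inputs straight into ./input_* directories.
# -nH drops the ecco.jpl.nasa.gov/ directory and --cut-dirs=4 drops
# drive/files/Version4/Release4/, so no mv step is needed afterwards.
# Set DRY_RUN=1 to print the commands instead of running them.
ECCO_URL=https://ecco.jpl.nasa.gov/drive/files/Version4/Release4

fetch_ecco_inputs() {
    local user=$1 dir
    for dir in input_forcing input_init input_ecco; do
        local cmd=(wget -r --no-parent -nH --cut-dirs=4 --user $user --ask-password $ECCO_URL/$dir)
        if [[ ${DRY_RUN:-0} == 1 ]]; then
            echo ${cmd[@]}
        else
            ${cmd[@]}
        fi
    done
}

# fetch_ecco_inputs YOURUSERNAME
```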
"},{"location":"research-software/mitgcm/#compiling-and-running-eccov4-r4","title":"Compiling and running ECCOv4-r4","text":"

The steps for building the ECCOv4-r4 instance of MITgcm are very similar to those for other build cases. First, you will need to create a build directory:

cd MITgcm/ECCOV4/release4\nmkdir build\ncd build\n

Load the NetCDF modules:

module load cray-hdf5\nmodule load cray-netcdf\n

If you haven't already, set your environment variables:

export MITGCM_ROOTDIR=../../../../MITgcm\nexport PATH=$MITGCM_ROOTDIR/tools:$PATH\nexport MITGCM_OPT=$MITGCM_ROOTDIR/tools/build_options/dev_linux_amd64_cray_archer2\n

Next, compile the executable:

genmake2 -mods ../code -mpi -optfile $MITGCM_OPT\nmake depend\nmake\n

Once you have compiled the model, you will have the mitgcmuv executable for ECCOv4-r4.

"},{"location":"research-software/mitgcm/#create-run-directory-and-link-files","title":"Create run directory and link files","text":"

To run the model, you need to create a run directory on the /work filesystem and link or copy the appropriate files into it. From the MITgcm/ECCOV4/release4 directory:

mkdir run\ncd run\n\n# link the data files\nln -s ../input_init/NAMELIST/* .\nln -s ../input_init/error_weight/ctrl_weight/* .\nln -s ../input_init/error_weight/data_error/* .\nln -s ../input_init/* .\nln -s ../input_init/tools/* .\nln -s ../input_ecco/*/* .\nln -s ../input_forcing/eccov4r4* .\n\npython mkdir_subdir_diags.py\n\n# manually copy the mitgcmuv executable\ncp -p ../build/mitgcmuv .\n

For a short test run, edit the nTimeSteps variable in the file data. Comment out the default value and uncomment the line reading nTimeSteps=8. This is a useful test to make sure that the model can at least start up.

To run on ARCHER2, submit a batch script to the Slurm scheduler. Here is an example submission script:

#!/bin/bash\n\n# Slurm job options (job-name, compute nodes, job time)\n#SBATCH --job-name=ECCOv4r4-test\n#SBATCH --time=1:0:0\n#SBATCH --nodes=8\n#SBATCH --ntasks-per-node=12\n#SBATCH --cpus-per-task=1\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# For adjoint runs the default cpu-freq is a lot slower\n#SBATCH --cpu-freq=2250000\n\n# Set the number of threads to 1\n#   This prevents any threaded system libraries from automatically\n#   using threading.\nexport OMP_NUM_THREADS=1\n\n# Ensure the cpus-per-task option is propagated to srun commands\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\n# Launch the parallel job\n#   Using 96 MPI processes and 12 MPI processes per node\nsrun --distribution=block:block --hint=nomultithread ./mitgcmuv\n

This configuration uses 96 MPI processes at 12 MPI processes per node. Once the run has finished, check the end of one of the standard output files to confirm that it completed successfully:

tail STDOUT.0000\n

It should read

PROGRAM MAIN: Execution ended Normally\n

The files named STDOUT.* contain diagnostic information that you can use to check your results. As a first pass, check the printed statistics for any clear signs of trouble (e.g. NaN values, extremely large values).
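With 96 MPI processes there are many STDOUT.* files to inspect. The bash sketch below is a rough first-pass filter, not a replacement for reading the statistics yourself. It checks that STDOUT.0000 contains the normal-termination message, and it lists any STDOUT.* file that contains the word NaN (case-insensitive). The function name check_mitgcm_run is hypothetical, and the function takes the run directory as its argument.

```shell
#!/bin/bash
# Hypothetical first-pass check of an MITgcm run directory.
# Returns non-zero if STDOUT.0000 lacks the normal-termination message
# or if any STDOUT.* file contains the word NaN (case-insensitive).
check_mitgcm_run() {
    local dir=${1:-.}
    local status=0
    if ! grep -q 'Execution ended Normally' $dir/STDOUT.0000; then
        echo Run did not end normally: see $dir/STDOUT.0000 >&2
        status=1
    fi
    local bad
    bad=$(grep -l -i -w nan $dir/STDOUT.*)
    if [[ -n $bad ]]; then
        echo Possible NaN values in: $bad >&2
        status=1
    fi
    return $status
}

# Usage from the run directory:
#   check_mitgcm_run .
```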

"},{"location":"research-software/mitgcm/#eccov4-r4-in-adjoint-mode","title":"ECCOv4-r4 in adjoint mode","text":"

If you have access to the commercial TAF software produced by http://FastOpt.de, then you can compile and run the ECCOv4-r4 instance of MITgcm in adjoint mode. This mode is useful for comprehensive sensitivity studies and for constructing state estimates. From the MITgcm/ECCOV4/release4 directory, create a new code directory and a new build directory:

mkdir code_ad\ncd code_ad\nln -s ../code/* .\ncd ..\nmkdir build_ad\ncd build_ad\n

In this instance, the code_ad and code directories are identical, although this does not have to be the case. Make sure that you have the staf script in your path or in the build_ad directory itself. To make sure that you have the most up-to-date script, run:

./staf -get staf\n

To test your connection to the FastOpt servers, try:

./staf -test\n

You should receive the following message:

Your access to the TAF server is enabled.\n

The compilation commands are similar to those used to build the forward case.

# load relevant modules (HDF5 first, as the NetCDF module depends on it)\nmodule load cray-hdf5-parallel\nmodule load cray-netcdf-hdf5parallel\n\n# compile adjoint model\n../../../tools/genmake2 -ieee -mpi -mods=../code_ad -of=(PATH_TO_OPTFILE)\nmake depend\nmake adtaf\nmake adall\n

The source code will be packaged and forwarded to the FastOpt servers, where it will undergo source-to-source translation via the TAF algorithmic differentiation software. If the compilation is successful, you will have an executable named mitgcmuv_ad. This will run the ECCOv4-r4 configuration of MITgcm in adjoint mode. As before, create a run directory and copy in the relevant files. The procedure is the same as for the forward model, with the following modifications:

cd ..\nmkdir run_ad\ncd run_ad\n# manually copy the mitgcmuv_ad executable\ncp -p ../build_ad/mitgcmuv_ad .\n

To run the model, change the name of the executable in the Slurm submission script; everything else should be the same as in the forward case. As above, at the end of the run you should have a set of STDOUT.* files that you can examine for any obvious problems.

"},{"location":"research-software/mitgcm/#compile-time-errors","title":"Compile time errors","text":"

If TAF compilation fails with an error such as failed to convert GOTPCREL relocation; relink with --no-relax, add the following option to the FFLAGS settings in your optfile: -Wl,--no-relax.

"},{"location":"research-software/mitgcm/#checkpointing-for-adjoint-runs","title":"Checkpointing for adjoint runs","text":"

In an adjoint run, there is a balance between storage (i.e. saving the model state to disk) and recomputation (i.e. integrating the model forward from a stored state). Changing the nchklev parameters in the tamc.h file at compile time is how you control the relative balance between storage and recomputation.

A suggested strategy that has been used on a variety of HPC platforms is as follows:

1. Set nchklev_1 as large as possible, up to the size allowed by memory on your machine. Use the size command to estimate the memory per process; this should be just a little less than the maximum allowed on the machine, which on ARCHER2 is 2 GB (standard nodes) and 4 GB (high-memory nodes).
2. Next, set nchklev_2 and nchklev_3 to be large enough to accommodate the entire run. A common strategy is to set nchklev_2 = nchklev_3 = sqrt(numsteps/nchklev_1) + 1.
3. If the nchklev_2 files get too big, you may have to add a fourth level (i.e. nchklev_4), but this is unlikely.
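The rule of thumb in step 2 is simple arithmetic. The bash sketch below evaluates it with awk for a given number of time steps and nchklev_1, taking the integer part of the square root. It then checks that the three levels together cover the run, meaning that nchklev_1 x nchklev_2 x nchklev_3 is at least the number of steps. The function name suggest_nchklev and the example values are hypothetical.

```shell
#!/bin/bash
# Evaluate nchklev_2 = nchklev_3 = int(sqrt(numsteps / nchklev_1)) + 1
# and check that the three checkpoint levels cover the whole run.
suggest_nchklev() {
    local numsteps=$1 nchklev_1=$2
    local n2
    n2=$(awk -v n=$numsteps -v c1=$nchklev_1 'BEGIN { print int(sqrt(n / c1)) + 1 }')
    echo nchklev_2=$n2 nchklev_3=$n2
    if (( nchklev_1 * n2 * n2 < numsteps )); then
        echo Warning: levels do not cover $numsteps steps >&2
        return 1
    fi
}

# Hypothetical example: 10000 time steps with nchklev_1=100
suggest_nchklev 10000 100
```

Here sqrt(10000/100) = 10, giving nchklev_2 = nchklev_3 = 11, and 100 x 11 x 11 = 12100 covers the 10000 steps.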

This strategy allows you to keep as much in memory as possible, minimising the I/O requirements for the disk. This is useful, as I/O is often the bottleneck for MITgcm runs on HPC.

Performance can also be tuned by changing how tape-level I/O is handled. The following settings perform well for most configurations:

C o tape settings\n#define ALLOW_AUTODIFF_WHTAPEIO\n#define AUTODIFF_USE_OLDSTORE_2D\n#define AUTODIFF_USE_OLDSTORE_3D\n#define EXCLUDE_WHIO_GLOBUFF_2D\n#define ALLOW_INIT_WHTAPEIO\n

"},{"location":"research-software/mo-unified-model/","title":"Met Office Unified Model","text":"

The Met Office Unified Model (\"the UM\") is a numerical model of the atmosphere used for both weather and climate applications. It is often coupled to the NEMO ocean model using the OASIS coupling framework to provide a full Earth system model.

"},{"location":"research-software/mo-unified-model/#useful-links","title":"Useful Links","text":""},{"location":"research-software/mo-unified-model/#using-the-um","title":"Using the UM","text":"

Information on using the UM is provided by the NCAS Computational Modelling Service (CMS).

"},{"location":"research-software/namd/","title":"NAMD","text":"

NAMD is an award-winning parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. Based on Charm++ parallel objects, NAMD scales to hundreds of cores for typical simulations and beyond 500,000 cores for the largest simulations. NAMD uses the popular molecular graphics program VMD for simulation setup and trajectory analysis, but is also file-compatible with AMBER, CHARMM, and X-PLOR.

"},{"location":"research-software/namd/#useful-links","title":"Useful Links","text":""},{"location":"research-software/namd/#using-namd-on-archer2","title":"Using NAMD on ARCHER2","text":"

NAMD is freely available to all ARCHER2 users.

ARCHER2 has two versions of NAMD available: no-SMP (namd/2.14-nosmp) and SMP (namd/2.14). The SMP (Shared Memory Parallelism) build of NAMD introduces threaded parallelism to address memory limitations. The no-SMP build will typically provide the best performance, but most users will require SMP in order to cope with high memory requirements.

Important

The namd modules reset the CPU frequency to the highest possible value (2.25 GHz) as this generally achieves the best balance of performance to energy use. You can change this setting by following the instructions in the Energy use section of the User Guide.

"},{"location":"research-software/namd/#running-mpi-only-namd-jobs","title":"Running MPI only NAMD jobs","text":"

Using no-SMP NAMD will run jobs with only MPI processes and will not introduce additional threaded parallelism. This is the simplest approach to running NAMD jobs and is likely to give the best performance unless simulations are limited by high memory requirements.

The following script will run a pure MPI NAMD MD job using 4 nodes (i.e. 128x4 = 512 MPI parallel processes).

#!/bin/bash\n\n# Request four nodes to run a job of 512 MPI tasks with 128 MPI\n# tasks per node, here for maximum time 20 minutes.\n\n#SBATCH --job-name=namd-nosmp\n#SBATCH --nodes=4\n#SBATCH --ntasks-per-node=128\n#SBATCH --cpus-per-task=1\n#SBATCH --time=00:20:00\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\nmodule load namd/2.14-nosmp\n\n# Ensure the cpus-per-task option is propagated to srun commands\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\nsrun --distribution=block:block --hint=nomultithread namd2 input.namd\n
"},{"location":"research-software/namd/#running-smp-namd-jobs","title":"Running SMP NAMD jobs","text":"

If your job runs out of memory, using the SMP version of NAMD will reduce the memory requirements. This involves launching a combination of MPI processes for communication and worker threads which perform computation.

The following script will run an SMP NAMD MD job using 4 nodes with 32 MPI communication processes per node and 4 cores per process (3 worker threads plus one communication thread). The nodes are fully occupied, with all 512 cores populated.

#!/bin/bash\n#SBATCH --job-name=namd-smp\n#SBATCH --ntasks-per-node=32\n#SBATCH --cpus-per-task=4\n#SBATCH --nodes=4\n#SBATCH --time=00:20:00\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Load the relevant modules\nmodule load namd\n\n# Set procs per node (PPN) & OMP_NUM_THREADS\nexport PPN=$(($SLURM_CPUS_PER_TASK-1))\nexport OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK\nexport OMP_PLACES=cores\n\n# Record PPN in the output file\necho \"Number of worker threads PPN = $PPN\"\n\n# Run NAMD\nsrun --distribution=block:block --hint=nomultithread namd2 +setcpuaffinity +ppn $PPN input.namd\n

Important

Please do not set SRUN_CPUS_PER_TASK when running the SMP version of NAMD. Otherwise, Charm++ will be unable to pin processes to CPUs, causing NAMD to abort with errors such as Couldn't bind to cpuset 0x00000010,,,0x0: Invalid argument.

How do I choose the optimal numbers of MPI processes and worker threads for my simulations? The optimal numbers of MPI processes and worker threads per node depend on the data set and the number of compute nodes. Before running large production jobs, it is worth experimenting with these parameters to find the optimal configuration for your simulation.

We recommend matching the process and thread layout to the ARCHER2 NUMA architecture to find the optimal balance of thread and process parallelism. The NUMA levels on ARCHER2 compute nodes are: 4 cores per CCX, 8 cores per CCD, 16 cores per memory controller, 64 cores per socket. For example, the above submission script specifies 32 MPI communication processes per node with 4 cores each (3 worker threads plus the communication thread), which places 1 MPI process per CCX on each node.
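The NUMA levels listed above follow directly from the core index. The bash sketch below reports which CCX, CCD, memory-controller region and socket a given core belongs to. It assumes, for illustration, that cores are numbered consecutively from 0 to 127 within a node, and the function name numa_location is hypothetical.

```shell
#!/bin/bash
# Map a core index (0-127) to its place in the NUMA hierarchy described
# above: 4 cores per CCX, 8 per CCD, 16 per memory controller, 64 per
# socket. Assumes consecutive core numbering within the node.
numa_location() {
    local core=$1
    echo core $core: CCX $(( core / 4 )), CCD $(( core / 8 )), memory controller $(( core / 16 )), socket $(( core / 64 ))
}

# With 32 tasks x 4 cpus-per-task and block distribution, task t should
# occupy cores 4t to 4t+3, i.e. exactly one CCX per task.
numa_location 20
```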

Note

To ensure fully occupied nodes with the SMP build of NAMD and match the NUMA layout, the optimal values of (tasks-per-node, cpus-per-task) are likely to be (32,4), (16,8) or (8,16).

How do I choose a value for the +ppn flag? The number of workers per communication process is specified by the +ppn argument to NAMD, which is set here to equal cpus-per-task - 1, to leave a CPU-core free for the associated MPI process.

We recommend that users reserve one thread per process for communication. On a many-cores-per-node architecture like ARCHER2, reserving this thread reduces communication between threads and so improves scalability.
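The fully populated (tasks-per-node, cpus-per-task) pairs suggested in the note above, and the +ppn value each implies, can be tabulated in a few lines of bash. This sketch only reproduces the arithmetic in the text: +ppn is cpus-per-task minus 1, and the number of worker threads per node is tasks-per-node multiplied by +ppn. The function name namd_layouts is hypothetical.

```shell
#!/bin/bash
# Tabulate fully populated SMP NAMD layouts on a 128-core node and the
# +ppn value for each (one core per process reserved for communication).
namd_layouts() {
    local cores_per_node=128 cpus tasks ppn
    for cpus in 4 8 16; do
        tasks=$(( cores_per_node / cpus ))
        ppn=$(( cpus - 1 ))
        echo tasks-per-node=$tasks cpus-per-task=$cpus ppn=$ppn workers-per-node=$(( tasks * ppn ))
    done
}

namd_layouts
```

Larger values of cpus-per-task give more worker threads per node but fewer communication processes.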

"},{"location":"research-software/namd/#compiling-namd","title":"Compiling NAMD","text":"

The latest instructions for building NAMD on ARCHER2 may be found in the GitHub repository of build instructions.


"},{"location":"research-software/nektarplusplus/","title":"Nektar++","text":"

Nektar++ is a tensor product based finite element package designed to allow one to construct efficient classical low polynomial order h-type solvers (where h is the size of the finite element) as well as higher p-order piecewise polynomial order solvers.

The Nektar++ framework comes with a number of solvers and also allows one to construct a variety of new solvers. Users can therefore use Nektar++ just to run simulations, or to extend and/or develop new functionality.

"},{"location":"research-software/nektarplusplus/#useful-links","title":"Useful Links","text":""},{"location":"research-software/nektarplusplus/#using-nektar-on-archer2","title":"Using Nektar++ on ARCHER2","text":"

Nektar++ is released under an MIT license and is available to all users on the ARCHER2 full system.

"},{"location":"research-software/nektarplusplus/#where-can-i-get-help","title":"Where can I get help?","text":"

Specific issues with Nektar++ itself might be submitted to the issue tracker at the Nektar++ gitlab repository (see link above). More general questions might also be directed to the Nektar-users mailing list. Issues specific to the use or behaviour of Nektar++ on ARCHER2 should be sent to the Service Desk.

"},{"location":"research-software/nektarplusplus/#running-parallel-nektar-jobs","title":"Running parallel Nektar++ jobs","text":"

The submission script below runs the Taylor-Green Vortex case, one of the Nektar++ tutorials (https://doc.nektar.info/tutorials/latest/incns/taylor-green-vortex/incns-taylor-green-vortex.html#incns-taylor-green-vortexch4.html).

You first need to download the archive linked on the tutorial page.

cd /path/to/work/dir\nwget https://doc.nektar.info/tutorials/latest/incns/taylor-green-vortex/incns-taylor-green-vortex.tar.gz\ntar -xvzf incns-taylor-green-vortex.tar.gz\n
#!/bin/bash\n#SBATCH --job-name=nektar\n#SBATCH --nodes=1\n#SBATCH --ntasks-per-node=32\n#SBATCH --cpus-per-task=1\n#SBATCH --time=02:00:00\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code] \n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\nmodule load nektar\n\nexport OMP_NUM_THREADS=1\n\n# Ensure the cpus-per-task option is propagated to srun commands\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\nNEK_INPUT_PATH=/path/to/work/dir/incns-taylor-green-vortex/completed/solver64\n\nsrun --distribution=block:cyclic --hint=nomultithread \\\n    ${NEK_DIR}/bin/IncNavierStokesSolver \\\n        ${NEK_INPUT_PATH}/TGV64_mesh.xml \\\n        ${NEK_INPUT_PATH}/TGV64_conditions.xml\n
"},{"location":"research-software/nektarplusplus/#compiling-nektar","title":"Compiling Nektar++","text":"

Instructions for building Nektar++ on ARCHER2 may be found in the GitHub repository of build instructions:

"},{"location":"research-software/nektarplusplus/#more-information","title":"More information","text":"

The Nektar++ team have themselves also provided detailed instructions on the build process, updated following the mid-2023 system update, on the Nektar++ website:

This page also provides instructions on how to run jobs using your local installation.

"},{"location":"research-software/nemo/","title":"NEMO","text":"

NEMO (Nucleus for European Modelling of the Ocean) is a state-of-the-art framework for research activities and forecasting services in ocean and climate sciences, developed in a sustainable way by a European consortium.

"},{"location":"research-software/nemo/#useful-links","title":"Useful Links","text":"

NEMO is released under a CeCILL license and is freely available to all users on ARCHER2.

"},{"location":"research-software/nemo/#compiling-nemo","title":"Compiling NEMO","text":"

A central install of NEMO is not appropriate for most users of ARCHER2 since many configurations will want to add bespoke code changes.

The latest instructions for building NEMO on ARCHER2 are found in the Github repository of build instructions:

"},{"location":"research-software/nemo/#using-nemo-on-archer2","title":"Using NEMO on ARCHER2","text":"

Typical NEMO production runs perform significant I/O management to handle the very large volumes of data associated with ocean modelling. To address this, NEMO ocean clients are interfaced with XIOS I/O servers. XIOS is a library which manages NetCDF outputs for climate models. NEMO uses XIOS to simplify the I/O management and introduce dedicated processors to manage large volumes of data.

Users can choose to run NEMO in attached or detached mode:

- In attached mode, each processor acts as both an ocean client and an I/O-server process.
- In detached mode, ocean clients and external XIOS I/O-server processes are defined separately.

Running NEMO in attached mode can be done with a simple submission script specifying both the NEMO and XIOS executable to srun. However, typical production runs of NEMO will perform significant I/O management and will be unable to run in attached mode.

Detached mode introduces external XIOS I/O-servers to help manage the large volumes of data. This requires users to specify the placement of clients and servers on different cores throughout the node, using the --cpu-bind=map_cpu:<cpu map> srun option to define a CPU map or mask. Constructing these maps by hand is tedious, so Andrew Coward provides a tool to help users construct submission scripts:

/work/n01/shared/nemo/mkslurm_hetjob\n/work/n01/shared/nemo/mkslurm_hetjob_Gnu\n

Usage of the script:

usage: mkslurm_hetjob [-h] [-S S] [-s S] [-m M] [-C C] [-g G] [-N N] [-t T]\n                      [-a A] [-j J] [-v]\n\nPython version of mkslurm_alt by Andrew Coward using HetJob. Server placement\nand spacing remains as mkslurm but clients are always tightly packed with a\ngap left every \"NC_GAP\" cores where NC_GAP can be given by the -g argument.\nvalues of 4, 8 or 16 are recommended.\n\noptional arguments:\n  -h, --help  show this help message and exit\n  -S S        num_servers (default: 4)\n  -s S        server_spacing (default: 8)\n  -m M        max_servers_per_node (default: 2)\n  -C C        num_clients (default: 28)\n  -g G        client_gap_interval (default: 4)\n  -N N        ncores_per_node (default: 128)\n  -t T        time_limit (default: 00:10:00)\n  -a A        account (default: n01)\n  -j J        job_name (default: nemo_test)\n  -v          show human readable hetjobs (default: False)\n

Note

We recommend that you keep your own copy of this script, as it is not directly provided by the ARCHER2 CSE team and is subject to change. Once you have a copy, you can set your own defaults for the options in the script.

For example, to run with 4 XIOS I/O-servers (a maximum of 2 per node), each with sole occupancy of a 16-core NUMA region, and 96 ocean cores with an idle core between each, use:

./mkslurm_hetjob -S 4 -s 16 -m 2 -C 96 -g 2 > myscript.slurm\n\nINFO:root:Running mkslurm_hetjob -S 4 -s 16 -m 2 -C 96 -g 2 -N 128 -t 00:10:00 -a n01 -j nemo_test -v False\nINFO:root:nodes needed= 2 (256)\nINFO:root:cores to be used= 100 (256)\n

This has reported that 2 nodes are needed with 100 active cores spread over 256 cores. This will also have produced a submission script \"myscript.slurm\":

#!/bin/bash\n#SBATCH --job-name=nemo_test\n#SBATCH --time=00:10:00\n#SBATCH --account=n01\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n#SBATCH --nodes=2\n#SBATCH --ntasks-per-core=1\n\n# Created by: mkslurm_hetjob -S 4 -s 16 -m 2 -C 96 -g 2 -N 128 -t 00:10:00 -a n01 -j nemo_test -v False\nmodule swap craype-network-ofi craype-network-ucx\nmodule swap cray-mpich cray-mpich-ucx\nmodule load cray-hdf5-parallel/1.12.0.7\nmodule load cray-netcdf-hdf5parallel/4.7.4.7\nexport OMP_NUM_THREADS=1\n\ncat > myscript_wrapper.sh << EOFB\n#!/bin/ksh\n#\nset -A map ./xios_server.exe ./nemo\nexec_map=( 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 )\n#\nexec \\${map[\\${exec_map[\\$SLURM_PROCID]}]}\n##\nEOFB\nchmod u+x ./myscript_wrapper.sh\n\nsrun --mem-bind=local \\\n--ntasks=100 --ntasks-per-node=50 --cpu-bind=v,mask_cpu:0x1,0x10000,0x100000000,0x400000000,0x1000000000,0x4000000000,0x10000000000,0x40000000000,0x100000000000,0x400000000000,0x1000000000000,0x4000000000000,0x10000000000000,0x40000000000000,0x100000000000000,0x400000000000000,0x1000000000000000,0x4000000000000000,0x10000000000000000,0x40000000000000000,0x100000000000000000,0x400000000000000000,0x1000000000000000000,0x4000000000000000000,0x10000000000000000000,0x40000000000000000000,0x100000000000000000000,0x400000000000000000000,0x1000000000000000000000,0x4000000000000000000000,0x10000000000000000000000,0x40000000000000000000000,0x100000000000000000000000,0x400000000000000000000000,0x1000000000000000000000000,0x4000000000000000000000000,0x10000000000000000000000000,0x40000000000000000000000000,0x100000000000000000000000000,0x400000000000000000000000000,0x1000000000000000000000000000,0x4000000000000000000000000000,0x10000000000000000000000000000,0x40000000000000000000000000000,0x100000000000000000000000000000,0x400000000000000000000000000000,0x10000000000
00000000000000000000,0x4000000000000000000000000000000,0x10000000000000000000000000000000,0x40000000000000000000000000000000 ./myscript_wrapper.sh\n

Submitting this script in a directory with the nemo and xios_server.exe executables will run the desired MPMD job. The exec_map array shows the position of each executable in the rank list (0 = xios_server.exe, 1 = nemo). For larger core counts the cpu_map can be limited to a single node map which will be cycled through as many times as necessary.
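To see how the hexadecimal values in the generated mask_cpu list relate to core numbers, the sketch below builds single-core masks: core n corresponds to bit n, so the mask is a leading hex digit of 1, 2, 4 or 8 followed by n/4 zeros. Because Bash arithmetic uses fixed-width (typically 64-bit) integers, the mask is assembled as a string so that cores up to 127 work. The function names core_mask and mask_list are hypothetical; mkslurm_hetjob generates these lists for you.

```shell
#!/bin/bash
# Hypothetical helper: hex mask selecting a single core (bit number = core id).
core_mask() {
    local core=$1
    local lead=$(( 1 << (core % 4) ))   # leading hex digit: 1, 2, 4 or 8
    local zeros=$(( core / 4 ))         # each trailing hex digit covers 4 cores
    local pad="" i
    for (( i = 0; i < zeros; i++ )); do pad+="0"; done
    printf '0x%x%s\n' "$lead" "$pad"
}

# Hypothetical helper: comma-separated mask list for the given cores,
# in rank order, as used by srun --cpu-bind=mask_cpu:...
mask_list() {
    local out="" c
    for c in "$@"; do
        out+="${out:+,}$(core_mask "$c")"
    done
    echo "$out"
}

# Servers on cores 0 and 16, first clients on cores 32, 34 and 36
mask_list 0 16 32 34 36
# prints: 0x1,0x10000,0x100000000,0x400000000,0x1000000000
```

The printed list matches the first five entries of the --cpu-bind list in the generated myscript.slurm, which is a quick way to check by hand which core a given rank is bound to.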

"},{"location":"research-software/nemo/#how-to-optimise-the-performance-of-nemo","title":"How to optimise the performance of NEMO","text":"

Note

Our optimisation advice is based on the ARCHER2 4-cabinet preview system with the same node architecture as the current ARCHER2 service but a total of 1,024 compute nodes. During these investigations we used NEMO-4.0.6 and XIOS-2.5.

Through testing with idealised test cases to optimise the computational performance (i.e. without the demanding I/O management that is typical of NEMO production runs), we found that drastically under-populating the nodes does not affect the performance of the computation. This indicates that users can reserve large portions of each node without a performance penalty: users running larger simulations can reserve up to 75% of each node for I/O management (i.e. XIOS I/O-servers).

XIOS I/O-servers can be more lightly packed than ocean clients and should be distributed evenly amongst the nodes, i.e. not concentrated on a specific node. We found that placing 1 XIOS I/O-server per node with 4, 8 or 16 dedicated cores did not affect performance. However, performance was affected when the dedicated I/O-server cores were allocated outside a single 16-core NUMA region. Users should therefore confine each XIOS I/O-server to a NUMA region to improve performance and benefit from the memory hierarchy.
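Checking that a server's dedicated cores stay inside one NUMA region is simple integer arithmetic. The sketch below assumes that the 128 cores are numbered so that each consecutive block of 16 (0-15, 16-31, and so on) forms one NUMA region; the function names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: NUMA region of a core, assuming 16 consecutive
# cores per region.
numa_region() {
    echo $(( $1 / 16 ))
}

# Succeeds if the cores first .. first+count-1 all lie in one NUMA region.
within_one_numa() {
    local first=$1 count=$2
    local last=$(( first + count - 1 ))
    [ "$(numa_region "$first")" -eq "$(numa_region "$last")" ]
}

for placement in "0 16" "16 8" "8 16"; do
    read -r first count <<< "$placement"
    range="${first}-$(( first + count - 1 ))"
    if within_one_numa "$first" "$count"; then
        echo "cores ${range}: single NUMA region"
    else
        echo "cores ${range}: spans NUMA regions"
    fi
done
```

The third placement (cores 8-23) straddles two regions, which is the kind of server placement the advice above suggests avoiding.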

"},{"location":"research-software/nemo/#a-performance-investigation","title":"A performance investigation","text":"

Note

These results were collated by Andrew Coward during early user testing of the ARCHER2 service and are subject to change.

This table shows some preliminary results of a repeated 60-day simulation of the ORCA2_ICE_PISCES SETTE configuration using various core counts and packing strategies:

Note

These results used the mkslurm script, now hosted in /work/n01/shared/nemo/old_scripts/mkslurm

The previous results make it clear that fully populating an ARCHER2 node is unlikely to provide optimal performance for any code with moderate memory bandwidth requirements. However, the regular packing strategy explored there does not allow experimentation with packing strategies that are less wasteful than half-population.

There may be a case, for example, for leaving just 1 in every 4 cores idle, or 1 in every 8, or even fewer idle cores per node. The mkslurm_alt script (/work/n01/shared/nemo/old_scripts/mkslurm_alt) provided a method of generating cpu-bind maps for exploring these strategies. The script kept the packing strategy for the servers unchanged, but replaced the core spacing argument (-c) for the ocean cores with a -g option representing the frequency of a gap in the otherwise tightly packed ocean cores.
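The idea of a gap interval can be illustrated with a short sketch that lists the cores ocean clients would occupy if every g-th core in the client region is left idle. This is an assumed model of the pattern for illustration only: the real mkslurm_alt and mkslurm_hetjob scripts also handle server placement and node boundaries, and the function name client_cores is hypothetical.

```shell
#!/bin/bash
# Hypothetical illustration: place $count clients starting at core $start,
# leaving every $gap-th core idle (offsets gap-1, 2*gap-1, ... are skipped).
client_cores() {
    local start=$1 count=$2 gap=$3
    if (( gap < 2 )); then
        echo "gap interval must be at least 2" >&2
        return 1
    fi
    local placed=0 offset=0 cores=()
    while (( placed < count )); do
        if (( (offset + 1) % gap != 0 )); then
            cores+=( $(( start + offset )) )
            placed=$(( placed + 1 ))
        fi
        offset=$(( offset + 1 ))
    done
    echo "${cores[*]}"
}

client_cores 32 4 2    # every other core idle
client_cores 0 6 4     # one core in four idle
client_cores 0 3 10    # gap interval larger than the client count: no gaps
```

With a gap interval of 2 starting at core 32, the clients land on cores 32, 34, 36 and 38, matching the start of the client masks in the generated script; an interval larger than the number of clients gives a tightly packed placement.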

Preliminary tests have been conducted with the ORCA2_ICE_PISCES SETTE test case. This is a relatively small test case that will fit onto a single node, and it is small enough to perform well in attached mode. First, some baseline tests in attached mode.

Previous tests used 4 I/O servers, each occupying a single NUMA region. For a model of this size, 2 servers each occupying half a NUMA region will suffice. That leaves 112 cores with which to try different packing strategies. Is it possible to match or better this elapsed time on a single node, including external I/O servers? Yes, but not with an obvious gap frequency:

And activating land suppression can reduce times further:

The optimal two-node solution is also shown (this is quicker but the one node solution is cheaper).

This leads us to the current iteration of the mkslurm script, mkslurm_hetjob. Note that a tightly packed placement with no gaps amongst the ocean processes can be generated using a client gap interval greater than the number of clients. This script has been used to explore the different placement strategies with a larger configuration based on eORCA025. In all cases, 8 XIOS servers were used, each with sole occupancy of a 16-core NUMA region and a maximum of 2 servers per node. The rest of the initial 4 nodes (and any subsequent ocean-core-only nodes) were filled with ocean cores at various packing densities (from tightly packed to half-populated). A summary of the results is shown below.

The limit of scalability for this problem size lies around 1500 cores. One interesting aspect is that the cost, in terms of node hours, remains fairly flat up to a thousand processes and the choice of gap placement makes much less difference as the individual domains shrink. It looks as if, so long as you avoid inappropriately high numbers of processors, choosing the wrong placement won't waste your allocation but may waste your time.
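The node-hour cost referred to above is the number of nodes multiplied by the elapsed time in hours. A tiny helper (hypothetical, using awk for the floating-point division) makes it easy to tabulate the trade-off between more nodes and shorter runs; the inputs below are illustrative, not measured results.

```shell
#!/bin/bash
# Hypothetical helper: cost of a run in node hours
# (nodes x elapsed seconds / 3600), printed to two decimal places.
node_hours() {
    local nodes=$1 elapsed_s=$2
    awk -v n="$nodes" -v s="$elapsed_s" 'BEGIN { printf "%.2f\n", n * s / 3600 }'
}

node_hours 4 1800    # 4 nodes for 30 minutes
node_hours 8 1000    # 8 nodes for 1000 seconds
```

Doubling the node count only keeps the cost flat if the elapsed time roughly halves, which is why the cost stays level until the scalability limit is approached.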

"},{"location":"research-software/nwchem/","title":"NWChem","text":"

NWChem aims to provide its users with computational chemistry tools that are scalable both in their ability to treat large scientific computational chemistry problems efficiently, and in their use of available parallel computing resources from high-performance parallel supercomputers to conventional workstation clusters. The NWChem software can handle: biomolecules, nanostructures, and solid-state systems; from quantum to classical, and all combinations; Gaussian basis functions or plane-waves; scaling from one to thousands of processors; properties and relativity.

"},{"location":"research-software/nwchem/#useful-links","title":"Useful Links","text":""},{"location":"research-software/nwchem/#using-nwchem-on-archer2","title":"Using NWChem on ARCHER2","text":"

NWChem is released under an Educational Community License (ECL 2.0) and is freely available to all users on ARCHER2.

"},{"location":"research-software/nwchem/#where-can-i-get-help","title":"Where can I get help?","text":"

If you have problems accessing or running NWChem on ARCHER2, please contact the Service Desk. General questions on the use of NWChem might also be directed to the NWChem forum. More experienced users with detailed technical issues on NWChem should consider submitting them to the NWChem GitHub issue tracker.

"},{"location":"research-software/nwchem/#running-nwchem-jobs","title":"Running NWChem jobs","text":"

The following script will run an NWChem job using 2 nodes (256 cores) in the standard partition. It assumes that the input file is called test_calc.nw.

#!/bin/bash\n\n# Request 2 nodes with 128 MPI tasks per node for 20 minutes\n\n#SBATCH --job-name=NWChem_test\n#SBATCH --nodes=2\n#SBATCH --ntasks-per-node=128\n#SBATCH --cpus-per-task=1\n#SBATCH --time=00:20:00\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code] \n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Load the NWChem module, avoid any unintentional OpenMP threading by\n# setting OMP_NUM_THREADS, and launch the code.\nmodule load nwchem\nexport OMP_NUM_THREADS=1\n\n# Ensure the cpus-per-task option is propagated to srun commands\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\nsrun --distribution=block:block --hint=nomultithread nwchem test_calc\n
"},{"location":"research-software/nwchem/#compiling-nwchem","title":"Compiling NWChem","text":"

The latest instructions for building NWChem on ARCHER2 may be found in the GitHub repository of build instructions:

"},{"location":"research-software/onetep/","title":"ONETEP","text":"

ONETEP (Order-N Electronic Total Energy Package) is a linear-scaling code for quantum-mechanical calculations based on density-functional theory.

"},{"location":"research-software/onetep/#useful-links","title":"Useful Links","text":""},{"location":"research-software/onetep/#using-onetep-on-archer2","title":"Using ONETEP on ARCHER2","text":"

ONETEP is only available to users who have a valid ONETEP licence.

If you have a ONETEP licence and wish to have access to ONETEP on ARCHER2, please make a request via the SAFE, see:

Please have your licence details to hand.

"},{"location":"research-software/onetep/#running-parallel-onetep-jobs","title":"Running parallel ONETEP jobs","text":"

The following script, supplied by the ONETEP developers, will run a ONETEP job using 2 nodes (256 cores) with 16 MPI processes per node and 8 OpenMP threads per MPI process. It assumes that there is a single calculation options file with the .dat extension in the working directory.

#!/bin/bash\n\n# --------------------------------------------------------------------------\n# A SLURM submission script for ONETEP on ARCHER2 (full 23-cabinet system).\n# Central install, Cray compiler version.\n# Supports hybrid (MPI/OMP) parallelism.\n#\n# 2022.06 Jacek Dziedzic, J.Dziedzic@soton.ac.uk\n#                         University of Southampton\n#         Lennart Gundelach, L.Gundelach@soton.ac.uk\n#                            University of Southampton\n#         Tom Demeyere, T.Demeyere@soton.ac.uk\n#                       University of Southampton\n# --------------------------------------------------------------------------\n\n# v1.00 (2022.06.04) jd: Adapted from the user-compiled Cray compiler version.\n\n# ==========================================================================================================\n# Edit the following lines to your liking.\n#\n#SBATCH --job-name=mine               # Name of the job.\n#SBATCH --nodes=2                     # Number of nodes in job.\n#SBATCH --ntasks-per-node=16          # Number of MPI processes per node.\n#SBATCH --cpus-per-task=8             # Number of OMP threads spawned from each MPI process.\n#SBATCH --time=5:00:00                # Max time for your job (hh:mm:ss).\n#SBATCH --partition=standard          # Partition: standard memory CPU nodes with AMD EPYC 7742 64-core processor\n#SBATCH --account=t01                 # Replace 't01' with your budget code.\n#SBATCH --qos=standard                # Requested Quality of Service (QoS), See ARCHER2 documentation\n\nexport OMP_NUM_THREADS=8              # Repeat the value from 'cpus-per-task' here.\n\n# Ensure the cpus-per-task option is propagated to srun commands\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\n# Set up the job environment, loading the ONETEP module.\n# The module automatically sets OMP_PLACES, OMP_PROC_BIND and FI_MR_CACHE_MAX_COUNT.\n# To use a different binary, replace this line with either (drop the leading '#')\n# module load 
onetep/6.1.9.0-GCC-LibSci\n# to use the GCC-libsci binary, or with\n# module load onetep/6.1.9.0-GCC-MKL\n# to use the GCC-MKL binary.\n\nmodule load onetep/6.1.9.0-CCE-LibSci\n\n# ==========================================================================================================\n# !!! You should not need to modify anything below this line.\n# ==========================================================================================================\n\nworkdir=`pwd`\necho \"--- This is the submission script, the time is `date`.\"\n\n# Figure out ONETEP executable\nonetep_exe=`which onetep.archer2`\necho \"--- ONETEP executable is $onetep_exe.\"\n\nonetep_launcher=`echo $onetep_exe | sed -r \"s/onetep.archer2/onetep_launcher/\"`\n\necho \"--- workdir is '$workdir'.\"\necho \"--- onetep_launcher is '$onetep_launcher'.\"\n\n# Ensure exactly 1 .dat file in there.\nndats=`ls -l *dat | wc -l`\n\nif [ \"$ndats\" == \"0\" ]; then\n  echo \"!!! There is no .dat file in the current directory. Aborting.\" >&2\n  touch \"%NO_DAT_FILE\"\n  exit 2\nfi\n\nif [ \"$ndats\" == \"1\" ]; then\n  true\nelse\n  echo \"!!! More than one .dat file in the current directory, that's too many. Aborting.\" >&2\n  touch \"%MORE_THAN_ONE_DAT_FILE\"\n  exit 3\nfi\n\nrootname=`echo *.dat | sed -r \"s/\\.dat\\$//\"`\nrootname_dat=$rootname\".dat\"\nrootname_out=$rootname\".out\"\nrootname_err=$rootname\".err\"\n\necho \"--- The input file is $rootname_dat, the output goes to $rootname_out and errors go to $rootname_err.\"\n\n# Ensure ONETEP executable is there and is indeed executable.\nif [ ! -x \"$onetep_exe\" ]; then\n  echo \"!!! $onetep_exe does not exist or is not executable. Aborting!\" >&2\n  touch \"%ONETEP_EXE_MISSING\"\n  exit 4\nfi\n\n# Ensure onetep_launcher is there and is indeed executable.\nif [ ! -x \"$onetep_launcher\" ]; then\n  echo \"!!! $onetep_launcher does not exist or is not executable. 
Aborting!\" >&2\n  touch \"%ONETEP_LAUNCHER_MISSING\"\n  exit 5\nfi\n\n# Dump the module list to a file.\nmodule list >\$modules_loaded 2>&1\n\nldd $onetep_exe >\$ldd\n\n# Report details\necho \"--- Number of nodes as reported by SLURM: $SLURM_JOB_NUM_NODES.\"\necho \"--- Number of tasks as reported by SLURM: $SLURM_NTASKS.\"\necho \"--- Using this srun executable: \"`which srun`\necho \"--- Executing ONETEP via $onetep_launcher.\"\n\n\n# Actually run ONETEP\n# Additional srun options to pin one thread per physical core\n########################################################################################################################################################\nsrun --hint=nomultithread --distribution=block:block -N $SLURM_JOB_NUM_NODES -n $SLURM_NTASKS $onetep_launcher -e $onetep_exe -t $OMP_NUM_THREADS $rootname_dat >$rootname_out 2>$rootname_err\n# Capture the srun exit code immediately, before any other command overwrites $?\nresult=$?\n########################################################################################################################################################\n\necho \"--- srun finished at `date`.\"\n\n# Check for error conditions\nif [ $result -ne 0 ]; then\n  echo \"!!! srun reported a non-zero exit code $result. Aborting!\" >&2\n  touch \"%SRUN_ERROR\"\n  exit 6\nfi\n\nif [ -r $rootname.error_message ]; then\n  echo \"!!! ONETEP left an error message file. Aborting!\" >&2\n  touch \"%ONETEP_ERROR_DETECTED\"\n  exit 7\nfi\n\ntail $rootname.out | grep completed >/dev/null 2>/dev/null\nresult=$?\nif [ $result -ne 0 ]; then\n  echo \"!!! ONETEP calculation likely did not complete. Aborting!\" >&2\n  touch \"%ONETEP_DID_NOT_COMPLETE\"\n  exit 8\nfi\n\necho \"--- Looks like everything went fine. Praise be.\"\ntouch \"%DONE\"\n\necho \"--- Finished successfully at `date`.\"\n
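The script above asks you to repeat the cpus-per-task value in OMP_NUM_THREADS by hand. A small variation, which is not part of the ONETEP developers' script, is to derive it from SLURM_CPUS_PER_TASK, which Slurm sets in the job environment when --cpus-per-task is requested, so the two values cannot drift apart. The function name and the fallback of 1 thread for runs outside Slurm are assumptions for illustration.

```shell
#!/bin/bash
# Hypothetical helper: derive the OpenMP thread count from Slurm instead of
# hard-coding it. Falls back to 1 thread if SLURM_CPUS_PER_TASK is unset
# (e.g. when the script is run outside a Slurm job).
set_omp_threads() {
    export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
}

set_omp_threads
echo "OMP_NUM_THREADS=${OMP_NUM_THREADS}"
```

In the ONETEP script this would replace the line export OMP_NUM_THREADS=8, leaving #SBATCH --cpus-per-task as the single place where the thread count is set.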
"},{"location":"research-software/onetep/#hints-and-tips","title":"Hints and Tips","text":"

See the information in the ONETEP documentation.

"},{"location":"research-software/onetep/#compiling-onetep","title":"Compiling ONETEP","text":"

The latest instructions for building ONETEP on ARCHER2 may be found in the GitHub repository of build instructions:

"},{"location":"research-software/openfoam/","title":"OpenFOAM","text":"

OpenFOAM is an open-source toolbox for computational fluid dynamics. OpenFOAM consists of generic tools to simulate complex physics for a variety of fields of interest, from fluid flows involving chemical reactions, turbulence and heat transfer, to solid dynamics, electromagnetism and the pricing of financial options.

The core technology of OpenFOAM is a flexible set of modules written in C++. These are used to build solvers and utilities to perform pre-processing and post-processing tasks ranging from simple data manipulation to visualisation and mesh processing.

There are a number of different flavours of the OpenFOAM package with slightly different histories, and slightly different features. The two most common are distributed by openfoam.org and openfoam.com.

"},{"location":"research-software/openfoam/#useful-links","title":"Useful Links","text":""},{"location":"research-software/openfoam/#using-openfoam-on-archer2","title":"Using OpenFOAM on ARCHER2","text":"

OpenFOAM is released under a GPL v3 license and is freely available to all users on ARCHER2.

Upgrade 2023:
auser@ln01> module avail openfoam\n--------------- /work/y07/shared/archer2-lmod/apps/core -----------------\nopenfoam/com/v2106        openfoam/org/v9.20210903\nopenfoam/com/v2212 (D)    openfoam/org/v10.20230119 (D)\n

Note: the older versions were recompiled under PE22.12 in April 2023.

Full system:
auser@ln01> module avail openfoam\n--------------- /work/y07/shared/archer2-lmod/apps/core -----------------\nopenfoam/com/v2106          openfoam/org/v9.20210903 (D)\nopenfoam/org/v8.20200901\n

Versions from openfoam.org are numbered e.g. v8.0, and there is typically one release per year (in June, with a patch release in September). Versions from openfoam.com are numbered e.g. v2106 (to be read as 2021 June), and there are typically two releases a year (one in June and one in December).
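The openfoam.com naming scheme (YYMM after the v) can be decoded mechanically. The sketch below is a hypothetical helper for illustration only; it assumes a tag of the form vYYMM and a 21st-century year.

```shell
#!/bin/bash
# Hypothetical helper: decode an openfoam.com version tag such as v2106
# (YYMM) into a year and month name.
foam_com_release() {
    local tag=${1#v}                      # strip the leading "v"
    local yy=${tag:0:2} mm=${tag:2:2}
    local months=(January February March April May June July
                  August September October November December)
    echo "20${yy} ${months[10#$mm - 1]}"  # 10# stops "06"/"08" being read as octal
}

foam_com_release v2106   # prints: 2021 June
foam_com_release v2212   # prints: 2022 December
```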

To use OpenFOAM on ARCHER2 you should first load an OpenFOAM module, e.g.

user@ln01:> module load PrgEnv-gnu\nuser@ln01:> module load openfoam/com/v2106\n

(Note that the openfoam module will automatically load PrgEnv-gnu if it is not already active.) The module defines only the base installation directory via the environment variable FOAM_INSTALL_DIR. After loading the module you need to source the etc/bashrc file provided by OpenFOAM, e.g.

source ${FOAM_INSTALL_DIR}/etc/bashrc\n

You should then be able to use OpenFOAM. The above commands will also need to be added to any job/batch submission scripts you want to use to run OpenFOAM. Note that all the centrally installed versions of OpenFOAM are compiled under PrgEnv-gnu.
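Because a job script fails in a confusing way if the module was not loaded, it can help to check FOAM_INSTALL_DIR before sourcing the bashrc file. The sketch below is a hypothetical guard, not part of the OpenFOAM modules; it only relies on the FOAM_INSTALL_DIR variable and the etc/bashrc path described above.

```shell
#!/bin/bash
# Hypothetical guard for a job script: report a clear error if the OpenFOAM
# module has not set FOAM_INSTALL_DIR, or if its etc/bashrc file is missing.
foam_env_ok() {
    if [ -z "${FOAM_INSTALL_DIR:-}" ]; then
        echo "FOAM_INSTALL_DIR is not set: load an openfoam module first" >&2
        return 1
    fi
    if [ ! -r "${FOAM_INSTALL_DIR}/etc/bashrc" ]; then
        echo "No readable ${FOAM_INSTALL_DIR}/etc/bashrc" >&2
        return 1
    fi
}

# In a job script, after loading the openfoam module:
# foam_env_ok || exit 1
# source "${FOAM_INSTALL_DIR}/etc/bashrc"
```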

Although some versions are marked as defaults (D) in the listings above, it is recommended to use a fully qualified module name (with the exact version, as in the example above).

"},{"location":"research-software/openfoam/#running-parallel-openfoam-jobs","title":"Running parallel OpenFOAM jobs","text":"

While it is possible to run limited OpenFOAM pre-processing and post-processing activities on the front end, we request that all significant work be submitted to the queue system. Please remember that the front end is a shared resource.

A typical SLURM job submission script for OpenFOAM is given here. This would request 4 nodes to run with 128 MPI tasks per node (a total of 512 MPI tasks). Each MPI task is allocated one core (--cpus-per-task=1).

Upgrade 2023:
#!/bin/bash\n\n#SBATCH --nodes=4\n#SBATCH --tasks-per-node=128\n#SBATCH --cpus-per-task=1\n#SBATCH --distribution=block:block\n#SBATCH --hint=nomultithread\n#SBATCH --time=00:10:00\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Ensure the cpus-per-task option is propagated to srun commands\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\n# Load the appropriate module and source the OpenFOAM bashrc file\n\nmodule load openfoam/org/v10.20230119\n\nsource ${FOAM_INSTALL_DIR}/etc/bashrc\n\n# Run OpenFOAM work, e.g.,\n\nsrun interFoam -parallel\n
Full system:
#!/bin/bash\n\n#SBATCH --nodes=4\n#SBATCH --tasks-per-node=128\n#SBATCH --cpus-per-task=1\n#SBATCH --distribution=block:block\n#SBATCH --hint=nomultithread\n#SBATCH --time=00:10:00\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Load the appropriate module and source the OpenFOAM bashrc file\n\nmodule load openfoam/org/v8.20200901\n\nsource ${FOAM_INSTALL_DIR}/etc/bashrc\n\n# Run OpenFOAM work, e.g.,\n\nsrun interFoam -parallel\n
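Running a solver with -parallel assumes the case has already been decomposed (typically with OpenFOAM's decomposePar utility) into as many subdomains as there are MPI tasks, 512 in the scripts above. A hedged pre-flight check comparing the numberOfSubdomains entry in system/decomposeParDict with SLURM_NTASKS is sketched below; the helper names are hypothetical, and the awk parsing assumes the entry sits on a single line of the form numberOfSubdomains 512;.

```shell
#!/bin/bash
# Hypothetical helper: read numberOfSubdomains from a decomposeParDict file,
# assuming a single-line entry such as "numberOfSubdomains 512;".
subdomains_in() {
    awk '$1 == "numberOfSubdomains" { gsub(";", "", $2); print $2; exit }' "$1"
}

# Succeeds if the case decomposition matches the number of MPI tasks.
decomposition_matches() {
    local dict=$1 ntasks=$2
    [ "$(subdomains_in "$dict")" = "$ntasks" ]
}

# In a job script, before "srun interFoam -parallel":
# decomposition_matches system/decomposeParDict "$SLURM_NTASKS" || exit 1
```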
"},{"location":"research-software/openfoam/#compiling-openfoam","title":"Compiling OpenFOAM","text":"

If you want to compile your own version of OpenFOAM, instructions are available for ARCHER2 at:

"},{"location":"research-software/openfoam/#extensions-to-openfoam","title":"Extensions to OpenFOAM","text":"

Many packages extend the central OpenFOAM functionality in some way. However, there is no completely standardised way in which this works. Some packages assume they have write access to the main OpenFOAM installation. If this is the case, you must install your own version before continuing. This can be done on an individual basis, or a per-project basis using the project shared directories.

"},{"location":"research-software/openfoam/#module-version-history","title":"Module version history","text":"

The following centrally installed versions are available.

"},{"location":"research-software/openfoam/#upgrade-2023","title":"Upgrade 2023","text":""},{"location":"research-software/openfoam/#full-system","title":"Full system","text":""},{"location":"research-software/orca/","title":"ORCA","text":"

ORCA is an ab initio quantum chemistry program package that contains modern electronic structure methods including density functional theory, many-body perturbation, coupled cluster, multireference methods, and semi-empirical quantum chemistry methods. Its main field of application is larger molecules, transition metal complexes, and their spectroscopic properties. ORCA is developed in the research group of Frank Neese. The free version is available only for academic use at academic institutions.

Important

ORCA is not part of the officially supported software on ARCHER2. While the ARCHER2 service desk is able to provide support for basic use of this software (e.g. access to software, writing job submission scripts) it does not generally provide detailed technical support for the software and you may be directed to seek support from other places if the service desk cannot answer the questions.

"},{"location":"research-software/orca/#useful-links","title":"Useful Links","text":""},{"location":"research-software/orca/#using-orca-on-archer2","title":"Using ORCA on ARCHER2","text":"

ORCA is available on ARCHER2 for academic use only. If you wish to use ORCA for commercial applications, you must contact the ORCA developers.

"},{"location":"research-software/orca/#running-parallel-orca-jobs","title":"Running parallel ORCA jobs","text":"

The following script will run an ORCA job on ARCHER2 using 256 MPI processes across 2 nodes, with each MPI process placed on a separate physical core. It assumes that the input file is my_calc.inp.

#!/bin/bash\n#SBATCH --nodes=2\n#SBATCH --ntasks-per-node=128\n#SBATCH --cpus-per-task=1\n#SBATCH --time=0:20:00\n\n# Replace [budget code] below with your project code (e.g. e05)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\nmodule load other-software\nmodule load orca\n\n# Launch the ORCA calculation\n#   * You must use \"$ORCADIR/orca\" so the application has the full executable path\n#   * Do not use \"srun\" to launch parallel ORCA jobs as they use OpenMPI rather than Cray MPICH\n#   * Remember to change the name of the input file to match your file name\n$ORCADIR/orca my_calc.inp\n
"},{"location":"research-software/qchem/","title":"QChem","text":"

QChem is an ab initio quantum chemistry software package for fast and accurate simulations of molecular systems, including electronic and molecular structure, reactivities, properties, and spectra.

Important

QChem is not part of the officially supported software on ARCHER2. While the ARCHER2 service desk is able to provide support for basic use of this software (e.g. access to software, writing job submission scripts) it does not generally provide detailed technical support for the software and you may be directed to seek support from other places if the service desk cannot answer the questions.

"},{"location":"research-software/qchem/#useful-links","title":"Useful Links","text":""},{"location":"research-software/qchem/#using-qchem-on-archer2","title":"Using QChem on ARCHER2","text":"

ARCHER2 has a site licence for QChem.

"},{"location":"research-software/qchem/#running-parallel-qchem-jobs","title":"Running parallel QChem jobs","text":"

Important

QChem parallelisation is only available on ARCHER2 by using multiple threads within a single compute node. Multi-process and multi-node parallelisation will not work on ARCHER2.

The following script will run QChem with 16 OpenMP threads, reading its input from hf3c.in and writing output to hf3c.out.

#!/bin/bash\n#SBATCH --nodes=1\n#SBATCH --time=1:0:0\n#SBATCH --ntasks-per-node=1\n#SBATCH --cpus-per-task=16\n\n# Replace [budget code] below with your project code (e.g. e05)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\nmodule load other-software\nmodule load qchem\n\nexport OMP_PLACES=cores\nexport OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\nexport SLURM_HINT=\"nomultithread\"\nexport SLURM_DISTRIBUTION=\"block:block\"\n\nqchem -slurm -nt $OMP_NUM_THREADS hf3c.in hf3c.out\n
"},{"location":"research-software/qe/","title":"Quantum Espresso","text":"

Quantum Espresso (QE) is an integrated suite of open-source computer codes for electronic-structure calculations and materials modeling at the nanoscale. It is based on density-functional theory, plane waves, and pseudopotentials.

"},{"location":"research-software/qe/#useful-links","title":"Useful Links","text":""},{"location":"research-software/qe/#using-qe-on-archer2","title":"Using QE on ARCHER2","text":"

QE is released under a GPL v2 license and is freely available to all ARCHER2 users.

"},{"location":"research-software/qe/#running-parallel-qe-jobs","title":"Running parallel QE jobs","text":"

For example, the following script runs a QE pw.x job on 4 nodes (4 x 128 = 512 cores).

#!/bin/bash\n\n# Request 4 nodes to run a 512 MPI task job with 128 MPI tasks per node.\n# The walltime limit is set to 20 minutes.\n\n#SBATCH --job-name=qe_test\n#SBATCH --nodes=4\n#SBATCH --ntasks-per-node=128\n#SBATCH --cpus-per-task=1\n#SBATCH --time=00:20:00\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Load the relevant Quantum Espresso module\nmodule load quantum_espresso\n\n# Set number of OpenMP threads to 1 to prevent multithreading by libraries\nexport OMP_NUM_THREADS=1\n\n# Ensure the cpus-per-task option is propagated to srun commands\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\nsrun --hint=nomultithread --distribution=block:block pw.x < test_calc.in\n
"},{"location":"research-software/qe/#hints-and-tips","title":"Hints and tips","text":"

The QE module is configured to use the default QE-provided pseudopotentials. To use different pseudopotentials, set the ESPRESSO_PSEUDO environment variable to the directory that contains them by adding the following line after the module is loaded:

export ESPRESSO_PSEUDO=/path/to/pseudo_potentials\n
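A slightly more defensive variant checks that the directory exists before exporting it, so a mistyped path fails at the start of the job rather than inside pw.x. This is a minimal sketch: the function name use_pseudo_dir and the path are hypothetical placeholders and are not part of the QE module.

```shell
#!/bin/bash
# Hypothetical helper (not provided by the QE module): point Quantum Espresso
# at a custom pseudopotential directory, failing early if it does not exist.
use_pseudo_dir() {
  local dir=$1
  if [[ ! -d $dir ]]; then
    echo "error: pseudopotential directory not found: $dir" >&2
    return 1
  fi
  export ESPRESSO_PSEUDO=$dir
}

# Example use in a job script, after loading the quantum_espresso module:
# use_pseudo_dir /path/to/pseudo_potentials || exit 1
```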
"},{"location":"research-software/qe/#compiling-qe","title":"Compiling QE","text":"

The latest instructions for building QE on ARCHER2 can be found in the GitHub repository of build instructions:

"},{"location":"research-software/vasp/","title":"VASP","text":"

The Vienna Ab initio Simulation Package (VASP) is a computer program for atomic scale materials modelling, e.g. electronic structure calculations and quantum-mechanical molecular dynamics, from first principles.

VASP computes an approximate solution to the many-body Schr\u00f6dinger equation, either within density functional theory (DFT), solving the Kohn-Sham equations, or within the Hartree-Fock (HF) approximation, solving the Roothaan equations. Hybrid functionals that mix the Hartree-Fock approach with density functional theory are implemented as well. Furthermore, Green's functions methods (GW quasiparticles, and ACFDT-RPA) and many-body perturbation theory (2nd-order M\u00f8ller-Plesset) are available in VASP.

In VASP, central quantities, like the one-electron orbitals, the electronic charge density, and the local potential are expressed in plane wave basis sets. The interactions between the electrons and ions are described using norm-conserving or ultrasoft pseudopotentials, or the projector-augmented-wave method.

To determine the electronic ground state, VASP makes use of efficient iterative matrix diagonalisation techniques, like the residual minimisation method with direct inversion of the iterative subspace (RMM-DIIS) or blocked Davidson algorithms. These are coupled to highly efficient Broyden and Pulay density mixing schemes to speed up the self-consistency cycle.

"},{"location":"research-software/vasp/#useful-links","title":"Useful Links","text":""},{"location":"research-software/vasp/#using-vasp-on-archer2","title":"Using VASP on ARCHER2","text":"

VASP is only available to users who have a valid VASP licence.

If you have a VASP 5 or 6 licence and wish to have access to VASP on ARCHER2, please make a request via the SAFE, see:

Please have your licence details to hand.

Note

Both VASP 5 and VASP 6 are available on ARCHER2. You generally need a different licence for each of these versions.

"},{"location":"research-software/vasp/#running-parallel-vasp-jobs","title":"Running parallel VASP jobs","text":"

To access VASP you should load the appropriate vasp module in your job submission scripts.

To load the default version of VASP, you would use:

module load vasp\n

Tip

VASP 6.4.3 and above have all been compiled to include Wannier90 functionality. Older versions of VASP on ARCHER2 do not include Wannier90.
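If you are choosing between installed modules, a version comparison shows whether a given module should include Wannier90, based on the 6.4.3 threshold stated above. This is a minimal sketch that assumes module names of the form vasp/6/<version>-<suffix> (as in vasp/6/6.4.1-vtst). The function name has_wannier90 is hypothetical, and the sketch relies on GNU sort -V for version ordering.

```shell
#!/bin/bash
# Hypothetical helper: succeed if a VASP module version is at least 6.4.3,
# the first version built with Wannier90 on ARCHER2 (per the tip above).
has_wannier90() {
  local name=$1 min=6.4.3 ver
  ver=${name##*/}   # strip the path:   vasp/6/6.4.1-vtst -> 6.4.1-vtst
  ver=${ver%%-*}    # strip the suffix: 6.4.1-vtst -> 6.4.1
  # GNU sort -V orders version strings; if min sorts first, ver >= min
  [[ $(printf '%s\n' "$min" "$ver" | sort -V | head -n 1) == "$min" ]]
}
```

For example, has_wannier90 vasp/6/6.4.1-vtst returns a non-zero status, so it can be used directly in an if statement.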

Once loaded, the executables are called:

Once the module has been loaded, you can access the LDA and PBE pseudopotentials for VASP on ARCHER2 at:

$VASP_PSPOT_DIR\n
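VASP reads a single POTCAR file containing the pseudopotentials for all species, concatenated in the same order as the species appear in POSCAR. The sketch below assumes that the chosen pseudopotential set holds one subdirectory per element, each containing a POTCAR file; that layout is an assumption, so check it with ls $VASP_PSPOT_DIR first. The function name make_potcar is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: build a POTCAR by concatenating per-element POTCAR files.
# Assumes a layout of <pspot_dir>/<element>/POTCAR; check your directory first.
# Usage: make_potcar <pspot_dir> <element>... > POTCAR
make_potcar() {
  local dir=$1 el
  shift
  # Check every element first so no partial POTCAR is written on error
  for el in "$@"; do
    if [[ ! -f $dir/$el/POTCAR ]]; then
      echo "error: no POTCAR for $el in $dir" >&2
      return 1
    fi
  done
  # Species order must match the order used in POSCAR
  for el in "$@"; do
    cat "$dir/$el/POTCAR"
  done
}

# Example (the PBE subdirectory name is an assumption):
# make_potcar "$VASP_PSPOT_DIR/PBE" Cd Te > POTCAR
```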

Tip

VASP 6 can make use of OpenMP threads in addition to running with pure MPI. We will add notes on performance and use of threading in VASP as information becomes available.

Example VASP submission script

#!/bin/bash\n\n# Request 16 nodes (2048 MPI tasks at 128 tasks per node) for 20 minutes.   \n\n#SBATCH --job-name=VASP_test\n#SBATCH --nodes=16\n#SBATCH --ntasks-per-node=128\n#SBATCH --cpus-per-task=1\n#SBATCH --time=00:20:00\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code] \n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Load the VASP module\nmodule load vasp/6\n\n# Avoid any unintentional OpenMP threading by setting OMP_NUM_THREADS\nexport OMP_NUM_THREADS=1\n\n# Ensure the cpus-per-task option is propagated to srun commands\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\n# Launch the code - the distribution and hint options are important for performance\nsrun --distribution=block:block --hint=nomultithread vasp_std\n
"},{"location":"research-software/vasp/#vasp-transition-state-tools-vtst","title":"VASP Transition State Tools (VTST)","text":"

As well as the standard VASP modules, we provide versions of VASP with the VASP Transition State Tools (VTST) from the University of Texas added. The VTST versions add extra functionality to VASP and provide additional scripts to use with VASP. Additional functionality includes:

Full details of these methods and the provided scripts can be found on the VTST website.

On ARCHER2, the VTST versions of VASP can be accessed by loading the modules with vtst in the module name, for example:

module load vasp/6/6.4.1-vtst\n
"},{"location":"research-software/vasp/#compiling-vasp-on-archer2","title":"Compiling VASP on ARCHER2","text":"

If you wish to compile your own version of VASP on ARCHER2 (either VASP 5 or VASP 6) you can find information on how we compiled the central versions in the build instructions GitHub repository. See:

"},{"location":"research-software/vasp/#tips-for-using-vasp-on-archer2","title":"Tips for using VASP on ARCHER2","text":""},{"location":"research-software/vasp/#switching-mpi-transport-protocol-from-openfabrics-to-ucx","title":"Switching MPI transport protocol from OpenFabrics to UCX","text":"

The VASP modules are set up to use the OpenFabrics MPI transport protocol, as testing has shown that this passes all the regression tests and gives the most reliable operation on ARCHER2. However, there may be cases where UCX gives better performance than OpenFabrics.

If you want to try the UCX transport protocol, you can do so by loading additional modules after you have loaded the VASP modules. For example, for VASP 6, you would use:

module load vasp/6\nmodule load craype-network-ucx\nmodule load cray-mpich-ucx\n
"},{"location":"research-software/vasp/#increasing-the-cpu-frequency-and-enabling-turbo-boost","title":"Increasing the CPU frequency and enabling turbo-boost","text":"

The default CPU frequency is currently set to 2 GHz on ARCHER2. While many VASP calculations are memory or MPI bound, some calculations can be CPU bound. For those cases, you may see a significant difference in performance by increasing the CPU frequency and enabling turbo-boost (though the calculation will almost certainly also be less energy efficient).

You can do this by adding the line:

export SLURM_CPU_FREQ_REQ=2250000\n

in your job submission script before the srun command.
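Slurm expresses CPU frequencies in kHz, so 2250000 corresponds to 2.25 GHz. If you prefer to think in GHz, a small conversion avoids unit slips. This is a minimal sketch: the function name ghz_to_khz is hypothetical, and awk is used only for the floating-point arithmetic.

```shell
#!/bin/bash
# Hypothetical helper: convert a frequency in GHz to the kHz value that Slurm
# expects in SLURM_CPU_FREQ_REQ (1 GHz = 1000000 kHz), rounded to an integer.
ghz_to_khz() {
  awk -v ghz="$1" 'BEGIN { printf "%d\n", ghz * 1000000 + 0.5 }'
}

# Place before the srun command in the job script
SLURM_CPU_FREQ_REQ=$(ghz_to_khz 2.25)   # 2250000
export SLURM_CPU_FREQ_REQ
```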

"},{"location":"research-software/vasp/#performance-tips","title":"Performance tips","text":"

The performance of VASP depends on the version of VASP used, the performance of MPI collective operations, the choice of VASP parallelisation parameters (NCORE/NPAR and KPAR) and how many MPI processes per node are used.

KPAR: You should always use the largest value of KPAR that your calculation allows within the available memory.

NCORE/NPAR: We have found that the optimal values of NCORE (and hence NPAR) depend on both the type of calculation you are performing (e.g. pure DFT, hybrid functional, \u0393-point, non-collinear) and the number of nodes/cores you are using for your calculation. In practice, this means that you should experiment with different values to find the best choice for your calculation. The information below on the best choices for the benchmarks we have run on ARCHER2 may serve as a useful starting point. Choosing different values can change performance by up to 100%, so it is worth spending time investigating this.
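When experimenting, it can help to enumerate the candidate values for your layout first. The sketch below assumes, as is usual, that KPAR divides the total number of MPI processes and that NCORE divides the number of processes in each k-point group (NPAR is then that number divided by NCORE). The function name ncore_candidates is hypothetical. It only narrows the search and does not predict which value will be fastest.

```shell
#!/bin/bash
# Hypothetical helper: list the NCORE values that evenly divide the number of
# MPI processes in each k-point group (total MPI processes / KPAR).
# Usage: ncore_candidates <total_mpi_processes> <kpar>
ncore_candidates() {
  local ranks=$1 kpar=$2 group i
  local -a out=()
  if (( kpar < 1 )) || (( ranks % kpar != 0 )); then
    echo 'error: KPAR must be >= 1 and divide the number of MPI processes' >&2
    return 1
  fi
  group=$(( ranks / kpar ))
  for (( i = 1; i <= group; i++ )); do
    if (( group % i == 0 )); then
      out+=("$i")
    fi
  done
  echo "${out[*]}"
}

# 4 nodes x 32 MPI processes = 128 processes; with KPAR=2 each group has 64
ncore_candidates 128 2   # prints: 1 2 4 8 16 32 64
```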

MPI processes per node: For some of the benchmarks we used, running fewer MPI processes per node than the number of cores per node improved performance.

OpenMP threads: Using multiple OpenMP threads per MPI process can improve performance. In the tests we have performed, 4 OpenMP threads per MPI process typically gave the best performance.
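ARCHER2 standard compute nodes have 128 cores, so when every core is used the number of MPI processes per node multiplied by the OpenMP threads per process should equal 128 (as in the 32 x 4 runs in the benchmark table below). The hypothetical helper sketched here derives the matching Slurm options for a chosen thread count.

```shell
#!/bin/bash
# Hypothetical helper: print the Slurm options for a fully populated hybrid
# MPI/OpenMP layout on a node with a given core count (default 128).
# Usage: hybrid_layout <openmp_threads_per_process> [cores_per_node]
hybrid_layout() {
  local threads=$1 cores=${2:-128}
  if (( threads < 1 )) || (( cores % threads != 0 )); then
    echo 'error: threads per process must divide the cores per node' >&2
    return 1
  fi
  echo "--ntasks-per-node=$(( cores / threads )) --cpus-per-task=${threads}"
}

hybrid_layout 4   # --ntasks-per-node=32 --cpus-per-task=4
```

In the job script these values go in the #SBATCH --ntasks-per-node and #SBATCH --cpus-per-task lines, with OMP_NUM_THREADS and SRUN_CPUS_PER_TASK both set from $SLURM_CPUS_PER_TASK, as in the QChem example above.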

"},{"location":"research-software/vasp/#vasp-performance-data-on-archer2","title":"VASP performance data on ARCHER2","text":"

VASP performance data on ARCHER2 is currently available for two different benchmark systems:

"},{"location":"research-software/vasp/#cdte-supercell-hybrid-dft-functional-8-k-points-65-atoms","title":"CdTe Supercell, hybrid DFT functional. 8 k-points, 65 atoms","text":"

Basic information:

Performance summary:

Setup details:
- vasp/6/6.4.2-mkl19 module
- GCC 11.2.0
- MKL 19.5 for BLAS/LAPACK/ScaLAPACK and FFTW
- OFI for MPI transport layer

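| Nodes | MPI processes per node | OpenMP threads per MPI process | Total cores | NCORE | KPAR | LOOP+ time |
|---|---|---|---|---|---|---|
| 1 | 32 | 4 | 128 | 1 | 2 | 5838 |
| 2 | 32 | 4 | 256 | 1 | 2 | 3115 |
| 4 | 32 | 4 | 512 | 1 | 2 | 1682 |
| 8 | 32 | 4 | 1024 | 1 | 2 | 928 |
| 16 | 128 | 1 | 2048 | 16 | 2 | 612 |
| 32 | 128 | 1 | 4096 | 16 | 2 | 459 |
| 64 | 128 | 1 | 8192 | 16 | 2 | 629 |
"},{"location":"research-software/castep/castep/","title":"Castep","text":"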

This page has moved

"},{"location":"research-software/chemshell/chemshell/","title":"Chemshell","text":"

This page has moved

"},{"location":"research-software/code-saturne/code-saturne/","title":"Code saturne","text":"

This page has moved

"},{"location":"research-software/cp2k/cp2k/","title":"Cp2k","text":"

This page has moved

"},{"location":"research-software/fhi-aims/fhi-aims/","title":"Fhi aims","text":"

This page has moved

"},{"location":"research-software/gromacs/gromacs/","title":"Gromacs","text":"

This page has moved

"},{"location":"research-software/lammps/lammps/","title":"Lammps","text":"

This page has moved

"},{"location":"research-software/mitgcm/mitgcm/","title":"Mitgcm","text":"

This page has moved

"},{"location":"research-software/mo-unified-model/mo-unified-model/","title":"Mo unified model","text":"

This page has moved

"},{"location":"research-software/namd/namd/","title":"Namd","text":"

This page has moved

"},{"location":"research-software/nektarplusplus/nektarplusplus/","title":"Nektarplusplus","text":"

This page has moved

"},{"location":"research-software/nemo/nemo/","title":"Nemo","text":"

This page has moved

"},{"location":"research-software/nwchem/nwchem/","title":"Nwchem","text":"

This page has moved

"},{"location":"research-software/onetep/onetep/","title":"Onetep","text":"

This page has moved

"},{"location":"research-software/openfoam/openfoam/","title":"Openfoam","text":"

This page has moved

"},{"location":"research-software/qe/qe/","title":"Qe","text":"

This page has moved

"},{"location":"research-software/vasp/vasp/","title":"Vasp","text":"

This page has moved

"},{"location":"software-libraries/","title":"Software Libraries","text":"

This section provides information on centrally-installed software libraries and library-based packages. These provide significant functionality that is of interest to both users and developers of applications.

Libraries are made available via the module system, and fall into a number of distinct groups.

"},{"location":"software-libraries/#libraries-via-modules-cray-","title":"Libraries via modules cray-*","text":"

The following libraries are available as modules prefixed by cray- and may be of direct interest to developers and users. The modules are provided by HPE Cray, are optimised for performance on the ARCHER2 hardware, and should be used where possible. The relevant modules are:

"},{"location":"software-libraries/#integration-with-compiler-environment","title":"Integration with compiler environment","text":"

All libraries provided by modules prefixed cray- integrate with the compiler environment, and so appropriate compiler and link stage options are injected when using the standard compiler wrappers cc, CC and ftn.

"},{"location":"software-libraries/#libraries-supported-by-archer2-cse-team","title":"Libraries supported by ARCHER2 CSE team","text":"

The following libraries are also made available by the ARCHER2 CSE team:

"},{"location":"software-libraries/#integration-with-compiler-environment_1","title":"Integration with compiler environment","text":"

Again, all the libraries listed above are supported by all programming environments via the module system. Additional compile and link time flags should not be required.

"},{"location":"software-libraries/#building-your-own-library-versions","title":"Building your own library versions","text":"

For the libraries listed in this section, a set of build and installation scripts are available at the ARCHER2 Github repository.

Follow the instructions to build the relevant package (note this is the cse-develop branch of the repository). See also individual libraries pages in the list above for further details.

The scripts available from this repository should work in all three programming environments.

"},{"location":"software-libraries/adios/","title":"ADIOS","text":"

The Adaptable I/O System (ADIOS) is developed at Oak Ridge National Laboratory and is freely available under a BSD license.

"},{"location":"software-libraries/adios/#version-history","title":"Version history","text":"Upgrade 2023

The central installation of ADIOS (version 1) has been removed as it is no longer actively developed. A central installation of ADIOS (version 2) will be considered as a replacement.

Full system4-cabinet system "},{"location":"software-libraries/adios/#compile-your-own-version","title":"Compile your own version","text":"

The ARCHER2 GitHub repository provides a script that can be used to build ADIOS (version 1):

$ git clone https://github.com/ARCHER2-HPC/pe-scripts.git\n$ cd pe-scripts\n$ git checkout modules-2022-12\n$ module load cray-hdf5-parallel\n$ ./sh/adios.sh --prefix=/path/to/install/location\n
where the --prefix option determines the install location. See the ARCHER2 GitHub repository for further details and options.

"},{"location":"software-libraries/adios/#using-adios","title":"Using ADIOS","text":"

Configuration details for ADIOS are obtained via the utility adios_config which should be available in the PATH once ADIOS is installed. For example, to recover the compiler options required to provide serial C include files, issue:

$ adios_config -s -c\n
Use adios_config --help for a summary of options.

To compile and link an application, these commands can be embedded in a Makefile, e.g.,

ADIOS_INC := $(shell adios_config -s -c)\nADIOS_CLIB := $(shell adios_config -s -l)\n
See the ADIOS user manual for further details and examples.

"},{"location":"software-libraries/adios/#resources","title":"Resources","text":"

The ADIOS home page

ADIOS user manual (v1.10 pdf version)

ADIOS 1.x github repository

"},{"location":"software-libraries/aocl/","title":"AMD Optimizing CPU Libraries (AOCL)","text":"

AMD Optimizing CPU Libraries (AOCL) are a set of numerical libraries optimized for AMD \u201cZen\u201d-based processors, including EPYC, Ryzen Threadripper PRO, and Ryzen.

AOCL comprises eight libraries:
- BLIS (BLAS Library)
- libFLAME (LAPACK)
- AMD-FFTW
- LibM (AMD Core Math Library)
- ScaLAPACK
- AMD Random Number Generator (RNG)
- AMD Secure RNG
- AOCL-Sparse

Tip

AOCL 3.1 and 4.0 are available; 3.1 is the default.

"},{"location":"software-libraries/aocl/#compiling-with-aocl","title":"Compiling with AOCL","text":"

Important

AOCL does not currently support the Cray programming environment and is unavailable when PrgEnv-cray is loaded.

Important

The cray-libsci module is loaded by default for all users and this module also contains definitions of BLAS, LAPACK and ScaLAPACK routines that conflict with those in AOCL. The aocl module automatically unloads cray-libsci.

"},{"location":"software-libraries/aocl/#gnu-programming-environment","title":"GNU Programming Environment","text":"

AOCL 3.1 and 4.0 are available for all versions of the GCC compilers: gcc/11.2.0 and gcc/10.3.0.

module load PrgEnv-gnu\nmodule load aocl\n
"},{"location":"software-libraries/aocl/#aocc-programming-environment","title":"AOCC Programming Environment","text":"

AOCL 3.1 and 4.0 are available for all versions of the AOCC compilers: aocc/3.2.0.

module load PrgEnv-aocc\nmodule load aocl\n
"},{"location":"software-libraries/aocl/#resources","title":"Resources","text":"

For more information on AOCL, please see: https://developer.amd.com/amd-aocl/#documentation

"},{"location":"software-libraries/aocl/#version-history","title":"Version history","text":"

Current modules:

"},{"location":"software-libraries/arpack/","title":"ARPACK-NG","text":"

The Arnoldi Package (ARPACK) was designed to compute eigenvalues and eigenvectors of large sparse matrices. Originally from Rice University, an open source version (ARPACK-NG) is available under a BSD license and is made available here.

"},{"location":"software-libraries/arpack/#compiling-and-linking-with-arpack","title":"Compiling and linking with ARPACK","text":"

To compile an application against the ARPACK-NG libraries, load the arpack-ng module and use the compiler wrappers cc, CC, and ftn in the usual way.

The arpack-ng module defines ARPACK_NG_DIR which locates the root of the installation for the current programming environment.

"},{"location":"software-libraries/arpack/#version-history","title":"Version history","text":"Upgrade 2023Full system4-cabinet system "},{"location":"software-libraries/arpack/#compiling-your-own-version","title":"Compiling your own version","text":"

The currently supported version of ARPACK-NG on ARCHER2 can be compiled using a script available from the ARCHER2 GitHub repository:

$ git clone https://github.com/ARCHER2-HPC/pe-scripts.git\n$ cd pe-scripts\n$ git checkout modules-2022-12\n$ ./sh/arpack-ng.sh --prefix=/path/to/install/location\n
where --prefix specifies a suitable install location. See the ARCHER2 GitHub repository for further options and details. Note that the build process runs the tests, which require an salloc allocation so that the parallel tests run correctly.

"},{"location":"software-libraries/arpack/#resources","title":"Resources","text":"

ARPACK-NG github site

"},{"location":"software-libraries/boost/","title":"Boost","text":"

Boost provides portable C++ libraries useful in a broad range of contexts. The libraries are freely available under the terms of the Boost Software license.

"},{"location":"software-libraries/boost/#compiling-and-linking","title":"Compiling and linking","text":"

The C++ compiler wrapper CC will introduce the appropriate options to compile an application against the Boost libraries. The other compiler wrappers (cc and ftn) do not introduce these options.

To check exactly which options are introduced, type, e.g.,

$ CC --cray-print-opts\n

The boost module also defines the environment variable BOOST_DIR as the root of the installation for the current programming environment if this information is needed.

"},{"location":"software-libraries/boost/#version-history","title":"Version history","text":"Upgrade 2023Full system4-cabinet system

The following libraries are installed: atomic chrono container context contract coroutine date_time exception fiber filesystem graph_parallel graph iostreams locale log math mpi program_options random regex serialization stacktrace system test thread timer type_erasure wave

"},{"location":"software-libraries/boost/#compiling-boost","title":"Compiling Boost","text":"

The ARCHER2 Github repository contains a recipe for compiling Boost for the different programming environments.

$ git clone https://github.com/ARCHER2-HPC/pe-scripts.git\n$ cd pe-scripts\n$ git checkout cse-develop\n$ ./sh/boost.sh --prefix=/path/to/install/location\n
where the --prefix determines the install location. The list of libraries compiled is specified in the boost.sh script. See the ARCHER2 Github repository for further information.

"},{"location":"software-libraries/boost/#resources","title":"Resources","text":""},{"location":"software-libraries/eigen/","title":"Eigen","text":"

Eigen is a C++ template library for linear algebra: matrices, vectors, numerical solvers, and related algorithms.

"},{"location":"software-libraries/eigen/#compiling-with-eigen","title":"Compiling with Eigen","text":"

To compile an application with the Eigen header files, load the eigen module and use the compiler wrappers cc, CC, or ftn in the usual way. The relevant header files will be introduced automatically.

The header files are located in /work/y07/shared/libs/core/eigen/3.4.0/, and can be included manually at compilation without loading the module if required.

"},{"location":"software-libraries/eigen/#version-history","title":"Version history","text":""},{"location":"software-libraries/eigen/#compiling-your-own-version","title":"Compiling your own version","text":"

The currently supported version on ARCHER2 can be built using the following commands:

$ wget https://gitlab.com/libeigen/eigen/-/archive/3.4.0/eigen-3.4.0.tar.gz\n$ tar xvf eigen-3.4.0.tar.gz\n$ cmake eigen-3.4.0/ -DCMAKE_INSTALL_PREFIX=/path/to/install/location\n$ make install\n
where the -DCMAKE_INSTALL_PREFIX option determines the install directory. Installing in this way will also build the Eigen documentation and unit-tests.

"},{"location":"software-libraries/eigen/#resources","title":"Resources","text":"

Eigen home page

Getting Started guide

"},{"location":"software-libraries/fftw/","title":"FFTW","text":"

FFTW is a C subroutine library (which includes a Fortran interface) for computing the discrete Fourier transform (DFT) in one or more dimensions, of arbitrary input size, and of both real and complex data (as well as of even/odd data, i.e. the discrete cosine/sine transforms or DCT/DST).

Only the version 3 interface is available on ARCHER2.

"},{"location":"software-libraries/glm/","title":"GLM","text":"

OpenGL Mathematics (GLM) is a header-only C++ library which performs operations typically encountered in graphics applications, but can also be relevant to scientific applications. GLM is freely available under an MIT license.

"},{"location":"software-libraries/glm/#compiling-with-glm","title":"Compiling with GLM","text":"

The compiler wrapper CC will automatically locate the required include directory when the module is loaded.

The glm module also defines the environment variable GLM_DIR which carries the root of the installation, if needed.

"},{"location":"software-libraries/glm/#version-history","title":"Version history","text":"Full system4-cabinet system "},{"location":"software-libraries/glm/#install-your-own-version","title":"Install your own version","text":"

One can follow the instructions used to install the current version on ARCHER2 via the ARCHER2 Github repository:

$ git clone https://github.com/ARCHER2-HPC/pe-scripts.git\n$ cd pe-scripts\n$ git checkout modules-2021-10\n$ ./sh/glm.sh --prefix=/path/to/install/location\n
where the --prefix option sets the install location. See the ARCHER2 Github repository for further details.

"},{"location":"software-libraries/glm/#resources","title":"Resources","text":"

The GLM Github repository.

"},{"location":"software-libraries/hdf5/","title":"HDF5","text":"

The Hierarchical Data Format HDF5 (and its parallel manifestation HDF5 parallel) is a standard library and data format developed and supported by The HDF Group, and is released under a BSD-like license.

Both serial and parallel versions are available on ARCHER2 as standard modules:

Use module help to locate cray-specific release notes on a particular version.

Known issues:

Upgrade 2023Full system4-cabinet system

Some general comments and information on serial and parallel I/O to ARCHER2 are given in the section on I/O and file systems.

"},{"location":"software-libraries/hdf5/#resources","title":"Resources","text":"

Tutorials and introduction to HDF5 at the HDF5 Group pages.

General information for developers of HDF5.

"},{"location":"software-libraries/hypre/","title":"HYPRE","text":"

HYPRE is a library of linear solvers for structured and unstructured problems with a particular emphasis on multigrid. It is a product of the Lawrence Livermore National Laboratory and is distributed under either the MIT license or the Apache license.

"},{"location":"software-libraries/hypre/#compiling-and-linking-with-hypre","title":"Compiling and linking with HYPRE","text":"

To compile and link an application with the HYPRE libraries, load the hypre module and use the compiler wrappers cc, CC, or ftn in the usual way. The relevant include files and libraries will be introduced automatically.

Two versions of HYPRE are included: one with, and one without, OpenMP. The relevant version is selected automatically if, e.g., -fopenmp is included at the compile or link stage.

The hypre module defines the environment variable HYPRE_DIR which will show the root of the installation for the current programming environment if required.

"},{"location":"software-libraries/hypre/#version-history","title":"Version history","text":"Upgrade 2023Full system4-cabinet system "},{"location":"software-libraries/hypre/#compiling-your-own-version","title":"Compiling your own version","text":"

The currently supported version on ARCHER2 can be built using the script from the ARCHER2 GitHub repository:

$ git clone https://github.com/ARCHER2-HPC/pe-scripts.git\n$ cd pe-scripts\n$ git checkout modules-2022-12\n$ ./sh/tpsl/hypre.sh --prefix=/path/to/install/location\n
where the --prefix option determines the install directory. See the ARCHER2 GitHub repository for more information.

"},{"location":"software-libraries/hypre/#resources","title":"Resources","text":"

HYPRE home page

The latest HYPRE user manual (HTML)

An older pdf version

HYPRE github repository

"},{"location":"software-libraries/libsci/","title":"HPE Cray LibSci","text":"

The Cray scientific libraries (LibSci), available for all compiler choices, provide access to the Fortran BLAS and LAPACK interfaces for basic linear algebra, the corresponding C interfaces CBLAS and LAPACKE, and BLACS and ScaLAPACK for parallel linear algebra. Type man intro_libsci for further details.

Additionally there is GPU support available via the cray-libsci_acc module. More information can be found here.

"},{"location":"software-libraries/matio/","title":"Matio","text":"

Matio is a library which allows reading and writing matrices in MATLAB MAT format. It is an open source development released under a BSD license.

"},{"location":"software-libraries/matio/#compiling-and-linking-against-matio","title":"Compiling and linking against Matio","text":"

Load the matio module and use the standard compiler wrappers cc, CC, or ftn in the usual way. The appropriate header files and libraries will be included automatically via the compiler wrappers.

The matio module sets the PATH variable so that the stand-alone utility matdump can be used. The module also defines MATIO_PATH, which gives the root of the installation if this is needed.

"},{"location":"software-libraries/matio/#version-history","title":"Version history","text":"Upgrade 2023Full system4-cabinet system "},{"location":"software-libraries/matio/#compiling-your-own-version","title":"Compiling your own version","text":"

A version of Matio as currently installed on ARCHER2 can be compiled using the script available from the ARCHER2 GitHub repository:

$ git clone https://github.com/ARCHER2-HPC/pe-scripts.git\n$ cd pe-scripts\n$ git checkout modules-2022-12\n$ ./sh/tpsl/matio.sh --prefix=/path/to/install/location\n
where --prefix defines the location of the installation.

"},{"location":"software-libraries/matio/#resources","title":"Resources","text":"

Matio github repository

"},{"location":"software-libraries/mesa/","title":"Mesa","text":"

Mesa is an open-source implementation of OpenGL, Vulkan, and other graphics APIs, which translates calls to these APIs into calls to vendor-specific hardware drivers.

"},{"location":"software-libraries/mesa/#compiling-with-mesa","title":"Compiling with Mesa","text":"

To compile an application with the mesa header files, load the mesa module and use the compiler wrappers in the usual way. The relevant header files will be introduced automatically.

The header files are located in /work/y07/shared/libs/core/mesa/21.0.1/, and can be included manually at compilation without loading the module if required.

"},{"location":"software-libraries/mesa/#version-history","title":"Version history","text":""},{"location":"software-libraries/mesa/#compiling-your-own-version","title":"Compiling your own version","text":"

The build recipe for this module can be found in the HPC-UK GitHub repository.

"},{"location":"software-libraries/mesa/#resources","title":"Resources","text":"

Mesa home page

"},{"location":"software-libraries/metis/","title":"Metis and Parmetis","text":"

The University of Minnesota provides a family of libraries for partitioning graphs and meshes, and for computing fill-reducing orderings of sparse matrices. These libraries come broadly under the label of \"Metis\". They are free to use for educational and research purposes.

"},{"location":"software-libraries/metis/#metis","title":"Metis","text":"

Metis is the sequential library for partitioning problems; it also supplies a number of simple stand-alone utility programs to access the Metis API for graph and mesh partitioning, and graph and mesh manipulation. The stand-alone programs typically read a graph or mesh from a file, which must be in \"metis\" format.

"},{"location":"software-libraries/metis/#compiling-and-linking-with-metis","title":"Compiling and linking with Metis","text":"

The Metis library available via module load metis comes both with and without support for OpenMP. When using the compiler wrappers cc, CC, and ftn, the appropriate version will be selected based on the presence or absence of, e.g., -fopenmp in the compile or link invocation.

Use, e.g.,

$ cc --cray-print-opts\n
or
$ cc -fopenmp --cray-print-opts\n
to see exactly what options are being issued by the compiler wrapper when the metis module is loaded.

Metis is currently provided as static libraries, so it should not be necessary to re-load the metis module at run time.

The serial utilities (e.g. gpmetis for graph partitioning) are supplied without OpenMP. These may then be run on the front end for small problems if the metis module is loaded.
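As an illustration of the metis graph format, the sketch below writes a four-vertex ring and, if gpmetis is on the PATH (i.e. the metis module is loaded), partitions it into two parts. In this format the header line gives the number of vertices and the number of undirected edges, and each following line lists the 1-based neighbours of one vertex. The file name ring.graph is an arbitrary choice.

```shell
#!/bin/bash
# Write a 4-vertex ring (1-2-3-4-1) in METIS graph format:
#   header: <number of vertices> <number of edges>
#   then one line per vertex listing its neighbours (1-based)
cat > ring.graph <<'EOF'
4 4
2 4
1 3
2 4
1 3
EOF

# Partition into 2 parts if gpmetis is available (module load metis);
# gpmetis writes the part number of each vertex to ring.graph.part.2
if command -v gpmetis >/dev/null 2>&1; then
  gpmetis ring.graph 2
fi
```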

The metis module defines the environment variable METIS_DIR which indicates the current location of the Metis installation.

Note that the Metis and Parmetis libraries (and dependent modules) have been compiled with the default 32-bit integer indexing and 4-byte floating-point options.
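One way to confirm these widths for a given Metis build is to inspect the IDXTYPEWIDTH and REALTYPEWIDTH settings in its metis.h header. The sketch below assumes the header lives at ${METIS_DIR}/include/metis.h; that location and the helper name metis_widths are assumptions for illustration, not documented features of the module.

```shell
# Hypothetical helper: report the integer (IDXTYPEWIDTH) and real
# (REALTYPEWIDTH) widths defined in a Metis header.
# Assumes the header is at ${METIS_DIR}/include/metis.h unless a path is given.
metis_widths() {
  local header=${1:-${METIS_DIR}/include/metis.h}
  # Match only the '#define NAME VALUE' lines, not '#if NAME == ...' tests.
  awk '$1 == "#define" && ($2 == "IDXTYPEWIDTH" || $2 == "REALTYPEWIDTH") { print $2 "=" $3 }' "$header"
}

# Example (with the metis module loaded):
#   metis_widths
```

For builds matching the options described above, both values would be expected to be 32 (32-bit indices and 4-byte reals).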

"},{"location":"software-libraries/metis/#parmetis","title":"Parmetis","text":"

Parmetis is the distributed memory incarnation of the Metis functionality. As for the metis module, Parmetis is integrated with use of the compiler wrappers cc, CC, and ftn.

Parmetis depends on the metis module, which is loaded automatically by the parmetis module.

The parmetis module defines the environment variable PARMETIS_DIR which holds the current location of the Parmetis installation. This variable may not respond to a change of compiler version within a given programming environment. If you wish to use PARMETIS_DIR in such a context, you may need to (re-)load the parmetis module after the change of compiler version.

"},{"location":"software-libraries/metis/#module-version-history","title":"Module version history","text":"Upgrade 2023Full system4-cabinet system "},{"location":"software-libraries/metis/#compile-your-own-version","title":"Compile your own version","text":"

The build procedure used for the Metis and Parmetis libraries on ARCHER2 is available via GitHub.

"},{"location":"software-libraries/metis/#metis_1","title":"Metis","text":"

The latest ARCHER2 version of Metis can be installed with:

$ git clone https://github.com/ARCHER2-HPC/pe-scripts.git\n$ cd pe-scripts\n$ git checkout modules-2022-12\n$ ./sh/tpsl/metis.sh --prefix=/path/to/install/location\n

where --prefix determines the install location. This will download and install the default version for the current programming environment.

"},{"location":"software-libraries/metis/#parmetis_1","title":"Parmetis","text":"

Parmetis can be installed via the same mechanism as Metis:

$ ./sh/tpsl/parmetis.sh --prefix=/path/to/install/location\n
The Metis package should be installed first (as above) using the same location. See the ARCHER2 repository for further details and options.

"},{"location":"software-libraries/metis/#resources","title":"Resources","text":"

Metis and Parmetis at GitHub

"},{"location":"software-libraries/mkl/","title":"Intel Math Kernel Library (MKL)","text":"

The Intel Math Kernel Library (MKL) contains a variety of optimised numerical libraries including BLAS, LAPACK, ScaLAPACK and FFTW. In general, the exact commands required to build against MKL depend on the details of the compiler, environment, requirements for parallelism, and so on, so the Intel MKL link line advisor should be consulted.

Some examples are given below. Note that loading the mkl module will provide the environment variable MKLROOT which holds the location of the various MKL components.

Warning

The ARCHER2 CSE team have seen that using MKL on ARCHER2 for some software leads to failed regression tests due to numerical differences between reference results and those produced with software using MKL.

We strongly recommend that you use the HPE Cray LibSci and HPE Cray FFTW libraries for software if at all possible rather than MKL. If you do decide to use MKL on ARCHER2, then you should carefully validate results from your software to ensure that it is giving the expected results.

Important

The cray-libsci module is loaded by default for all users and this module also contains definitions of BLAS, LAPACK and ScaLAPACK routines that conflict with those in MKL. The mkl module automatically unloads cray-libsci.

Important

The mkl module needs to be loaded both at compile time and at runtime (usually in your job submission script).

Tip

MKL only supports the GCC programming environment (PrgEnv-gnu). Other programming environments may work but this is untested and unsupported on ARCHER2.
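The serial and threaded link lines in the tables below follow a simple pattern: Fortran uses the gf_lp64 interface layer and C/C++ the intel_lp64 layer, while serial builds use mkl_sequential and threaded builds use mkl_gnu_thread plus the GNU OpenMP runtime (libgomp). The hypothetical helper below (not something provided by the mkl module) just reproduces those options to make the pattern explicit; it is not a substitute for the Intel link line advisor.

```shell
# Hypothetical helper (not provided by the mkl module): print the MKL link
# options from the serial/threaded tables for a given language and mode.
# Usage: mkl_link_opts <fortran|c> <sequential|threaded>
mkl_link_opts() {
  local iface threadlib omplib=""
  case $1 in
    fortran) iface=-lmkl_gf_lp64 ;;     # GNU Fortran interface layer
    c)       iface=-lmkl_intel_lp64 ;;  # C/C++ interface layer
    *) echo "unknown language: $1" >&2; return 1 ;;
  esac
  case $2 in
    sequential) threadlib=-lmkl_sequential ;;
    threaded)   threadlib=-lmkl_gnu_thread; omplib="-lgomp " ;;  # GNU OpenMP runtime
    *) echo "unknown mode: $2" >&2; return 1 ;;
  esac
  printf '%s\n' "-L${MKLROOT}/lib/intel64 -Wl,--no-as-needed ${iface} ${threadlib} -lmkl_core ${omplib}-lpthread -lm -ldl"
}
```

For example, with PrgEnv-gnu and mkl loaded, a serial C program (my_prog.c is a placeholder) might be built with cc -m64 -I${MKLROOT}/include my_prog.c $(mkl_link_opts c sequential); the command substitution is deliberately left unquoted so that the options are split into separate arguments.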

"},{"location":"software-libraries/mkl/#serial-mkl-with-gcc","title":"Serial MKL with GCC","text":"

Swap modules:

module load PrgEnv-gnu\nmodule load mkl\n
Language | Compile options | Link options
Fortran | -m64 -I\"${MKLROOT}/include\" | -L${MKLROOT}/lib/intel64 -Wl,--no-as-needed -lmkl_gf_lp64 -lmkl_sequential -lmkl_core -lpthread -lm -ldl
C/C++ | -m64 -I\"${MKLROOT}/include\" | -L${MKLROOT}/lib/intel64 -Wl,--no-as-needed -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm -ldl"},{"location":"software-libraries/mkl/#threaded-mkl-with-gcc","title":"Threaded MKL with GCC","text":"

Swap modules:

module load PrgEnv-gnu\nmodule load mkl\n
Language | Compile options | Link options
Fortran | -m64 -I\"${MKLROOT}/include\" | -L${MKLROOT}/lib/intel64 -Wl,--no-as-needed -lmkl_gf_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -lpthread -lm -ldl
C/C++ | -m64 -I\"${MKLROOT}/include\" | -L${MKLROOT}/lib/intel64 -Wl,--no-as-needed -lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -lpthread -lm -ldl"},{"location":"software-libraries/mkl/#mkl-parallel-scalapack-with-gcc","title":"MKL parallel ScaLAPACK with GCC","text":"

Swap modules:

module load PrgEnv-gnu\nmodule load mkl\n
Language | Compile options | Link options
Fortran | -m64 -I\"${MKLROOT}/include\" | -L${MKLROOT}/lib/intel64 -lmkl_scalapack_lp64 -Wl,--no-as-needed -lmkl_gf_lp64 -lmkl_gnu_thread -lmkl_core -lmkl_blacs_intelmpi_lp64 -lgomp -lpthread -lm -ldl
C/C++ | -m64 -I\"${MKLROOT}/include\" | -L${MKLROOT}/lib/intel64 -lmkl_scalapack_lp64 -Wl,--no-as-needed -lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core -lmkl_blacs_intelmpi_lp64 -lgomp -lpthread -lm -ldl"},{"location":"software-libraries/mumps/","title":"MUMPS","text":"

MUMPS is a parallel solver for large sparse systems that uses a 'multifrontal' method. It is developed largely at CERFACS, ENS Lyon, IRIT Toulouse, INRIA, and the University of Bordeaux. It is provided free of charge and is largely under a CeCILL-C license.

"},{"location":"software-libraries/mumps/#compiling-and-linking-with-mumps","title":"Compiling and linking with MUMPS","text":"

To compile an application against the MUMPS libraries, load the mumps module and use the compiler wrappers cc, CC, and ftn in the usual way.

MUMPS is configured to allow Pord, Metis, Parmetis, and Scotch orderings.

Two versions of MUMPS are provided: one with, and one without, OpenMP. The compiler wrappers select the appropriate version according to whether the OpenMP option (e.g. -fopenmp) is included at the compile stage.

The mumps module defines MUMPS_DIR which locates the root of the installation for the current programming environment.

"},{"location":"software-libraries/mumps/#version-history","title":"Version history","text":"Upgrade 2023Full system4-cabinet system

Note: mumps/5.5.1 uses scotch/7.0.3 while mumps/5.3.5 uses scotch/6.1.0.

Known issues: The OpenMP version in PrgEnv-aocc is not available at the moment.

"},{"location":"software-libraries/mumps/#compiling-your-own-version","title":"Compiling your own version","text":"

The currently supported version of MUMPS on ARCHER2 can be compiled using a script available from the ARCHER2 GitHub repository.

$ git clone https://github.com/ARCHER2-HPC/pe-scripts.git\n$ cd pe-scripts\n$ git checkout modules-2022-12\n$ ./sh/tpsl/metis.sh --prefix=/path/to/install/location\n$ ./sh/tpsl/parmetis.sh --prefix=/path/to/install/location\n$ ./sh/tpsl/scotchv7.sh --prefix=/path/to/install/location\n$ ./sh/tpsl/mumps.sh --prefix=/path/to/install/location\n
where the --prefix option should be the same for MUMPS and the three dependencies (Metis, Parmetis, and Scotch version 7). See the ARCHER2 GitHub repository for further options and details.

"},{"location":"software-libraries/mumps/#resources","title":"Resources","text":"

The MUMPS home page

MUMPS user manual (Version 5.6, pdf)

"},{"location":"software-libraries/netcdf/","title":"NetCDF","text":"

The Network Common Data Form NetCDF (and its parallel counterpart NetCDF parallel) is a standard library and data format developed and supported by UCAR, and is released under a BSD-like license.

Both serial and parallel versions are available on ARCHER2 as standard modules:

Note that the relevant HDF5 module should be loaded first, e.g.,

$ module load cray-hdf5\n$ module load cray-netcdf\n
for the serial version.

Use module spider to locate available versions, and use module help to locate Cray-specific release notes on a particular version.

Known issues:

Upgrade 2023Full system4-cabinet system

Some general comments and information on serial and parallel I/O to ARCHER2 are given in the section on I/O and file systems.

"},{"location":"software-libraries/netcdf/#resources","title":"Resources","text":"

The NetCDF home page.

"},{"location":"software-libraries/petsc/","title":"PETSc","text":"

PETSc is a suite of parallel tools for the solution of partial differential equations. PETSc is developed at Argonne National Laboratory and is freely available under a BSD 2-clause license.

"},{"location":"software-libraries/petsc/#build","title":"Build","text":"

Applications may be linked against PETSc by loading the petsc module and using the compiler wrappers cc, CC, and ftn in the usual way. Details of options introduced by the compiler wrappers can be examined via, e.g.,

$ cc --cray-print-opts\n

PETSc is configured with Metis, Parmetis, and Scotch orderings, and to support HYPRE, MUMPS, SuperLU, and SuperLU-DIST. PETSc is compiled without OpenMP.

The petsc module defines the environment variable PETSC_DIR as the root of the installation if this is required.

"},{"location":"software-libraries/petsc/#version-history","title":"Version history","text":"Upgrade 2023Full system4-cabinet system

Note: PETSc has a number of dependencies; where applicable, the newer version of PETSc depends on the newer module version of each relevant dependency. Check module list to be sure.

Known issues: PETSc is not currently available for PrgEnv-aocc. There is no HYPRE support in this version.

"},{"location":"software-libraries/petsc/#compile-your-own-version","title":"Compile your own version","text":"

It is possible to follow the steps used to build the current version on ARCHER2. These steps are codified in the ARCHER2 GitHub repository and include a number of dependencies to be built in the correct order:

$ git clone https://github.com/ARCHER2-HPC/pe-scripts.git\n$ cd pe-scripts\n$ git checkout modules-2022-12\n$ ./sh/tpsl/metis.sh --prefix=/path/to/install/location\n$ ./sh/tpsl/parmetis.sh --prefix=/path/to/install/location\n$ ./sh/tpsl/hypre.sh --prefix=/path/to/install/location\n$ ./sh/tpsl/scotchv7.sh --prefix=/path/to/install/location\n$ ./sh/tpsl/mumps.sh --prefix=/path/to/install/location\n$ ./sh/tpsl/superlu.sh --prefix=/path/to/install/location\n$ ./sh/tpsl/superlu-dist.sh --prefix=/path/to/install/location\n\n$ module load cray-hdf5\n$ ./sh/petsc.sh --prefix=/path/to/install/location\n
The --prefix option indicating the install directory should be the same in all cases. See the ARCHER2 GitHub repository for further details (and options). This will compile version 3.18.5 against the latest module versions of each dependency.

"},{"location":"software-libraries/petsc/#resources","title":"Resources","text":"

PETSc home page

Current PETSc documentation (HTML)

"},{"location":"software-libraries/scotch/","title":"Scotch and PT-Scotch","text":"

Scotch and its parallel version PT-Scotch are provided by LaBRI at the University of Bordeaux and INRIA Bordeaux South-West. They are used for graph partitioning and ordering problems. The libraries are freely available for scientific use under a license similar to the LGPL license.

"},{"location":"software-libraries/scotch/#scotch-and-pt-scotch_1","title":"Scotch and PT-Scotch","text":"

The scotch module provides access to both the Scotch and PT-Scotch libraries via the compiler wrappers. A number of stand-alone utilities are also provided as part of the package.

"},{"location":"software-libraries/scotch/#compiling-and-linking","title":"Compiling and linking","text":"

If the scotch module is loaded, then applications may be automatically compiled and linked against the libraries for the current programming environment. Check, e.g.,

$ cc --cray-print-opts\n
if you wish to see exactly what options are generated by the compiler wrappers.

Scotch and PT-Scotch libraries are provided as static archives only. The compiler wrappers do not give access to the libraries libscotcherrexit.a or libptscotcherrexit.a; if you wish to perform your own error handling, these libraries must be linked manually.

The scotch module defines the environment variable SCOTCH_DIR, which holds the root of the installation for a given programming environment. Libraries are present in ${SCOTCH_DIR}/lib.

Stand-alone applications are also available. See the Scotch and PT-Scotch user manuals for further details.

"},{"location":"software-libraries/scotch/#module-version-history","title":"Module version history","text":"Upgrade 2023Full system4-cabinet system

Note: scotch/7.0.3 has disabled a number of features including the Metis compatibility layer, and threads, to allow all tests to pass.

"},{"location":"software-libraries/scotch/#compiling-your-own-version","title":"Compiling your own version","text":"

The build procedure for the Scotch package on ARCHER2 is available via GitHub.

"},{"location":"software-libraries/scotch/#scotch-and-pt-scotch_2","title":"Scotch and PT-Scotch","text":"

The latest Scotch and PT-Scotch libraries are installed on ARCHER2 using the following mechanism:

$ git clone https://github.com/ARCHER2-HPC/pe-scripts.git\n$ cd pe-scripts\n$ git checkout modules-2022-12\n$ ./sh/tpsl/scotchv7.sh --prefix=/path/to/install/location\n
where the --prefix option defines the destination for the install. This script will download, compile and install version 7.0.3. A separate script (scotch.sh) in the same location is used for version 6.

"},{"location":"software-libraries/scotch/#resources","title":"Resources","text":"

The Scotch home page

Scotch user manual (pdf)

PT-Scotch user manual (pdf)

"},{"location":"software-libraries/slepc/","title":"SLEPC","text":"

The Scalable Library for Eigenvalue Problem Computations (SLEPc) is an extension of PETSc developed at the Universitat Politecnica de Valencia. SLEPc is freely available under a 2-clause BSD license.

"},{"location":"software-libraries/slepc/#compiling-and-linking-with-slepc","title":"Compiling and linking with SLEPc","text":"

To compile an application against the SLEPc libraries, load the slepc module and use the compiler wrappers cc, CC, and ftn in the usual way. Static libraries are available so no module is required at run time.

The SLEPc module defines SLEPC_DIR which locates the root of the installation.

"},{"location":"software-libraries/slepc/#version-history","title":"Version history","text":"Upgrade 2023Full system4-cabinet system

Note: each SLEPc module depends on a PETSc module with the same minor version number.

"},{"location":"software-libraries/slepc/#compiling-your-own-version","title":"Compiling your own version","text":"

The version of SLEPc currently available on ARCHER2 can be compiled using a script available from the ARCHER2 github repository:

$ git clone https://github.com/ARCHER2-HPC/pe-scripts.git\n$ cd pe-scripts\n$ git checkout modules-2022-12\n$ ./sh/slepc.sh --prefix=/path/to/install/location\n
The dependencies (including PETSc) can be built in the same way, or taken from the existing modules. See the ARCHER2 github repository for further information.

"},{"location":"software-libraries/slepc/#resources","title":"Resources","text":"

SLEPc home page

Latest release version of SLEPc user manual (PDF)

SLEPc Gitlab repository

"},{"location":"software-libraries/superlu/","title":"SuperLU and SuperLU_DIST","text":"

SuperLU and SuperLU_DIST are libraries for the direct solution of large sparse non-symmetric systems of linear equations, typically by factorisation and back-substitution. The libraries are provided by Lawrence Berkeley National Laboratory and are freely available under a slightly modified BSD-style license.

Two separate modules are provided for SuperLU and SuperLU_DIST.

"},{"location":"software-libraries/superlu/#superlu","title":"SuperLU","text":"

This module provides the serial library SuperLU.

"},{"location":"software-libraries/superlu/#compiling-and-linking-with-superlu","title":"Compiling and linking with SuperLU","text":"

Compiling and linking SuperLU applications requires no special action beyond module load superlu and using the standard compiler wrappers cc, CC, or ftn. The exact options issued by the compiler wrapper can be examined via, e.g.,

$ cc --cray-print-opts\n
while the module is loaded.

The module defines the environment variable SUPERLU_DIR as the root location of the installation for a given programming environment.

"},{"location":"software-libraries/superlu/#version-history","title":"Version history","text":"Upgrade 2023Full system4-cabinet system "},{"location":"software-libraries/superlu/#superlu_dist","title":"SuperLU_DIST","text":"

This module provides the distributed-memory parallel library SuperLU_DIST, both with and without OpenMP.

"},{"location":"software-libraries/superlu/#compiling-and-linking-superlu_dist","title":"Compiling and linking SuperLU_DIST","text":"

Use the standard compiler wrappers:

$ cc my_superlu_dist_application.c\n
or
$ cc -fopenmp my_superlu_dist_application.c\n
to compile and link against the appropriate libraries.

The superlu-dist module defines the environment variable SUPERLU_DIST_DIR as the root of the installation for the current programming environment.

"},{"location":"software-libraries/superlu/#version-history_1","title":"Version history","text":"Upgrade 2023Full system4-cabinet system "},{"location":"software-libraries/superlu/#compiling-your-own-version","title":"Compiling your own version","text":"

The build used for ARCHER2 can be replicated by using the scripts provided in the ARCHER2 repository.

"},{"location":"software-libraries/superlu/#superlu_1","title":"SuperLU","text":"

The version currently supported on ARCHER2 may be built via:

$ git clone https://github.com/ARCHER2-HPC/pe-scripts.git\n$ cd pe-scripts\n$ git checkout modules-2022-12\n$ ./sh/tpsl/superlu.sh --prefix=/path/to/install/location\n
where the --prefix option controls the install destination.

"},{"location":"software-libraries/superlu/#superlu_dist_1","title":"SuperLU_DIST","text":"

SuperLU_DIST is configured using Metis and Parmetis, so these should be installed first:

$ ./sh/tpsl/metis.sh --prefix=/path/to/install/location\n$ ./sh/tpsl/parmetis.sh --prefix=/path/to/install/location\n$ ./sh/tpsl/superlu_dist.sh --prefix=/path/to/install/location\n
will download, compile, and install the relevant libraries. The install location should be the same for all three packages. See the ARCHER2 GitHub repository for further options and details.

"},{"location":"software-libraries/superlu/#resources","title":"Resources","text":"

The Supernodal LU project home page

The SuperLU User guide (pdf). This describes both SuperLU and SuperLU_DIST.

The SuperLU github repository

The SuperLU_DIST github repository

"},{"location":"software-libraries/trilinos/","title":"Trilinos","text":"

Trilinos is a large collection of packages with software components that can be used for scientific and engineering problems. Most of the packages are released under a BSD license (and some under LGPL).

"},{"location":"software-libraries/trilinos/#compiling-and-linking-against-trilinos","title":"Compiling and linking against Trilinos","text":"

Applications may be built against the module version of Trilinos by using the compiler wrappers CC or ftn in the normal way. The appropriate include files and library paths will be inserted automatically. Trilinos is built with OpenMP enabled.

The trilinos module defines the environment variable TRILINOS_DIR as the root of the installation for the current programming environment.

Trilinos also provides a small number of stand-alone executables which are available via the standard PATH mechanism while the module is loaded.

"},{"location":"software-libraries/trilinos/#version-history","title":"Version history","text":"Upgrade 2023Full system4-cabinet system

Note that Trilinos is not currently available for PrgEnv-aocc.

If using AMD compilers, module version aocc/3.0.0 is required.

Known issue

Trilinos is not available in PrgEnv-aocc at the moment.

Known issue

The ForTrilinos package is not available in this version.

Packages enabled are: Amesos, Amesos2, Anasazi, AztecOO, Belos, Epetra, EpetraExt, FEI, Galeri, GlobiPack, Ifpack, Ifpack2, Intrepid, Isorropia, Kokkos, Komplex, Mesquite, ML, Moertel, MueLu, NOX, OptiPack, Pamgen, Phalanx, Piro, Pliris, ROL, RTOp, Rythmos, Sacado, Shards, ShyLU, STK, STKSearch, STKTopology, STKUtil, Stratimikos, Teko, Teuchos, Thyra, Tpetra, TrilinosCouplings, Triutils, Xpetra, Zoltan and Zoltan2.

"},{"location":"software-libraries/trilinos/#compiling-trilinos","title":"Compiling Trilinos","text":"

A script which has details of the relevant configuration options for Trilinos is available at the ARCHER2 Github repository. The script will build a static-only version of the libraries.

$ git clone https://github.com/ARCHER2-HPC/pe-scripts.git\n$ cd pe-scripts\n$ git checkout modules-2022-12\n$ ...\n$ ./sh/trilinos.sh --prefix=/path/to/install/location\n
where --prefix sets the installation location. The ellipsis (...) stands for the commands that build the dependencies used by Trilinos, which here are: metis, parmetis, superlu, superlu-dist, scotch, mumps, glm, boost. These packages should be built as described on their corresponding pages, linked in the menu on the left.

See the ARCHER2 Github repository for further details.

Note that Trilinos may take up to one hour to compile on its own, and so the compilation is best performed as a batch job.

"},{"location":"software-libraries/trilinos/#resources","title":"Resources","text":""},{"location":"user-guide/","title":"User and Best Practice Guide","text":"

The ARCHER2 User and Best Practice Guide covers all aspects of use of the ARCHER2 service. This includes fundamentals (required by all users to use the system effectively), best practice for getting the most out of ARCHER2 and more technical topics.

The User and Best Practice Guide contains the following sections:

"},{"location":"user-guide/analysis/","title":"Data analysis","text":"

As well as being used for scientific simulations, ARCHER2 can also be used for data pre-/post-processing and analysis. This page provides an overview of the different options for doing so.

"},{"location":"user-guide/analysis/#using-the-login-nodes","title":"Using the login nodes","text":"

The easiest way to run non-computationally intensive data analysis is to run directly on the login nodes. However, please remember that the login nodes are a shared resource and should not be used for long-running tasks.

"},{"location":"user-guide/analysis/#example-running-an-r-script-on-a-login-node","title":"Example: Running an R script on a login node","text":"
module load cray-R\nRscript example.R\n
"},{"location":"user-guide/analysis/#using-the-compute-nodes","title":"Using the compute nodes","text":"

If running on the login nodes is not feasible (e.g. due to memory requirements or computationally intensive analysis), the compute nodes can also be used for data analysis.

Important

This is a more expensive option, as you will be charged for using the entire node, even though your analysis may only be using one core.

"},{"location":"user-guide/analysis/#example-running-an-r-script-on-a-compute-node","title":"Example: Running an R script on a compute node","text":"
#!/bin/bash\n#SBATCH --job-name=data_analysis\n#SBATCH --time=0:10:0\n#SBATCH --nodes=1\n#SBATCH --ntasks-per-node=1\n#SBATCH --cpus-per-task=1\n\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\nmodule load cray-R\n\nRscript example.R\n

An advantage of this method is that you can use Job chaining to automate the process of analysing your output data once your compute job has finished.
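A minimal sketch of such a chain, assuming two hypothetical job scripts (simulation.slurm for the compute job and analysis.slurm for a post-processing job like the one above). It relies on two standard sbatch options: --parsable, which prints just the job ID, and --dependency=afterok, which holds the second job until the first has completed successfully. The function name submit_chain is illustrative only.

```shell
# Submit a simulation job, then an analysis job that starts only if the
# simulation finishes successfully (exit code 0).
# Usage: submit_chain <simulation script> <analysis script>
submit_chain() {
  local jobid
  # --parsable prints "jobid" or "jobid;cluster"; keep only the job ID.
  jobid=$(sbatch --parsable "$1") || return 1
  jobid=${jobid%%;*}
  sbatch --dependency=afterok:"$jobid" "$2"
}

# Example (on an ARCHER2 login node):
#   submit_chain simulation.slurm analysis.slurm
```

If the simulation fails, the analysis job will not start; it can then be removed with scancel.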

"},{"location":"user-guide/analysis/#using-interactive-jobs","title":"Using interactive jobs","text":"

For more interactive analysis, it may be useful to use salloc to reserve a compute node on which to do your analysis. This allows you to run jobs directly on the compute nodes from the command line without using a job submission script. More information on interactive jobs can be found here.

"},{"location":"user-guide/analysis/#example-reserving-a-single-node-for-20-minutes-for-interactive-analysis","title":"Example: Reserving a single node for 20 minutes for interactive analysis","text":"
auser@ln01:> salloc --nodes=1 --ntasks-per-node=1 --cpus-per-task=1 \\\n                --time=00:20:00 --partition=standard --qos=short \\\n                --account=[budget code]\n

Note

If you want to run for longer than 20 minutes, you will need to use a different QoS as the maximum runtime for the short QoS is 20 mins.

"},{"location":"user-guide/analysis/#data-analysis-nodes","title":"Data analysis nodes","text":"

The data analysis nodes on the ARCHER2 system are designed for large compilations, post-calculation analysis and data manipulation. They should be used for jobs which are too small to require a whole compute node, but which would have an adverse impact on the operation of the login nodes if they were run interactively.

Unlike compute nodes, the data analysis nodes are able to access the home, work, and the RDFaaS file systems. They can also be used to transfer data from a remote system to ARCHER2 and vice versa (using e.g. scp or rsync). This can be useful when transferring large amounts of data that might take hours to complete.

"},{"location":"user-guide/analysis/#requesting-resources-on-the-data-analysis-nodes-using-slurm","title":"Requesting resources on the data analysis nodes using Slurm","text":"

The ARCHER2 data analysis nodes can be reached by using the serial partition and the serial QoS. Unlike other nodes on ARCHER2, you may only request part of a single node and you will likely be sharing the node with other users.

The data analysis nodes are set up such that you can specify the number of cores you want to use (up to 32 physical cores) and the amount of memory you want for your job (up to 125 GB). You can have multiple jobs running on the data analysis nodes at the same time, but the total number of cores used by those jobs cannot exceed 32, and the total memory used by jobs currently running from a single user cannot exceed 125 GB. Any jobs beyond these limits will remain pending until your previous jobs have finished.

You do not need to specify both the number of cores and the amount of memory for jobs on the data analysis nodes. If you specify cores only, you will get 1984 MiB of memory per core by default (a little less than 2 GB); if you specify memory only, you will get 1 core.

Note

Each data analysis node is fitted with 512 GB of memory. However, a small amount of this memory is needed for system processes, which is why we set an upper limit of 125 GB per user (a user is limited to one quarter of the RAM on a node). This is also why the per-core default memory allocation is slightly less than 2 GB.
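The defaults above mean that a cores-only request never reaches the per-user memory cap. Even all 32 cores at the default 1984 MiB per core come to 63,488 MiB (62 GiB), roughly half of the 125 GB limit. The small sketch below reproduces that arithmetic. The function name default_mem_mib is illustrative. Jobs that need more memory than this default should request it explicitly with --mem.

```shell
# Default memory granted on the data analysis nodes when only cores are
# requested: 1984 MiB per core (value taken from the text above).
default_mem_mib() {
  echo $(( $1 * 1984 ))
}

for ncores in 1 4 32; do
  printf '%2d core(s) -> %5d MiB by default\n' "$ncores" "$(default_mem_mib "$ncores")"
done
```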

Note

When running on the data analysis nodes, you must always specify either the number of cores you want, the amount of memory you want, or both. The examples shown below specify the number of cores with the --ntasks flag and the memory with the --mem flag. If you only want to specify one of the two, remember to delete the other one.

"},{"location":"user-guide/analysis/#example-running-a-serial-batch-script-on-the-data-analysis-nodes","title":"Example: Running a serial batch script on the data analysis nodes","text":"

A Slurm batch script for the data analysis nodes looks very similar to one for the compute nodes. The main differences are that you need to use --partition=serial and --qos=serial, specify the number of tasks (rather than the number of nodes) and/or specify the amount of memory you want. For example, to use a single core and 4 GB of memory, you would use something like:

#!/bin/bash\n\n# Slurm job options (job-name, job time)\n#SBATCH --job-name=data_analysis\n#SBATCH --time=0:20:0\n#SBATCH --ntasks=1\n\n# Replace [budget code] below with your budget code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=serial\n#SBATCH --qos=serial\n\n# Define memory required for this job. By default, you would\n# get just under 2 GB, but you can ask for up to 125 GB.\n#SBATCH --mem=4G\n\n# Set the number of threads to 1\n#   This prevents any threaded system libraries from automatically\n#   using threading.\nexport OMP_NUM_THREADS=1\n\nmodule load cray-python\n\npython my_analysis_script.py\n
"},{"location":"user-guide/analysis/#interactive-session-on-the-data-analysis-nodes","title":"Interactive session on the data analysis nodes","text":"

There are two ways to start an interactive session on the data analysis nodes: you can either use salloc to reserve a part of a data analysis node for interactive jobs; or, you can use srun to open a terminal on the node and run things on the node directly. You can find out more information on the advantages and disadvantages of both of these methods in the Running jobs on ARCHER2 section of the User and Best Practice Guide.

"},{"location":"user-guide/analysis/#using-salloc-for-interactive-access","title":"Using salloc for interactive access","text":"

You can reserve resources on a data analysis node using salloc. For example, to request 1 core and 4 GB of memory for 20 minutes, you would use:

auser@ln01:~> salloc --time=00:20:00 --partition=serial --qos=serial \\\n                    --account=[budget code] --ntasks=1 \\\n                    --mem=4G\n

When you submit this job, your terminal will display something like:

salloc: Pending job allocation 523113\nsalloc: job 523113 queued and waiting for resources\nsalloc: job 523113 has been allocated resources\nsalloc: Granted job allocation 523113\nsalloc: Waiting for resource configuration\nsalloc: Nodes dvn01 are ready for job\n\nauser@ln01:~>\n

It may take some time for your interactive job to start. Once it runs you will enter a standard interactive terminal session (a new shell). Note that this shell is still on the front end (the prompt has not changed). Whilst the interactive session lasts you will be able to run jobs on the data analysis nodes by issuing the srun command directly at your command prompt. The maximum number of cores and memory you can use is limited by resources requested in the salloc command (or by the defaults if you did not explicitly ask for particular amounts of resource).

Your session will end when you hit the requested walltime. If you wish to finish before this you should use the exit command - this will return you to your prompt before you issued the salloc command.

"},{"location":"user-guide/analysis/#using-srun-for-interactive-access","title":"Using srun for interactive access","text":"

You can get a command prompt directly on the data analysis nodes by using the srun command directly. For example, to reserve 1 core and 8 GB of memory, you would use:

auser@ln01:~> srun   --time=00:20:00 --partition=serial --qos=serial \\\n                    --account=[budget code]    \\\n                    --ntasks=1 --mem=8G \\\n                    --pty /bin/bash\n

The --pty /bin/bash option will cause a new shell to be started on the data analysis node. (This is perhaps closer to what many people consider an 'interactive' job than the salloc method described above.)

One can now issue shell commands in the usual way.

When finished, type exit to relinquish the allocation and control will be returned to the front end.

By default, the interactive shell will retain the environment of the parent. If you want a clean shell, remember to specify the --export=none option to the srun command.

"},{"location":"user-guide/analysis/#visualising-data-using-the-data-analysis-nodes-using-x","title":"Visualising data using the data analysis nodes using X","text":"

You can view data on the data analysis nodes by starting an interactive srun session with the --x11 flag, which forwards the X display back to your local system. For example, for 1 core with 8 GB of memory:

auser@ln01:~> srun   --time=00:20:00 --partition=serial --qos=serial  \\\n                        --hint=nomultithread --account=[budget code]    \\\n                        --ntasks=1 --mem=8G --x11 --pty /bin/bash\n

Tip

Data visualisation on ARCHER2 is only possible if you used the -X or -Y flag to the ssh command when logging in to the system.

"},{"location":"user-guide/analysis/#using-singularity","title":"Using Singularity","text":"

Singularity can be useful for data analysis, as sites such as DockerHub or SingularityHub contain many pre-built images of data analysis tools that can simply be downloaded and used on ARCHER2. More information about Singularity on ARCHER2 can be found in the Containers section of the User and Best Practice Guide.

"},{"location":"user-guide/analysis/#data-analysis-tools","title":"Data analysis tools","text":"

Useful tools for data analysis can be found on the Data Analysis and Tools page.

"},{"location":"user-guide/connecting-totp/","title":"Connecting to ARCHER2","text":"

This section covers the basic connection methods.

On the ARCHER2 system, interactive access is achieved using SSH, either directly from a command-line terminal or using an SSH client. In addition, data can be transferred to and from the ARCHER2 system using scp from the command line or by using a file-transfer client.

Before following the process below, we assume you have set up an account on ARCHER2 through the EPCC SAFE. Documentation on how to do this can be found at:

"},{"location":"user-guide/connecting-totp/#command-line-terminal","title":"Command line terminal","text":""},{"location":"user-guide/connecting-totp/#linux","title":"Linux","text":"

Linux distributions include a terminal application that can be used for SSH access to the ARCHER2 login nodes. Linux users will have different terminals depending on their distribution and window manager (e.g., GNOME Terminal in GNOME, Konsole in KDE). Consult your Linux distribution's documentation for details on how to load a terminal.

"},{"location":"user-guide/connecting-totp/#macos","title":"MacOS","text":"

MacOS users can use the Terminal application, located in the Utilities folder within the Applications folder.

"},{"location":"user-guide/connecting-totp/#windows","title":"Windows","text":"

A typical Windows installation will not include a terminal client, though there are various clients available. We recommend Windows users download and install MobaXterm to access ARCHER2. It is very easy to use and includes an integrated X Server, which allows you to run graphical applications on ARCHER2.

You can download MobaXterm Home Edition (Installer Edition) from the following link:

Double-click the downloaded Microsoft Installer file (.msi) and follow the instructions from the Windows Installation Wizard. Note that you might need administrator rights to install on some versions of Windows. Also, check whether Windows Firewall has blocked any features of this program after installation (Windows will warn you if the built-in firewall blocks an action and give you the opportunity to override the behaviour).

Once installed, start MobaXterm and then click \"Start local terminal\".

Tips

"},{"location":"user-guide/connecting-totp/#access-credentials","title":"Access credentials","text":"

To access ARCHER2, you need to use two sets of credentials: your SSH key pair protected by a passphrase, and a time-based one-time passcode (TOTP). You can find more detailed instructions below on how to set up your credentials to access ARCHER2 from Windows, MacOS and Linux.

"},{"location":"user-guide/connecting-totp/#ssh-key-pairs","title":"SSH Key Pairs","text":"

You will need to generate an SSH key pair protected by a passphrase to access ARCHER2.

Using a terminal (the command line), set up a key pair that contains your e-mail address (as the key comment) and enter a passphrase you will use to unlock the key:

$ ssh-keygen -t rsa -C \"your@email.com\"\n...\n-bash-4.1$ ssh-keygen -t rsa -C \"your@email.com\"\nGenerating public/private rsa key pair.\nEnter file in which to save the key (/Home/user/.ssh/id_rsa): [Enter]\nEnter passphrase (empty for no passphrase): [Passphrase]\nEnter same passphrase again: [Passphrase]\nYour identification has been saved in /Home/user/.ssh/id_rsa.\nYour public key has been saved in /Home/user/.ssh/id_rsa.pub.\nThe key fingerprint is:\n03:d4:c4:6d:58:0a:e2:4a:f8:73:9a:e8:e3:07:16:c8 your@email.com\nThe key's randomart image is:\n+--[ RSA 2048]----+\n|    . ...+o++++. |\n| . . . =o..      |\n|+ . . .......o o |\n|oE .   .         |\n|o =     .   S    |\n|.    +.+     .   |\n|.  oo            |\n|.  .             |\n| ..              |\n+-----------------+\n

(remember to replace \"your@email.com\" with your e-mail address).

"},{"location":"user-guide/connecting-totp/#upload-public-part-of-key-pair-to-safe","title":"Upload public part of key pair to SAFE","text":"

You should now upload the public part of your SSH key pair to the SAFE by following the instructions at:

Log in to SAFE.

Then:

  1. Go to the Login accounts menu and select the ARCHER2 account you want to add the SSH key to.
  2. On the subsequent Login Account details page, click the Add Credential button.
  3. Select SSH public key as the Credential Type and click Next.
  4. Either copy and paste the public part of your SSH key into the SSH Public key box or use the button to select the public key file on your computer.
  5. Click Add to associate the public SSH key with your account.

Once you have done this, your SSH key will be added to your ARCHER2 account.

"},{"location":"user-guide/connecting-totp/#mfa-time-based-one-time-passcode-totp-code","title":"MFA Time-based one-time passcode (TOTP code)","text":"

Remember, you will need to use both an SSH key and a time-based one-time passcode to log into ARCHER2, so you will also need to set up a method for generating a TOTP code before you can log in.

"},{"location":"user-guide/connecting-totp/#first-login-password-required","title":"First login: password required","text":"

Important

You will not use your password when logging on to ARCHER2 after the first login for a new account.

As an additional security measure, you will also need to use a password from SAFE for your first login to ARCHER2 with a new account. When you log into ARCHER2 for the first time with a new account, you will be prompted to change your initial password. This is a three-step process:

  1. When prompted to enter your LDAP password: enter the password you retrieved from SAFE.
  2. When prompted to enter your new password: type in a new password.
  3. When prompted to re-enter the new password: re-enter the new password.

Your password has now been changed. You will no longer need this password to log into ARCHER2; from this point forwards, you will use your SSH key and TOTP code as described above.

"},{"location":"user-guide/connecting-totp/#ssh-clients","title":"SSH Clients","text":"

As noted above, you interact with ARCHER2 over an encrypted communication channel (specifically, Secure Shell version 2 (SSH-2)). This allows command-line access to one of the login nodes of ARCHER2, from which you can run commands or use a command-line text editor to edit files. SSH can also be used to run graphical programs such as GUI text editors and debuggers when used in conjunction with an X Server.

"},{"location":"user-guide/connecting-totp/#logging-in","title":"Logging in","text":"

The login addresses for ARCHER2 are:

You can use the following command from the terminal window to log in to ARCHER2:

Full system
ssh username@login.archer2.ac.uk\n

The order in which you are asked for credentials depends on the system you are accessing:

Full system

You will first be prompted for the passphrase associated with your SSH key pair. Once you have entered this passphrase successfully, you will then be prompted for your TOTP code. You need to enter both credentials correctly to be able to access ARCHER2.

Tip

If you logged into ARCHER2 with your account before the major upgrade in May/June 2023, you may see an error from SSH that looks like this:

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\n@       WARNING: POSSIBLE DNS SPOOFING DETECTED!          @\n@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\nThe ECDSA host key for login.archer2.ac.uk has changed,\nand the key for the corresponding IP address 193.62.216.43\nhas a different value. This could either mean that\nDNS SPOOFING is happening or the IP address for the host\nand its host key have changed at the same time.\nOffending key for IP in /Users/auser/.ssh/known_hosts:11\n@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\n@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @\n@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\nIT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!\nSomeone could be eavesdropping on you right now (man-in-the-middle attack)!\nIt is also possible that a host key has just been changed.\nThe fingerprint for the ECDSA key sent by the remote host is\nSHA256:UGS+LA8I46LqnD58WiWNlaUFY3uD1WFr+V8RCG09fUg.\nPlease contact your system administrator.\n

If you see this, you should delete the offending host key from your ~/.ssh/known_hosts file (in the example above, the offending entry is on line 11).
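If you prefer to do this from the command line, the sketch below deletes a single numbered line from a known_hosts file after taking a backup. The function name remove_known_hosts_line is hypothetical. The line number must come from your own SSH warning (11 in the example above). OpenSSH's ssh-keygen -R hostname is an alternative: it removes every stored key for a given host name or IP address.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: delete line LINE from a known_hosts FILE, keeping FILE.bak.
remove_known_hosts_line() {
    local file="$1" line="$2"
    # Only accept a positive integer line number (sed rejects line 0).
    if [[ ! "$line" =~ ^[1-9][0-9]*$ ]]; then
        echo "line number must be a positive integer" >&2
        return 1
    fi
    cp "$file" "$file.bak"               # backup in case the wrong line is removed
    sed "${line}d" "$file.bak" > "$file" # rewrite FILE without that line
}

# Example, using the line number reported in the warning above:
# remove_known_hosts_line "$HOME/.ssh/known_hosts" 11
```

After removing the stale entry, compare the key the server now offers with the published host keys before accepting it.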

Warning

If your SSH key pair is not stored in the default location (usually ~/.ssh/id_rsa) on your local system, you may need to specify the path to the private part of the key with the -i option to ssh. For example, if your key is in a file called keys/id_rsa_ARCHER2, you would use the command ssh -i keys/id_rsa_ARCHER2 username@login.archer2.ac.uk to log in.

Tip

When you first log into ARCHER2, you will be prompted to change your initial password. This is a three-step process:

  1. When prompted to enter your LDAP password: re-enter the password you retrieved from SAFE.
  2. When prompted to enter your new password: type in a new password.
  3. When prompted to re-enter the new password: re-enter the new password.

Your password has now been changed.

To allow remote programs, especially graphical applications such as debuggers, to use your local display, use:

Full system
ssh -X username@login.archer2.ac.uk\n

Some sites recommend using the -Y flag. While this can fix some compatibility issues, the -X flag is more secure.

Current MacOS systems do not include an X Window System. MacOS users should install the XQuartz package to enable SSH with X11 forwarding:

"},{"location":"user-guide/connecting-totp/#host-keys","title":"Host Keys","text":"

Adding the host keys to your SSH known hosts file (~/.ssh/known_hosts) provides an extra level of security for your connections to ARCHER2. The host keys are checked against the login nodes when you log in to ARCHER2, and if the remote server key does not match the one in that file, the connection will be refused. This provides protection against potential malicious servers masquerading as the ARCHER2 login nodes.

"},{"location":"user-guide/connecting-totp/#loginarcher2acuk","title":"login.archer2.ac.uk","text":"
login.archer2.ac.uk ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBANu9BQJ1UFr4nwy8X5seIPgCnBl1TKc8XBq2YVY65qS53QcpzjZAH53/CtvyWkyGcmY8/PWsJo9sXHqzXVSkzk=\n\nlogin.archer2.ac.uk ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDFGGByIrskPayB5xRm3vkWoEc5bVtTCi0oTGslD8m+M1Sc/v2IV6FxaEVXGwO9ErQwrtFQRj0KameLS3Jn0LwQ13Tw+vTXV0bsKyGgEu2wW+BSDijGpbxRZXZrg30TltZXd4VkTuWiE6kyhJ6qiIIR0nwfDblijGy3u079gM5Om/Q2wydwh0iAASRzkqldL5bKDb14Vliy7tCT3TJXI49+qIagWUhNEzyN1j2oK/2n3JdflT4/anQ4jUywVG4D1Tor/evEeSa3h5++gbtgAXZaCtlQbBxwckmTetXqnlI+pvkF0AAuS18Bh+hdmvT1+xW0XLv7CMA64HfR93XgQIIuPqFAS1p+HuJkmk4xFAdwrzjnpYAiU5Apkq+vx3W957/LULzZkeiFQY2Y3CY9oPVR8WBmGKXOOBifhl2Hvd51fH1wd0Lw7Zph53NcVSQQhdDUVhgsPJA3M/+UlqoAMEB/V6ESE2z6yrXVfNjDNbbgA1K548EYpyNR8z4eRtZOoi0=\n\nlogin.archer2.ac.uk ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAINyptPmidGmIBYHPcTwzgXknVPrMyHptwBgSbMcoZgh5\n

Host key verification can fail if this key is out of date, a problem that can be fixed by removing the offending entry in ~/.ssh/known_hosts and replacing it with the new key published here. We recommend that users check this page for any key updates rather than accepting a new key from the server without confirmation.
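The sketch below shows one way to pin a published key. It appends the ed25519 line listed above to a known_hosts file, but only if that exact line is not already present. The function name is hypothetical. The key string is copied from this page, so re-check it here whenever the keys change. After pinning, the standard OpenSSH option StrictHostKeyChecking=yes makes ssh refuse to connect, rather than prompt, when the server's key is unknown or does not match.

```shell
#!/usr/bin/env bash
set -euo pipefail

# ed25519 host key for login.archer2.ac.uk, copied from the listing above.
# Check this page for key updates before relying on it.
ARCHER2_ED25519='login.archer2.ac.uk ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAINyptPmidGmIBYHPcTwzgXknVPrMyHptwBgSbMcoZgh5'

# Hypothetical helper: append the published key to a known_hosts file
# unless an identical line is already present (so it is safe to re-run).
pin_archer2_host_key() {
    local known_hosts="$1"
    touch "$known_hosts"
    if ! grep -qxF "$ARCHER2_ED25519" "$known_hosts"; then
        printf '%s\n' "$ARCHER2_ED25519" >> "$known_hosts"
    fi
}

# pin_archer2_host_key "$HOME/.ssh/known_hosts"
# ssh -o StrictHostKeyChecking=yes username@login.archer2.ac.uk
```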

"},{"location":"user-guide/connecting-totp/#making-access-more-convenient-using-the-ssh-configuration-file","title":"Making access more convenient using the SSH configuration file","text":"

Typing in the full command to log in or transfer data to ARCHER2 can become tedious, as it often has to be repeated several times. You can use the SSH configuration file, usually located on your local machine at ~/.ssh/config, to make the process more convenient.

Each remote site (or group of sites) can have an entry in this file, which may look something like:

Full system
Host archer2\n    HostName login.archer2.ac.uk\n    User username\n

(remember to replace username with your actual username!).

Taking the full-system example: the Host line defines a short name for the entry. In this case, instead of typing ssh username@login.archer2.ac.uk to access the ARCHER2 login nodes, you could use ssh archer2 instead. The remaining lines define the options for the host.

Now you can use SSH to access ARCHER2 without needing to enter your username or the full hostname every time:

ssh archer2\n

You can set up as many of these entries as you need in your local configuration file. Other options are available. See the ssh_config manual page (or man ssh_config on any machine with SSH installed) for a description of the SSH configuration file. For example, you may find the IdentityFile option useful if you have to manage multiple SSH key pairs for different systems as this allows you to specify which SSH key to use for each system.

Bug

There is a known bug with the Windows ssh-agent. If you get the error message Warning: agent returned different signature type ssh-rsa (expected rsa-sha2-512), you will need to either specify the path to your SSH key on the command line (using the -i option as described above) or add that path to your SSH config file using the IdentityFile option.
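The shell function below is a sketch that appends the Host entry shown earlier, plus an IdentityFile line, to an SSH config file. The function name, the alias archer2 and the key path are assumptions for illustration, so substitute your own username and key location.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: add an "archer2" Host entry to an SSH config file.
# Arguments: config file path, ARCHER2 username, path to the private key.
add_archer2_host() {
    local config="$1" user="$2" key="$3"
    mkdir -p "$(dirname "$config")"
    touch "$config"
    chmod 600 "$config"   # ssh rejects a config file that other users can write to
    if grep -qx 'Host archer2' "$config"; then
        echo "Host archer2 is already defined in $config" >&2
        return 1
    fi
    cat >> "$config" <<EOF

Host archer2
    HostName login.archer2.ac.uk
    User ${user}
    IdentityFile ${key}
EOF
}

# add_archer2_host "$HOME/.ssh/config" username "$HOME/.ssh/id_rsa_ARCHER2"
```

Running ssh -G archer2 prints the options ssh would use for that alias without connecting. This is a quick way to confirm that the HostName, User and IdentityFile values were picked up.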

"},{"location":"user-guide/connecting-totp/#ssh-debugging-tips","title":"SSH debugging tips","text":"

If you find you are unable to connect to ARCHER2, there are some simple checks you may use to diagnose the issue, which are described below. If you are having difficulties connecting, we suggest trying these before contacting the ARCHER2 Service Desk.

"},{"location":"user-guide/connecting-totp/#use-the-userloginarcher2acuk-syntax-rather-than-l-user-loginarcher2acuk","title":"Use the user@login.archer2.ac.uk syntax rather than -l user login.archer2.ac.uk","text":"

We have seen a number of instances where people using the syntax

ssh -l user login.archer2.ac.uk\n

have not been able to connect properly and get prompted for a password many times. We have found that using the alternative syntax:

ssh user@login.archer2.ac.uk\n

works more reliably.

"},{"location":"user-guide/connecting-totp/#can-you-connect-to-the-login-node","title":"Can you connect to the login node?","text":"

Try the command ping -c 3 login.archer2.ac.uk on Linux or MacOS, or ping -n 3 login.archer2.ac.uk on Windows. If you can reach the login node, the output should include:

--- login.archer2.ac.uk ping statistics ---\n3 packets transmitted, 3 received, 0% packet loss, time 38ms\n

(the ping time '38ms' is not important). If not all packets are received there could be a problem with your Internet connection, or the login node could be unavailable.
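To run this check in a script, the sketch below reads ping output on standard input and succeeds only when every packet sent was received. It assumes the Linux or MacOS summary format shown above, where the received count is the fourth field of the "packets transmitted" line. Windows ping prints a different summary and is not handled. The function name is hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: parse a Linux/MacOS ping summary from stdin.
# Exit status 0 = no loss, 1 = some packets lost, 2 = no summary line found.
check_ping_summary() {
    awk '
        /packets transmitted/ { sent = $1; recv = $4; found = 1 }
        END {
            if (!found) { print "no ping summary found"; exit 2 }
            printf "%d sent, %d received\n", sent, recv
            if (sent > 0 && sent == recv) exit 0
            exit 1
        }'
}

# ping -c 3 login.archer2.ac.uk | check_ping_summary
```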

"},{"location":"user-guide/connecting-totp/#ssh-key","title":"SSH key","text":"

If you get the error message Permission denied (publickey), this may indicate a problem with your SSH key. Some things to check:

chmod can be used to set permissions on the target in the following way: chmod <code> <target>. So for example to set correct permissions on the private key file id_rsa_ARCHER2, use the command chmod 600 id_rsa_ARCHER2.

On Windows, permissions are handled differently but can be set by right-clicking on the file and selecting Properties > Security > Advanced. The user, SYSTEM, and Administrators should have Full control, and no other permissions should exist for both the public and private key files, as well as the containing folder.

Tip

Unix file permissions can be understood in the following way. There are three classes that can have file permissions: the (owning) user, the (owning) group, and others. The available permissions are read, write, and execute. The first character indicates whether the target is a file (-) or a directory (d). The next three characters indicate the owning user's permissions: the first is r if they have read permission and - if they don't, the second is w if they have write permission and - if they don't, and the third is x if they have execute permission and - if they don't. This pattern is then repeated for group and other permissions. For example, the pattern -rw-r--r-- indicates that the owning user can read and write the file, members of the owning group can read it, and anyone else can also read it. The chmod codes are constructed by treating each of the user, group, and other permission triplets as a three-bit binary number and converting it to a single digit from 0 to 7 (an octal digit). For example, the permission string -rwx------ becomes 111 000 000 -> 700.
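The conversion described in this tip can be sketched as a small bash function. The function is hypothetical and handles only the plain r, w and x characters, not the setuid, setgid or sticky bits. It treats any character other than - as a set bit.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: convert an ls-style string such as -rw-r--r-- to a chmod code.
perm_to_octal() {
    local perms="${1:1}"   # drop the leading file-type character (- or d)
    local result="" i j bits digit
    for i in 0 3 6; do                 # user, group, other triplets
        bits="${perms:i:3}"
        digit=0
        for j in 0 1 2; do
            # r, w and x are worth 4, 2 and 1; '-' contributes 0
            if [ "${bits:j:1}" != "-" ]; then
                digit=$(( digit + (4 >> j) ))
            fi
        done
        result+="$digit"
    done
    printf '%s\n' "$result"
}
```

For example, perm_to_octal -rw------- prints 600, the mode used in the chmod example above for a private key file.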

"},{"location":"user-guide/connecting-totp/#mfa","title":"MFA","text":"

If your TOTP passcode is being consistently rejected, you can remove MFA from your account and then re-enable it.

"},{"location":"user-guide/connecting-totp/#ssh-verbose-output","title":"SSH verbose output","text":"

The verbose-debugging output from ssh can be very useful for diagnosing issues. In particular, it can be used to distinguish between problems with the SSH key and password. To enable verbose output, add the -vvv flag to your SSH command. For example:

ssh -vvv username@login.archer2.ac.uk\n

The output is lengthy, but somewhere in there you should see lines similar to the following:

debug1: Next authentication method: publickey\ndebug1: Offering public key: RSA SHA256:<key_hash> <path_to_private_key>\ndebug3: send_pubkey_test\ndebug3: send packet: type 50\ndebug2: we sent a publickey packet, wait for reply\ndebug3: receive packet: type 60\ndebug1: Server accepts key: pkalg rsa-sha2-512 blen 2071\ndebug2: input_userauth_pk_ok: fp SHA256:<key_hash>\ndebug3: sign_and_send_pubkey: RSA SHA256:<key_hash>\nEnter passphrase for key '<path_to_private_key>':\ndebug3: send packet: type 50\ndebug3: receive packet: type 51\nAuthenticated with partial success.\ndebug1: Authentications that can continue: password, keyboard-interactive\n

In the text above, you can see which files ssh has checked for private keys, and whether any key is accepted. The line Server accepts key indicates that the SSH key has been accepted. By default, SSH will go through a list of standard private-key files, as well as any you have specified with -i or in a config file. To succeed, one of these private keys needs to match the public key uploaded to SAFE.

If your SSH key passphrase is incorrect, you will be asked to try again up to three times in total, before being disconnected with Permission denied (publickey). If you enter your passphrase correctly, but still see this error message, please consider the advice under SSH key above.

You should next see something similar to:

debug1: Next authentication method: keyboard-interactive\ndebug2: userauth_kbdint\ndebug3: send packet: type 50\ndebug2: we sent a keyboard-interactive packet, wait for reply\ndebug3: receive packet: type 60\ndebug2: input_userauth_info_req\ndebug2: input_userauth_info_req: num_prompts 1\nPassword:\ndebug3: send packet: type 61\ndebug3: receive packet: type 60\ndebug2: input_userauth_info_req\ndebug2: input_userauth_info_req: num_prompts 0\ndebug3: send packet: type 61\ndebug3: receive packet: type 52\ndebug1: Authentication succeeded (keyboard-interactive).\n

If you do not see the Password: prompt, you may have connection issues, or there could be a problem with the ARCHER2 login nodes. If you do not see Authentication succeeded (keyboard-interactive), your password was not accepted; you will usually be asked to re-enter it two more times before the connection is rejected. Consider the suggestions under Password above. If you do see Authentication succeeded (keyboard-interactive), your password was accepted and you are logged in.

The equivalent information can be obtained in PuTTY by enabling All Logging in settings.
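The full output is long, so it can help to write the debug log to a file with ssh's -E option and then scan it for the markers discussed above. For example: ssh -vvv -E ssh-debug.log username@login.archer2.ac.uk. The function below is a hypothetical sketch that looks only for the strings shown in the example logs on this page. Exact wording may differ between OpenSSH versions.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: report which authentication stages succeeded
# in an ssh -vvv log read from stdin.
summarise_ssh_debug() {
    local log
    log=$(cat)
    if grep -qF 'Server accepts key' <<<"$log"; then
        echo "publickey: accepted"
    else
        echo "publickey: not accepted"
    fi
    if grep -qF 'Authentication succeeded (keyboard-interactive)' <<<"$log"; then
        echo "keyboard-interactive: accepted"
    elif grep -qF 'Next authentication method: keyboard-interactive' <<<"$log"; then
        echo "keyboard-interactive: attempted but not accepted"
    else
        echo "keyboard-interactive: not reached"
    fi
}

# summarise_ssh_debug < ssh-debug.log
```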

"},{"location":"user-guide/connecting-totp/#related-software","title":"Related Software","text":""},{"location":"user-guide/connecting-totp/#tmux","title":"tmux","text":"

tmux is a multiplexer application available on the ARCHER2 login nodes. It allows for multiple sessions to be open concurrently and these sessions can be detached and run in the background. Furthermore, sessions will continue to run after a user logs off and can be reattached to upon logging in again. It is particularly useful if you are connecting to ARCHER2 on an unstable Internet connection or if you wish to keep an arrangement of terminal applications running while you disconnect your client from the Internet -- for example, when moving between your home and workplace.

"},{"location":"user-guide/connecting/","title":"Connecting to ARCHER2","text":"

This section covers the basic connection methods.

On the ARCHER2 system, interactive access is achieved using SSH, either directly from a command-line terminal or using an SSH client. In addition, data can be transferred to and from the ARCHER2 system using scp from the command line or by using a file-transfer client.

Before following the process below, we assume you have set up an account on ARCHER2 through the EPCC SAFE. Documentation on how to do this can be found at:

"},{"location":"user-guide/connecting/#command-line-terminal","title":"Command line terminal","text":""},{"location":"user-guide/connecting/#linux","title":"Linux","text":"

Linux distributions include a terminal application that can be used for SSH access to the ARCHER2 login nodes. Linux users will have different terminals depending on their distribution and window manager (e.g., GNOME Terminal in GNOME, Konsole in KDE). Consult your Linux distribution's documentation for details on how to load a terminal.

"},{"location":"user-guide/connecting/#macos","title":"MacOS","text":"

MacOS users can use the Terminal application, located in the Utilities folder within the Applications folder.

"},{"location":"user-guide/connecting/#windows","title":"Windows","text":"

A typical Windows installation will not include a terminal client, though there are various clients available. We recommend Windows users download and install MobaXterm to access ARCHER2. It is very easy to use and includes an integrated X Server, which allows you to run graphical applications on ARCHER2.

You can download MobaXterm Home Edition (Installer Edition) from the following link:

Double-click the downloaded Microsoft Installer file (.msi) and follow the instructions from the Windows Installation Wizard. Note that you might need administrator rights to install on some versions of Windows. Also, check whether Windows Firewall has blocked any features of this program after installation (Windows will warn you if the built-in firewall blocks an action and give you the opportunity to override the behaviour).

Once installed, start MobaXterm and then click \"Start local terminal\".

Tips

"},{"location":"user-guide/connecting/#access-credentials","title":"Access credentials","text":"

To access ARCHER2, you need to use two sets of credentials: your SSH key pair protected by a passphrase, and a time-based one-time passcode (TOTP). You can find more detailed instructions below on how to set up your credentials to access ARCHER2 from Windows, MacOS and Linux.

"},{"location":"user-guide/connecting/#ssh-key-pairs","title":"SSH Key Pairs","text":"

You will need to generate an SSH key pair protected by a passphrase to access ARCHER2.

Using a terminal (the command line), set up a key pair that contains your e-mail address (as the key comment) and enter a passphrase you will use to unlock the key:

$ ssh-keygen -t rsa -C \"your@email.com\"\n...\n-bash-4.1$ ssh-keygen -t rsa -C \"your@email.com\"\nGenerating public/private rsa key pair.\nEnter file in which to save the key (/Home/user/.ssh/id_rsa): [Enter]\nEnter passphrase (empty for no passphrase): [Passphrase]\nEnter same passphrase again: [Passphrase]\nYour identification has been saved in /Home/user/.ssh/id_rsa.\nYour public key has been saved in /Home/user/.ssh/id_rsa.pub.\nThe key fingerprint is:\n03:d4:c4:6d:58:0a:e2:4a:f8:73:9a:e8:e3:07:16:c8 your@email.com\nThe key's randomart image is:\n+--[ RSA 2048]----+\n|    . ...+o++++. |\n| . . . =o..      |\n|+ . . .......o o |\n|oE .   .         |\n|o =     .   S    |\n|.    +.+     .   |\n|.  oo            |\n|.  .             |\n| ..              |\n+-----------------+\n

(remember to replace \"your@email.com\" with your e-mail address).

"},{"location":"user-guide/connecting/#upload-public-part-of-key-pair-to-safe","title":"Upload public part of key pair to SAFE","text":"

You should now upload the public part of your SSH key pair to the SAFE by following the instructions at:

Log in to SAFE.

Then:

  1. Go to the Login accounts menu and select the ARCHER2 account you want to add the SSH key to.
  2. On the subsequent Login Account details page, click the Add Credential button.
  3. Select SSH public key as the Credential Type and click Next.
  4. Either copy and paste the public part of your SSH key into the SSH Public key box or use the button to select the public key file on your computer.
  5. Click Add to associate the public SSH key with your account.

Once you have done this, your SSH key will be added to your ARCHER2 account.

"},{"location":"user-guide/connecting/#mfa-time-based-one-time-passcode-totp-code","title":"MFA Time-based one-time passcode (TOTP code)","text":"

Remember, you will need to use both an SSH key and a time-based one-time passcode to log into ARCHER2, so you will also need to set up a method for generating a TOTP code before you can log in.

"},{"location":"user-guide/connecting/#first-login-password-required","title":"First login: password required","text":"

Important

You will not use your password when logging on to ARCHER2 after the first login for a new account.

As an additional security measure, you will also need to use a password from SAFE for your first login to ARCHER2 with a new account. When you log into ARCHER2 for the first time with a new account, you will be prompted to change your initial password. This is a three-step process:

  1. When prompted to enter your LDAP password: enter the password you retrieved from SAFE.
  2. When prompted to enter your new password: type in a new password.
  3. When prompted to re-enter the new password: re-enter the new password.

Your password has now been changed. You will no longer need this password to log into ARCHER2; from this point forwards, you will use your SSH key and TOTP code as described above.

"},{"location":"user-guide/connecting/#ssh-clients","title":"SSH Clients","text":"

As noted above, you interact with ARCHER2 over an encrypted communication channel (specifically, Secure Shell version 2 (SSH-2)). This allows command-line access to one of the login nodes of ARCHER2, from which you can run commands or use a command-line text editor to edit files. SSH can also be used to run graphical programs such as GUI text editors and debuggers when used in conjunction with an X Server.

"},{"location":"user-guide/connecting/#logging-in","title":"Logging in","text":"

The login addresses for ARCHER2 are:

You can use the following command from the terminal window to log in to ARCHER2:

Full system
ssh username@login.archer2.ac.uk\n

The order in which you are asked for credentials depends on the system you are accessing:

Full system

You will first be prompted for the passphrase associated with your SSH key pair. Once you have entered this passphrase successfully, you will then be prompted for your TOTP code. You need to enter both credentials correctly to be able to access ARCHER2.

Tip

If you logged into ARCHER2 with your account before the major upgrade in May/June 2023, you may see an error from SSH that looks like this:

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\n@       WARNING: POSSIBLE DNS SPOOFING DETECTED!          @\n@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\nThe ECDSA host key for login.archer2.ac.uk has changed,\nand the key for the corresponding IP address 193.62.216.43\nhas a different value. This could either mean that\nDNS SPOOFING is happening or the IP address for the host\nand its host key have changed at the same time.\nOffending key for IP in /Users/auser/.ssh/known_hosts:11\n@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\n@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @\n@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\nIT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!\nSomeone could be eavesdropping on you right now (man-in-the-middle attack)!\nIt is also possible that a host key has just been changed.\nThe fingerprint for the ECDSA key sent by the remote host is\nSHA256:UGS+LA8I46LqnD58WiWNlaUFY3uD1WFr+V8RCG09fUg.\nPlease contact your system administrator.\n

If you see this, you should delete the offending host key from your ~/.ssh/known_hosts file (in the example above, the offending entry is on line 11).

Warning

If your SSH key pair is not stored in the default location (usually ~/.ssh/id_rsa) on your local system, you may need to specify the path to the private part of the key with the -i option to ssh. For example, if your key is in a file called keys/id_rsa_ARCHER2, you would use the command ssh -i keys/id_rsa_ARCHER2 username@login.archer2.ac.uk to log in.

Tip

When you first log into ARCHER2, you will be prompted to change your initial password. This is a three-step process:

  1. When prompted to enter your LDAP password: re-enter the password you retrieved from SAFE.
  2. When prompted to enter your new password: type in a new password.
  3. When prompted to re-enter the new password: re-enter the new password.

Your password has now been changed.

To allow remote programs, especially graphical applications such as debuggers, to use your local display, use:

Full system
ssh -X username@login.archer2.ac.uk\n

Some sites recommend using the -Y flag. While this can fix some compatibility issues, the -X flag is more secure.

Current MacOS systems do not include an X Window System. MacOS users should install the XQuartz package to enable SSH with X11 forwarding:

"},{"location":"user-guide/connecting/#host-keys","title":"Host Keys","text":"

Adding the host keys to your SSH known hosts file (~/.ssh/known_hosts) provides an extra level of security for your connections to ARCHER2. The host keys are checked against the login nodes when you log in to ARCHER2, and if the remote server key does not match the one in that file, the connection will be refused. This provides protection against potential malicious servers masquerading as the ARCHER2 login nodes.

"},{"location":"user-guide/connecting/#loginarcher2acuk","title":"login.archer2.ac.uk","text":"
login.archer2.ac.uk ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBANu9BQJ1UFr4nwy8X5seIPgCnBl1TKc8XBq2YVY65qS53QcpzjZAH53/CtvyWkyGcmY8/PWsJo9sXHqzXVSkzk=\n\nlogin.archer2.ac.uk ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDFGGByIrskPayB5xRm3vkWoEc5bVtTCi0oTGslD8m+M1Sc/v2IV6FxaEVXGwO9ErQwrtFQRj0KameLS3Jn0LwQ13Tw+vTXV0bsKyGgEu2wW+BSDijGpbxRZXZrg30TltZXd4VkTuWiE6kyhJ6qiIIR0nwfDblijGy3u079gM5Om/Q2wydwh0iAASRzkqldL5bKDb14Vliy7tCT3TJXI49+qIagWUhNEzyN1j2oK/2n3JdflT4/anQ4jUywVG4D1Tor/evEeSa3h5++gbtgAXZaCtlQbBxwckmTetXqnlI+pvkF0AAuS18Bh+hdmvT1+xW0XLv7CMA64HfR93XgQIIuPqFAS1p+HuJkmk4xFAdwrzjnpYAiU5Apkq+vx3W957/LULzZkeiFQY2Y3CY9oPVR8WBmGKXOOBifhl2Hvd51fH1wd0Lw7Zph53NcVSQQhdDUVhgsPJA3M/+UlqoAMEB/V6ESE2z6yrXVfNjDNbbgA1K548EYpyNR8z4eRtZOoi0=\n\nlogin.archer2.ac.uk ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAINyptPmidGmIBYHPcTwzgXknVPrMyHptwBgSbMcoZgh5\n

Host key verification can fail if this key is out of date. This can be fixed by removing the offending entry from ~/.ssh/known_hosts and replacing it with the new key published here. We recommend that you check this page for key updates rather than accepting a new key from the server without confirmation.
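As a minimal sketch, assuming a POSIX shell on your local machine, the hypothetical helper add_archer2_host_key below appends the published ED25519 key for login.archer2.ac.uk to a known_hosts file only if that exact line is not already present. It does not remove stale entries; for that, run ssh-keygen -R login.archer2.ac.uk first. The key is copied from the list above, so re-check this page before relying on it.

```shell
# Published ED25519 host key for login.archer2.ac.uk, copied from this page.
# Re-check the page for updates before relying on it.
ARCHER2_ED25519='login.archer2.ac.uk ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAINyptPmidGmIBYHPcTwzgXknVPrMyHptwBgSbMcoZgh5'

# Hypothetical helper: append the published key to a known_hosts file
# unless an identical line is already there.
# Usage: add_archer2_host_key [KNOWN_HOSTS_FILE]   (default ~/.ssh/known_hosts)
add_archer2_host_key() {
    known_hosts=${1:-$HOME/.ssh/known_hosts}
    mkdir -p "$(dirname "$known_hosts")"
    touch "$known_hosts"
    # -x: match the whole line, -F: treat the key as a fixed string
    if ! grep -qxF "$ARCHER2_ED25519" "$known_hosts"; then
        printf '%s\n' "$ARCHER2_ED25519" >> "$known_hosts"
    fi
}
```

Running the helper twice leaves a single copy of the key, so it is safe to call from a setup script.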

"},{"location":"user-guide/connecting/#making-access-more-convenient-using-the-ssh-configuration-file","title":"Making access more convenient using the SSH configuration file","text":"

Typing in the full command to log in or transfer data to ARCHER2 can become tedious as it often has to be repeated several times. You can use the SSH configuration file, usually located on your local machine at ~/.ssh/config, to make the process more convenient.

Each remote site (or group of sites) can have an entry in this file, which may look something like:

Full system
Host archer2\n    HostName login.archer2.ac.uk\n    User username\n

(remember to replace username with your actual username!).

Taking the full-system example: the Host line defines a short name for the entry. In this case, instead of typing ssh username@login.archer2.ac.uk to access the ARCHER2 login nodes, you could use ssh archer2 instead. The remaining lines define the options for the host.

Now you can use SSH to access ARCHER2 without needing to enter your username or the full hostname every time:

ssh archer2\n

You can set up as many of these entries as you need in your local configuration file. Other options are available. See the ssh_config manual page (or man ssh_config on any machine with SSH installed) for a description of the SSH configuration file. For example, you may find the IdentityFile option useful if you have to manage multiple SSH key pairs for different systems as this allows you to specify which SSH key to use for each system.
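An entry like the one above can also be written from a script. The sketch below assumes a POSIX shell; add_ssh_host is a hypothetical helper, not an ARCHER2-provided tool. It appends a Host block that includes an IdentityFile line, and refuses to add a second block with the same alias.

```shell
# Hypothetical helper: append a Host entry to an SSH config file.
# Usage: add_ssh_host ALIAS HOSTNAME USER IDENTITY_FILE [CONFIG_FILE]
add_ssh_host() {
    alias_name=$1
    host_name=$2
    user_name=$3
    identity_file=$4
    config=${5:-$HOME/.ssh/config}

    # Refuse to add a second entry with the same alias.
    if [ -f "$config" ] && grep -qxF "Host $alias_name" "$config"; then
        echo "Host $alias_name already present in $config" >&2
        return 1
    fi

    mkdir -p "$(dirname "$config")"
    # Separate the new block from any existing entries with a blank line.
    if [ -s "$config" ]; then
        printf '\n' >> "$config"
    fi
    {
        printf 'Host %s\n' "$alias_name"
        printf '    HostName %s\n' "$host_name"
        printf '    User %s\n' "$user_name"
        printf '    IdentityFile %s\n' "$identity_file"
    } >> "$config"
}

# Example:
#   add_ssh_host archer2 login.archer2.ac.uk username ~/.ssh/id_rsa_ARCHER2
```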

Bug

There is a known bug with the Windows ssh-agent. If you get the error message Warning: agent returned different signature type ssh-rsa (expected rsa-sha2-512), you will need to either specify the path to your SSH key on the command line (using the -i option as described above) or add that path to your SSH config file using the IdentityFile option.

"},{"location":"user-guide/connecting/#ssh-debugging-tips","title":"SSH debugging tips","text":"

If you find you are unable to connect to ARCHER2, there are some simple checks you may use to diagnose the issue, which are described below. If you are having difficulties connecting, we suggest trying these before contacting the ARCHER2 Service Desk.

"},{"location":"user-guide/connecting/#use-the-userloginarcher2acuk-syntax-rather-than-l-user-loginarcher2acuk","title":"Use the user@login.archer2.ac.uk syntax rather than -l user login.archer2.ac.uk","text":"

We have seen a number of instances where people using the syntax

ssh -l user login.archer2.ac.uk\n

have not been able to connect properly and get prompted for a password many times. We have found that using the alternative syntax:

ssh user@login.archer2.ac.uk\n

works more reliably.

"},{"location":"user-guide/connecting/#can-you-connect-to-the-login-node","title":"Can you connect to the login node?","text":"

Try the command ping -c 3 login.archer2.ac.uk on Linux or macOS, or ping -n 3 login.archer2.ac.uk on Windows. If the login node is reachable, the output should include:

--- login.archer2.ac.uk ping statistics ---\n3 packets transmitted, 3 received, 0% packet loss, time 38ms\n

(the ping time '38ms' is not important). If not all packets are received there could be a problem with your Internet connection, or the login node could be unavailable.
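To check the summary line from a script rather than by eye, a minimal sketch such as the hypothetical all_packets_received function below can be used. It assumes the Linux or macOS ping summary format shown above (the Windows output is different) and succeeds only if at least one packet was sent and every one came back.

```shell
# Hypothetical helper: succeed if the ping summary reports no lost packets.
# Example: all_packets_received "$(ping -c 3 login.archer2.ac.uk)"
all_packets_received() {
    printf '%s\n' "$1" | awk -F', ' '
        /packets transmitted/ {
            split($1, sent, " ")   # "3 packets transmitted" -> sent[1] = 3
            split($2, recv, " ")   # "3 received" or "3 packets received"
            found = 1
        }
        END {
            ok = found && sent[1] > 0 && sent[1] == recv[1]
            exit (ok ? 0 : 1)
        }'
}
```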

"},{"location":"user-guide/connecting/#ssh-key","title":"SSH key","text":"

If you get the error message Permission denied (publickey), this may indicate a problem with your SSH key. Some things to check:

chmod sets permissions on a target using the form chmod <code> <target>. For example, to set the correct permissions on the private key file id_rsa_ARCHER2, use the command chmod 600 id_rsa_ARCHER2.
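The commands involved can be sketched as a small hypothetical helper, secure_ssh_key_pair, for a key pair named like id_rsa_ARCHER2. The 644 mode for the public key is what ssh-keygen normally creates, and chmod 700 ~/.ssh is the usual convention for the containing directory; adjust the names to your own files.

```shell
# Hypothetical helper: set the permissions SSH expects on a key pair.
# Private key: read/write for the owner only (600).
# Public key (if present): readable by everyone (644).
# Usage: secure_ssh_key_pair PATH_TO_PRIVATE_KEY
secure_ssh_key_pair() {
    private_key=$1
    if [ ! -f "$private_key" ]; then
        echo "no such key: $private_key" >&2
        return 1
    fi
    chmod 600 "$private_key"
    if [ -f "$private_key.pub" ]; then
        chmod 644 "$private_key.pub"
    fi
}

# Example:
#   chmod 700 ~/.ssh
#   secure_ssh_key_pair ~/.ssh/id_rsa_ARCHER2
```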

On Windows, permissions are handled differently but can be set by right-clicking on the file and selecting Properties > Security > Advanced. The user, SYSTEM, and Administrators should have Full control, and no other permissions should exist for both the public and private key files, as well as the containing folder.

Tip

Unix file permissions can be understood in the following way. There are three classes that can have file permissions: the owning user, the owning group, and others. The available permissions are read, write, and execute. The first character indicates whether the target is a file (-) or a directory (d). The next three characters give the owning user's permissions: the first is r if they have read permission and - if not, the second is w if they have write permission and - if not, and the third is x if they have execute permission and - if not. This pattern is then repeated for the group and other permissions. For example, the pattern -rw-r--r-- indicates that the owning user can read and write the file, members of the owning group can read it, and anyone else can also read it. The chmod codes are constructed by treating each of the user, group, and other permission triplets as a three-digit binary number and converting it to a single octal digit (0 to 7). For example, the permission string -rwx------ becomes 111 000 000 -> 700.
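The conversion described in the tip can be sketched as a small POSIX shell function (perm_to_octal is a hypothetical name). It looks only at the nine rwx characters after the file-type character and does not handle setuid, setgid or sticky bits.

```shell
# Hypothetical helper: convert a permission string such as "-rwx------"
# into the three-digit octal code accepted by chmod (here, 700).
perm_to_octal() {
    result=""
    for start in 2 5 8; do                 # user, group, other triplets
        triplet=$(printf '%s' "$1" | cut -c"$start-$((start + 2))")
        digit=0
        case $triplet in r??) digit=$((digit + 4)) ;; esac   # read    = 4
        case $triplet in ?w?) digit=$((digit + 2)) ;; esac   # write   = 2
        case $triplet in ??x) digit=$((digit + 1)) ;; esac   # execute = 1
        result="$result$digit"
    done
    printf '%s\n' "$result"
}
```

For example, perm_to_octal -rw-r--r-- prints 644.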

"},{"location":"user-guide/connecting/#mfa","title":"MFA","text":"

If your TOTP passcode is being consistently rejected, you can remove MFA from your account and then re-enable it.

"},{"location":"user-guide/connecting/#ssh-verbose-output","title":"SSH verbose output","text":"

The verbose-debugging output from ssh can be very useful for diagnosing issues. In particular, it can be used to distinguish between problems with the SSH key and password. To enable verbose output, add the -vvv flag to your SSH command. For example:

ssh -vvv username@login.archer2.ac.uk\n

The output is lengthy, but somewhere in there you should see lines similar to the following:

debug1: Next authentication method: publickey\ndebug1: Offering public key: RSA SHA256:<key_hash> <path_to_private_key>\ndebug3: send_pubkey_test\ndebug3: send packet: type 50\ndebug2: we sent a publickey packet, wait for reply\ndebug3: receive packet: type 60\ndebug1: Server accepts key: pkalg rsa-sha2-512 blen 2071\ndebug2: input_userauth_pk_ok: fp SHA256:<key_hash>\ndebug3: sign_and_send_pubkey: RSA SHA256:<key_hash>\nEnter passphrase for key '<path_to_private_key>':\ndebug3: send packet: type 50\ndebug3: receive packet: type 51\nAuthenticated with partial success.\ndebug1: Authentications that can continue: password, keyboard-interactive\n

In the text above, you can see which files ssh has checked for private keys, and whether any key is accepted. The lines Server accepts key and Authenticated with partial success indicate that the SSH key has been accepted. By default, SSH will try a list of standard private-key files, as well as any you have specified with -i or in a config file. To succeed, one of these private keys needs to match the public key uploaded to SAFE.

If your SSH key passphrase is incorrect, you will be asked to try again up to three times in total, before being disconnected with Permission denied (publickey). If you enter your passphrase correctly, but still see this error message, please consider the advice under SSH key above.

You should next see something similar to:

debug1: Next authentication method: keyboard-interactive\ndebug2: userauth_kbdint\ndebug3: send packet: type 50\ndebug2: we sent a keyboard-interactive packet, wait for reply\ndebug3: receive packet: type 60\ndebug2: input_userauth_info_req\ndebug2: input_userauth_info_req: num_prompts 1\nPassword:\ndebug3: send packet: type 61\ndebug3: receive packet: type 60\ndebug2: input_userauth_info_req\ndebug2: input_userauth_info_req: num_prompts 0\ndebug3: send packet: type 61\ndebug3: receive packet: type 52\ndebug1: Authentication succeeded (keyboard-interactive).\n

If you do not see the Password: prompt, you may have connection issues, or there could be a problem with the ARCHER2 login nodes. If you do not see Authentication succeeded (keyboard-interactive), your password was not accepted; you will usually be asked to re-enter it two more times before the connection is rejected. Consider the suggestions under Password above. If you do see Authentication succeeded (keyboard-interactive), your password was accepted and authentication has completed.
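Rather than scrolling through the verbose output, you can write it to a file with the ssh -E option and search it for the two debug lines discussed above. The sketch below assumes a POSIX shell; ssh_auth_summary is a hypothetical helper that only greps for those marker lines, so it does not replace reading the full log when something unexpected happens.

```shell
# Hypothetical helper: summarise an "ssh -vvv" log read from standard input.
# Example:
#   ssh -vvv -E ssh-debug.log username@login.archer2.ac.uk exit
#   ssh_auth_summary < ssh-debug.log
ssh_auth_summary() {
    log=$(cat)
    if printf '%s\n' "$log" | grep -qF 'Server accepts key'; then
        echo "key: accepted"
    else
        echo "key: not accepted"
    fi
    if printf '%s\n' "$log" | grep -qF 'Authentication succeeded (keyboard-interactive)'; then
        echo "password: accepted"
    else
        echo "password: not accepted"
    fi
}
```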

The equivalent information can be obtained in PuTTY by enabling All Logging in settings.

"},{"location":"user-guide/connecting/#related-software","title":"Related Software","text":""},{"location":"user-guide/connecting/#tmux","title":"tmux","text":"

tmux is a multiplexer application available on the ARCHER2 login nodes. It allows for multiple sessions to be open concurrently and these sessions can be detached and run in the background. Furthermore, sessions will continue to run after a user logs off and can be reattached to upon logging in again. It is particularly useful if you are connecting to ARCHER2 on an unstable Internet connection or if you wish to keep an arrangement of terminal applications running while you disconnect your client from the Internet -- for example, when moving between your home and workplace.

"},{"location":"user-guide/containers/","title":"Containers","text":"

This page was originally based on the documentation at the University of Sheffield HPC service.

Designed around the notion of mobility of compute and reproducible science, Singularity enables users to have full control of their operating system environment. This means that a non-privileged user can \"swap out\" the Linux operating system and environment on the host for a Linux OS and environment that they control. So if the host system is running CentOS Linux but your application runs in Ubuntu Linux with a particular software stack, you can create an Ubuntu image, install your software into that image, copy the image to another host (e.g. ARCHER2), and run your application on that host in its native Ubuntu environment.

Singularity also allows you to leverage the resources of whatever host you are on. This includes high-speed interconnects (e.g. Slingshot on ARCHER2), file systems (e.g. /home and /work on ARCHER2) and potentially other resources.

Note

Singularity only supports Linux containers. You cannot create images that use Windows or macOS (this is a restriction of the containerisation model rather than Singularity).

"},{"location":"user-guide/containers/#useful-links","title":"Useful Links","text":""},{"location":"user-guide/containers/#about-singularity-containers-images","title":"About Singularity Containers (Images)","text":"

Similar to Docker, a Singularity container is a self-contained software stack. As Singularity does not require a root-level daemon to run its containers (as is required by Docker) it is suitable for use on multi-user HPC systems such as ARCHER2. Within the container, you have exactly the same permissions as you do in a standard login session on the system.

In practice, this means that a container image created on your local machine with all your research software installed for local development will also run on ARCHER2.

Pre-built container images (such as those on DockerHub or the SingularityHub archive) can simply be downloaded and used on ARCHER2 (or anywhere else Singularity is installed).

Creating and modifying container images requires root permission and so must be done on a system where you have such access (in practice, this is usually within a virtual machine on your laptop/workstation).

Note

SingularityHub was a publicly available cloud service for Singularity container images, active from 2016 to 2021. It built container recipes from Github repositories on Google Cloud, and container images were available via the Singularity command-line tool or the sregistry software. These container images are still available in the SingularityHub Archive.

"},{"location":"user-guide/containers/#using-singularity-images-on-archer2","title":"Using Singularity Images on ARCHER2","text":"

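Singularity containers can be used on ARCHER2 in a number of ways, including: interactively on the login nodes; interactively on the compute nodes; to run serial processes within a non-interactive batch script; and to run parallel processes within a non-interactive batch script.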

We provide information on each of these scenarios below. First, we describe briefly how to get existing container images onto ARCHER2 so that you can launch containers based on them.

"},{"location":"user-guide/containers/#getting-existing-container-images-onto-archer2","title":"Getting existing container images onto ARCHER2","text":"

Singularity container images are files, so, if you already have a container image, you can use scp to copy the file to ARCHER2 as you would with any other file.

If you wish to get a file from one of the container image repositories, then Singularity allows you to do this from ARCHER2 itself.

For example, to retrieve a container image from SingularityHub on ARCHER2 we can simply issue a Singularity command to pull the image.

auser@ln03:~> singularity pull hello-world.sif shub://vsoch/hello-world\n

The container image located at the shub URI is written to a Singularity Image File (SIF) called hello-world.sif.

"},{"location":"user-guide/containers/#interactive-use-on-the-login-nodes","title":"Interactive use on the login nodes","text":"

Once you have a container image file, launching a container based on it interactively on the login nodes is simple: use the singularity shell command. Using the container image we downloaded in the example above:

auser@ln03:~> singularity shell hello-world.sif\nSingularity>\n

Within a Singularity container your home directory will be available.

Once you have finished using your container, you can return to the ARCHER2 login node prompt with the exit command:

Singularity> exit\nexit\nauser@ln03:~>\n
"},{"location":"user-guide/containers/#interactive-use-on-the-compute-nodes","title":"Interactive use on the compute nodes","text":"

The process for using a container interactively on the compute nodes is very similar to that for the login nodes. The only difference is that you first have to start an interactive job (from a location on /work) in order to get interactive access to a compute node.

For example, to reserve a full node for you to work on interactively you would use:

auser@ln03:/work/t01/t01/auser> srun --nodes=1 --exclusive --time=00:20:00 \\\n                                      --account=[budget code] \\\n                                      --partition=standard --qos=standard \\\n                                      --pty /bin/bash\n\n...wait until job starts...\n\nauser@nid00001:/work/t01/t01/auser>\n

Note that the prompt has changed to show you are on a compute node. Now you can launch a container in the same way as on the login node.

auser@nid00001:/work/t01/t01/auser> singularity shell hello-world.sif\nSingularity> exit\nexit\nauser@nid00001:/work/t01/t01/auser> exit\nauser@ln03:/work/t01/t01/auser>\n

Note

We used exit to leave the interactive container shell and then exit again to leave the interactive job on the compute node.

"},{"location":"user-guide/containers/#serial-processes-within-a-non-interactive-batch-script","title":"Serial processes within a non-interactive batch script","text":"

You can also use Singularity containers within a non-interactive batch script as you would any other command. If your container image contains a runscript then you can use singularity run to execute the runscript in the job. You can also use singularity exec to execute arbitrary commands (or scripts) within the container.

An example job submission script to run a serial job that executes the runscript within a container based on the container image in the hello-world.sif file that we downloaded previously to an ARCHER2 login node would be as follows.

#!/bin/bash --login\n\n# Slurm job options (name, compute nodes, job time)\n\n#SBATCH --job-name=helloworld\n#SBATCH --nodes=1\n#SBATCH --ntasks-per-node=1\n#SBATCH --cpus-per-task=1\n#SBATCH --time=00:10:00\n\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Run the serial executable\nsingularity run $SLURM_SUBMIT_DIR/hello-world.sif\n

You submit this in the usual way (with sbatch), and the standard output and error are written to a file named slurm-<job ID>.out in the directory from which you submitted the job.

"},{"location":"user-guide/containers/#parallel-processes-within-a-non-interactive-batch-script","title":"Parallel processes within a non-interactive batch script","text":"

Running a Singularity container in parallel across a number of compute nodes requires some preparation. In general though, Singularity can be run within the parallel job launcher (srun).

srun <options> \\\n    singularity <options> /path/to/image/file \\\n        app <options>\n

The code snippet above shows the launch command as having three nested parts: srun, the singularity environment and the containerised application.

The Singularity container image must be compatible with the MPI environment on the host: either the containerised app has been built against the appropriate MPI libraries, or the container itself contains an MPI library that is compatible with the host MPI. The latter situation is known as the hybrid model; this is the approach taken in the sections that follow.

"},{"location":"user-guide/containers/#creating-your-own-singularity-container-images","title":"Creating Your Own Singularity Container Images","text":"

As we saw above, you can create Singularity container images by importing from DockerHub or Singularity Hub on ARCHER2 itself. If you wish to create your own custom container image to use with Singularity then you must use a system where you have root (or administrator) privileges - often your own laptop or workstation.

There are a number of different options to create container images on your local system to use with Singularity on ARCHER2. We are going to use Docker on our local system to create the container image, push the new container image to Docker Hub and then use Singularity on ARCHER2 to convert the Docker container image to a Singularity container image SIF file.

For macOS and Windows users we recommend installing Docker Desktop. For Linux users, we recommend installing Docker directly on your local system. See the Docker documentation for full details on how to install Docker Desktop/Docker.

"},{"location":"user-guide/containers/#building-container-images-using-docker","title":"Building container images using Docker","text":"

Note

We assume that you are familiar with using Docker in these instructions. You can find an introduction to Docker at Reproducible Computational Environments Using Containers: Introduction to Docker.

As usual, you can build container images with a command similar to:

docker build --platform linux/amd64 -t <username>/<image name>:<version> .\n

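Here, <username> is your Docker Hub username, <image name> and <version> are the name and tag you choose for the container image, and the final . tells Docker to use the current directory (containing your Dockerfile) as the build context.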

Note that you should use the --platform linux/amd64 option to ensure that the container image is compatible with the processor architecture on ARCHER2.

"},{"location":"user-guide/containers/#using-singularity-with-mpi-on-archer2","title":"Using Singularity with MPI on ARCHER2","text":"

MPI on ARCHER2 is provided by the Cray MPICH libraries with the interface to the high-performance Slingshot interconnect provided via the OFI interface. Therefore, as per the Singularity MPI Hybrid model, we will build our container image such that it contains a version of the MPICH MPI library compiled with support for OFI. Below, we provide instructions on creating a container image with a version of MPICH compiled in this way. We then provide an example of how to run a Singularity container with MPI over multiple ARCHER2 compute nodes.

"},{"location":"user-guide/containers/#building-an-image-with-mpi-from-scratch","title":"Building an image with MPI from scratch","text":"

Warning

Remember, all these steps should be executed on your local system where you have administrator privileges and Docker installed, not on ARCHER2.

We will illustrate the process of building a Singularity image with MPI from scratch by building an image that contains MPI provided by MPICH and the OSU MPI benchmarks. As part of the container image creation, we need to download the source code for both MPICH and the OSU benchmarks. At the time of writing, the stable MPICH release is 3.4.2 and the stable OSU benchmark release is 5.8, although the example below uses OSU 5.4.1; these versions may have changed by the time you are following these instructions.

First, create a Dockerfile that describes how to build the image:

FROM ubuntu:20.04\n\nENV DEBIAN_FRONTEND=noninteractive\n\n# Install the necessary packages (from repo)\nRUN apt-get update && apt-get install -y --no-install-recommends \\\n apt-utils \\\n build-essential \\\n curl \\\n libcurl4-openssl-dev \\\n libzmq3-dev \\\n pkg-config \\\n software-properties-common\nRUN apt-get clean\nRUN apt-get install -y dkms\nRUN apt-get install -y autoconf automake build-essential numactl libnuma-dev autoconf automake gcc g++ git libtool\n\n# Download and build an ABI compatible MPICH\nRUN curl -sSLO http://www.mpich.org/static/downloads/3.4.2/mpich-3.4.2.tar.gz \\\n   && tar -xzf mpich-3.4.2.tar.gz -C /root \\\n   && cd /root/mpich-3.4.2 \\\n   && ./configure --prefix=/usr --with-device=ch4:ofi --disable-fortran \\\n   && make -j8 install \\\n   && rm -rf /root/mpich-3.4.2 \\\n   && rm /mpich-3.4.2.tar.gz\n\n# OSU benchmarks\nRUN curl -sSLO http://mvapich.cse.ohio-state.edu/download/mvapich/osu-micro-benchmarks-5.4.1.tar.gz \\\n   && tar -xzf osu-micro-benchmarks-5.4.1.tar.gz -C /root \\\n   && cd /root/osu-micro-benchmarks-5.4.1 \\\n   && ./configure --prefix=/usr/local CC=/usr/bin/mpicc CXX=/usr/bin/mpicxx \\\n   && cd mpi \\\n   && make -j8 install \\\n   && rm -rf /root/osu-micro-benchmarks-5.4.1 \\\n   && rm /osu-micro-benchmarks-5.4.1.tar.gz\n\n# Add the OSU benchmark executables to the PATH\nENV PATH=/usr/local/libexec/osu-micro-benchmarks/mpi/pt2pt:$PATH\nENV PATH=/usr/local/libexec/osu-micro-benchmarks/mpi/collective:$PATH\n\n# path to mlx libraries in Ubuntu\nENV LD_LIBRARY_PATH=/usr/lib/libibverbs:$LD_LIBRARY_PATH\n

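A quick overview of what the above Dockerfile is doing: it starts from an Ubuntu 20.04 base image; installs compilers and build tools (build-essential, gcc, g++, autoconf, automake, libtool and others) from the Ubuntu package repositories; downloads MPICH 3.4.2 and builds it with the ch4:ofi device (without Fortran support) so that it is ABI compatible with the host MPI; downloads and builds the OSU micro-benchmarks 5.4.1 against this MPICH; and finally adds the OSU benchmark directories to PATH and /usr/lib/libibverbs to LD_LIBRARY_PATH.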

Now we can go ahead and build the container image using Docker (this assumes that you issue the command in the same directory as the Dockerfile you created based on the specification above):

docker build --platform linux/amd64 -t auser/osu-benchmarks:5.4.1 .\n

(Remember to change auser to your Dockerhub username.)

Once you have successfully built your container image, you should push it to Dockerhub:

docker push auser/osu-benchmarks:5.4.1\n

Finally, you need to use Singularity on ARCHER2 to convert the Docker container image to a Singularity container image file. Log into ARCHER2, move to the work file system and then use a command like:

auser@ln01:/work/t01/t01/auser> singularity build osu-benchmarks_5.4.1.sif docker://auser/osu-benchmarks:5.4.1\n

Tip

You can find a copy of the osu-benchmarks_5.4.1.sif image on ARCHER2 in the directory $EPCC_SINGULARITY_DIR if you do not want to build it yourself but still want to test.

"},{"location":"user-guide/containers/#running-parallel-mpi-jobs-using-singularity-containers","title":"Running parallel MPI jobs using Singularity containers","text":"

Tip

These instructions assume you have built a Singularity container image file on ARCHER2 that includes MPI provided by MPICH with the OFI interface. See the sections above for how to build such container images.

Once you have built your Singularity container image file that includes MPICH built with OFI for ARCHER2, you can use it to run parallel jobs in a similar way to non-Singularity jobs. The example job submission script below uses the container image file we built above with MPICH and the OSU benchmarks to run the Allreduce benchmark on two nodes where all 128 cores on each node are used for MPI processes (so, 256 MPI processes in total).

#!/bin/bash\n\n# Slurm job options (name, compute nodes, job time)\n#SBATCH --job-name=singularity_parallel\n#SBATCH --time=0:10:0\n#SBATCH --nodes=2\n#SBATCH --ntasks-per-node=128\n#SBATCH --cpus-per-task=1\n\n# Replace [budget code] below with your budget code (e.g. t01)\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n#SBATCH --account=[budget code]\n\n# Load the module to make the Cray MPICH ABI available\nmodule load cray-mpich-abi\n\nexport OMP_NUM_THREADS=1\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\n# Set the LD_LIBRARY_PATH environment variable within the Singularity container\n# to ensure that it uses the correct MPI libraries.\nexport SINGULARITYENV_LD_LIBRARY_PATH=\"/opt/cray/pe/mpich/8.1.23/ofi/gnu/9.1/lib-abi-mpich:/opt/cray/pe/mpich/8.1.23/gtl/lib:/opt/cray/libfabric/1.12.1.2.2.0.0/lib64:/opt/cray/pe/gcc-libs:/opt/cray/pe/gcc-libs:/opt/cray/pe/lib64:/opt/cray/pe/lib64:/opt/cray/xpmem/default/lib64:/usr/lib64/libibverbs:/usr/lib64:/usr/lib64\"\n\n# This makes sure HPE Cray Slingshot interconnect libraries are available\n# from inside the container.\nexport SINGULARITY_BIND=\"/opt/cray,/var/spool,/opt/cray/pe/mpich/8.1.23/ofi/gnu/9.1/lib-abi-mpich:/opt/cray/pe/mpich/8.1.23/gtl/lib,/etc/host.conf,/etc/libibverbs.d/mlx5.driver,/etc/libnl/classid,/etc/resolv.conf,/opt/cray/libfabric/1.12.1.2.2.0.0/lib64/libfabric.so.1,/opt/cray/pe/gcc-libs/libatomic.so.1,/opt/cray/pe/gcc-libs/libgcc_s.so.1,/opt/cray/pe/gcc-libs/libgfortran.so.5,/opt/cray/pe/gcc-libs/libquadmath.so.0,/opt/cray/pe/lib64/libpals.so.0,/opt/cray/pe/lib64/libpmi2.so.0,/opt/cray/pe/lib64/libpmi.so.0,/opt/cray/xpmem/default/lib64/libxpmem.so.0,/run/munge/munge.socket.2,/usr/lib64/libibverbs/libmlx5-rdmav34.so,/usr/lib64/libibverbs.so.1,/usr/lib64/libkeyutils.so.1,/usr/lib64/liblnetconfig.so.4,/usr/lib64/liblustreapi.so,/usr/lib64/libmunge.so.2,/usr/lib64/libnl-3.so.200,/usr/lib64/libnl-genl-3.so.200,/usr/lib64/libnl-route-3.so.200,/usr/lib64/librdmacm.so.1,/usr/lib64/libyaml-0.so.2\"\n\n# Launch the parallel job.\nsrun --hint=nomultithread --distribution=block:block \\\n    singularity run osu-benchmarks_5.4.1.sif \\\n        osu_allreduce\n

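The only changes from a standard submission script are: loading the cray-mpich-abi module; setting SINGULARITYENV_LD_LIBRARY_PATH so that the containerised application picks up the host Cray MPICH ABI libraries; setting SINGULARITY_BIND so that the host libraries and files needed for the Slingshot interconnect are visible inside the container; and launching the application via singularity run within srun.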

Important

Remember that the image file must be located on /work to run jobs on the compute nodes.
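The paths in SINGULARITY_BIND above are tied to specific software versions (for example mpich/8.1.23 and libfabric/1.12.1.2.2.0.0), so they may stop existing when the host software is updated. As a sketch, a hypothetical helper like missing_bind_paths could be called in the job script before srun to report any host path in the bind list that is missing. It assumes each comma-separated entry is either src or src:dest and checks only src.

```shell
# Hypothetical helper: print any host path in a Singularity bind list that
# does not exist, and return non-zero if at least one is missing.
# Example (in the job script, before srun):
#   missing_bind_paths "$SINGULARITY_BIND" || exit 1
missing_bind_paths() {
    missing=0
    old_ifs=$IFS
    IFS=,                            # split the list on commas only
    for entry in $1; do
        src=${entry%%:*}             # drop an optional ":dest" suffix
        if [ ! -e "$src" ]; then
            printf 'missing: %s\n' "$src"
            missing=1
        fi
    done
    IFS=$old_ifs
    return "$missing"
}
```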

If the job runs correctly, you should see output similar to the following in your slurm-*.out file:

Lmod is automatically replacing \"cray-mpich/8.1.23\" with\n\"cray-mpich-abi/8.1.23\".\n\n\n# OSU MPI Allreduce Latency Test v5.4.1\n# Size       Avg Latency(us)\n4                       7.93\n8                       7.93\n16                      8.13\n32                      8.69\n64                      9.54\n128                    13.75\n256                    17.04\n512                    25.94\n1024                   29.43\n2048                   43.53\n4096                   46.53\n8192                   46.20\n16384                  55.85\n32768                  83.11\n65536                 136.90\n131072                257.13\n262144                486.50\n524288               1025.87\n1048576              2173.25\n
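To pull a single figure out of this output, for example when comparing runs, a short awk filter is enough. The sketch below (osu_latency_for_size is a hypothetical name) assumes the two-column Size / Avg Latency(us) layout shown above and prints the average latency for one message size, ignoring comment and module lines.

```shell
# Hypothetical helper: print the average latency (us) for one message size
# from OSU benchmark output read on standard input.
# Example: osu_latency_for_size 1024 < slurm-123456.out
osu_latency_for_size() {
    awk -v size="$1" '
        $1 == size && NF == 2 { print $2; found = 1 }
        END { exit (found ? 0 : 1) }'
}
```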
"},{"location":"user-guide/containers/#using-containerised-hpe-cray-programming-environments","title":"Using Containerised HPE Cray Programming Environments","text":"

An experimental containerised CPE module has been set up on ARCHER2. The module is not available by default but can be made accessible by running module use with the right path:

module use /work/y07/shared/archer2-lmod/others/dev\nmodule load ccpe/23.12\n

The purpose of the ccpe module(s) is to allow developers to check that their code compiles with the latest Cray Programming Environment (CPE) releases. The CPE release installed on ARCHER2 (currently CPE 22.12) will typically be older than the latest available. A more recent containerised CPE therefore gives developers the opportunity to try out the latest compilers and libraries before the ARCHER2 CPE is upgraded.

Note

The Containerised CPEs support CCE and GCC compilers, but not AOCC compilers.

The ccpe/23.12 module then provides access to CPE 23.12 via a Singularity image file, located at /work/y07/shared/utils/dev/ccpe/23.12/cpe_23.12.sif. Singularity containers can be run such that locations on the host file system are still visible. This means source code stored on /work can be compiled from inside the CPE container. And any output resulting from the compilation, such as object files, libraries and executables, can be written to /work also. This ability to bind to locations on the host is necessary as the container is immutable, i.e., you cannot write files to the container itself.

Any executable resulting from a containerised CPE build can be run from within the container, allowing the developer to test the performance of the containerised libraries, e.g., libmpi_cray, libpmi2, libfabric.

We'll now show how to build and run a simple Hello World MPI example using a containerised CPE.

First, cd to the directory containing the Hello World MPI source, makefile and build script. Examples of these files are given below.

The three files below are, in order, build.sh, makefile and helloworld.f90.
#!/bin/bash\n\nmake clean\nmake\n\necho -e \"\\n\\nldd helloworld\"\nldd helloworld\n
MF=     Makefile\n\nFC=     ftn\nFFLAGS= -O3\nLFLAGS= -lmpichf90\n\nEXE=    helloworld\nFSRC=   helloworld.f90\n\n#\n# No need to edit below this line\n# (recipe lines must start with a tab character)\n#\n\n.SUFFIXES:\n.SUFFIXES: .f90 .o\n\nOBJ=    $(FSRC:.f90=.o)\n\n.f90.o:\n\t$(FC) $(FFLAGS) -c $<\n\nall:    $(EXE)\n\n$(EXE): $(OBJ)\n\t$(FC) $(FFLAGS) -o $@ $(OBJ) $(LFLAGS)\n\nclean:\n\trm -f $(OBJ) $(EXE) core\n
!\n! Each rank prints its rank and the total number of\n! MPI processes\n!\n\nprogram helloworld\n  use mpi\n\n  implicit none\n\n  integer :: comm, rank, size, ierr\n\n  comm = MPI_COMM_WORLD\n\n  call MPI_INIT(ierr)\n\n  call MPI_COMM_RANK(comm, rank, ierr)\n  call MPI_COMM_SIZE(comm, size, ierr)\n\n  ! Each process prints out its rank\n  write(*,*) 'I am ', rank, 'out of ', size,' processors.'\n\n  call sleep(1)\n\n  call MPI_FINALIZE(ierr)\n\nend program helloworld\n

The ldd command at the end of the build script is simply there to confirm that the code is indeed linked to containerised libraries that form part of the CPE 23.12 release.

The next step is to launch a job (via sbatch) on a serial node that instantiates the containerised CPE 23.12 image and builds the Hello World MPI code.

submit-build.slurm
#!/bin/bash\n\n#SBATCH --job-name=ccpe-build\n#SBATCH --ntasks=8\n#SBATCH --time=00:10:00\n#SBATCH --account=<budget code>\n#SBATCH --partition=serial\n#SBATCH --qos=serial\n#SBATCH --export=none\n\nexport OMP_NUM_THREADS=1\n\nmodule use /work/y07/shared/archer2-lmod/others/dev\nmodule load ccpe/23.12\n\nBUILD_CMD=\"${CCPE_BUILDER} ${SLURM_SUBMIT_DIR}/build.sh\"\n\nsingularity exec --cleanenv \\\n    --bind ${CCPE_BIND_ARGS},${SLURM_SUBMIT_DIR} --env LD_LIBRARY_PATH=${CCPE_LD_LIBRARY_PATH} \\\n    ${CCPE_IMAGE_FILE} ${BUILD_CMD}\n

The CCPE environment variables shown above (e.g., CCPE_BUILDER and CCPE_IMAGE_FILE) are set by the loading of the ccpe/23.12 module. The CCPE_BUILDER variable holds the path to the script that prepares the containerised environment prior to running the build.sh script. You can run cat ${CCPE_BUILDER} to take a closer look at what is going on.

Note

Passing the ${SLURM_SUBMIT_DIR} path to Singularity via the --bind option allows the CPE container to access the source code and write out the executable using locations on the host.

Running the newly-built code is similarly straightforward; this time the containerised CPE is launched on the compute nodes using the srun command.

submit-run.slurm
#!/bin/bash\n\n#SBATCH --job-name=helloworld\n#SBATCH --nodes=2\n#SBATCH --tasks-per-node=128\n#SBATCH --cpus-per-task=1\n#SBATCH --time=00:20:00\n#SBATCH --account=<budget code>\n#SBATCH --partition=standard\n#SBATCH --qos=short\n#SBATCH --export=none\n\nexport OMP_NUM_THREADS=1\n\nmodule use /work/y07/shared/archer2-lmod/others/dev\nmodule load ccpe/23.12\n\nRUN_CMD=\"${SLURM_SUBMIT_DIR}/helloworld\"\n\nsrun --distribution=block:block --hint=nomultithread --chdir=${SLURM_SUBMIT_DIR} \\\n    singularity exec --bind ${CCPE_BIND_ARGS},${SLURM_SUBMIT_DIR} --env LD_LIBRARY_PATH=${CCPE_LD_LIBRARY_PATH} \\\n        ${CCPE_IMAGE_FILE} ${RUN_CMD}\n

If you wish, you can replace a containerised library with its host equivalent at runtime. You may, for example, decide to do this for a low-level communications library such as libfabric or libpmi. This can be done by adding something like the following line to the submit-run.slurm file, before the srun command.

source ${CCPE_SET_HOST_PATH} "/opt/cray/pe/pmi" "6.1.8" "lib"

As of April 2024, the version of PMI available on ARCHER2 is 6.1.8 (CPE 22.12), and so the command above would allow you to isolate the impact of the containerised PMI library, which for CPE 23.12 is PMI 6.1.13. To see how the setting of the host library is done, simply run cat ${CCPE_SET_HOST_PATH} after loading the ccpe module.

An MPI code that just prints a message from each rank is obviously very simple. Real-world codes such as CP2K or GROMACS will often require additional software for compilation, e.g., Intel MKL libraries or tools that control the build process such as CMake. The way round this sort of problem is to point the CCPE container at the locations on the host where the software is installed.

submit-cmake-build.slurm
#!/bin/bash

#SBATCH --job-name=ccpe-build
#SBATCH --ntasks=8
#SBATCH --time=00:10:00
#SBATCH --account=<budget code>
#SBATCH --partition=serial
#SBATCH --qos=serial
#SBATCH --export=none

export OMP_NUM_THREADS=1

module use /work/y07/shared/archer2-lmod/others/dev
module load ccpe/23.12

CMAKE_DIR="/work/y07/shared/utils/core/cmake/3.21.3"

BUILD_CMD="${CCPE_BUILDER} ${SLURM_SUBMIT_DIR}/build.sh"

singularity exec --cleanenv \
    --bind ${CCPE_BIND_ARGS},${CMAKE_DIR},${SLURM_SUBMIT_DIR} \
    --env LD_LIBRARY_PATH=${CCPE_LD_LIBRARY_PATH} \
    ${CCPE_IMAGE_FILE} ${BUILD_CMD}

The submit-cmake-build.slurm script shows how the --bind option can be used to make the CMake installation on ARCHER2 accessible from within the container. The build.sh script can then call the cmake command directly (once the CMake bin directory has been added to the PATH environment variable).
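A minimal sketch of the PATH set-up that a build.sh might perform is shown below. It assumes that the CMake executables live in a bin subdirectory of the CMAKE_DIR path bound in submit-cmake-build.slurm. The commented cmake commands use a hypothetical source/build layout and are only illustrative.

```shell
#!/bin/bash
# Fragment for the top of a hypothetical build.sh. ASSUMPTION: CMAKE_DIR
# matches the host path bound into the container, and cmake is in its bin/.
CMAKE_DIR="/work/y07/shared/utils/core/cmake/3.21.3"

# Prepend the CMake bin directory unless it is already on the PATH,
# so that this cmake is found before any other copy
case ":${PATH}:" in
    *":${CMAKE_DIR}/bin:"*) ;;
    *) export PATH="${CMAKE_DIR}/bin:${PATH}" ;;
esac

# From here on the build can call cmake directly, e.g. (hypothetical layout):
#   cmake -S "${SLURM_SUBMIT_DIR}" -B "${SLURM_SUBMIT_DIR}/build"
#   cmake --build "${SLURM_SUBMIT_DIR}/build"
```

Prepending rather than appending means the bound CMake takes precedence over any older cmake that might already be on the container's PATH.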

"},{"location":"user-guide/data-migration/","title":"Data migration from ARCHER to ARCHER2","text":"

This content has been moved to archer-migration/data-migration

"},{"location":"user-guide/data/","title":"Data management and transfer","text":"

This section covers best practice and tools for data management on ARCHER2 along with a description of the different storage available on the service.

The IO section has information on achieving good performance for reading and writing data to the ARCHER2 storage along with information and advice on different IO patterns.

Information

If you have any questions on data management and transfer please do not hesitate to contact the ARCHER2 service desk at support@archer2.ac.uk.

"},{"location":"user-guide/data/#useful-resources-and-links","title":"Useful resources and links","text":""},{"location":"user-guide/data/#data-management","title":"Data management","text":"

We strongly recommend that you give some thought to how you use the various data storage facilities that are part of the ARCHER2 service. This will not only allow you to use the machine more effectively but also to ensure that your valuable data is protected.

Here are the main points you should consider:

"},{"location":"user-guide/data/#archer2-storage","title":"ARCHER2 storage","text":"

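The ARCHER2 service, like many HPC systems, has a complex structure. There are a number of different data storage types available to users: the home file systems, the work file systems, the solid state (NVMe) file system and the RDFaaS file systems.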

Each type of storage has different characteristics and policies, and is suitable for different types of use.

Important

All users have a directory on one of the home file systems and on one of the work file systems. The directories are located at /home/[project ID]/[project ID]/[user ID] and /work/[project ID]/[project ID]/[user ID] respectively.

There are also three different types of node available to users: login nodes, compute nodes and data analysis nodes.

Each type of node sees a different combination of the storage types. The following table shows which storage options are available on the different node types:

Storage              Login nodes   Compute nodes   Data analysis nodes   Notes
/home                yes           no              yes                   Incremental backup
/work                yes           yes             yes                   No backup, high performance
Solid state (NVMe)   yes           yes             yes                   No backup, high performance
RDFaaS               yes           no              yes                   Disaster recovery backup

Important

Only the work file systems and the solid state (NVMe) file system are visible on the compute nodes. This means that all data required by calculations at runtime (input data, application binaries, software libraries, etc.) must be placed on one of these file systems.

You may see "file not found" errors if you try to access data on the /home or RDFaaS file systems when running on the compute nodes.

"},{"location":"user-guide/data/#home-file-systems","title":"Home file systems","text":"

There are four independent home file systems. Every project has an allocation on one of the four. You do not need to know which one your project uses, as your project's space can always be accessed via the path /home/[project ID] with your personal directory at /home/[project ID]/[project ID]/[user ID]. Each home file system is approximately 100 TB in size and is implemented using standard Network Attached Storage (NAS) technology. This means that these disks are not particularly high performance but are well suited to standard operations like compilation and file editing. These file systems are visible from the ARCHER2 login nodes and data analysis nodes.

"},{"location":"user-guide/data/#accessing-snapshots-of-home-file-systems","title":"Accessing snapshots of home file systems","text":"

The home file systems are fully backed up and also retain snapshots, which can be used to recover past versions of files. Snapshots are taken weekly (for each of the past two weeks), daily (for each of the past two days) and hourly (for each of the last 6 hours). You can access the snapshots at .snapshot from any given directory on the home file systems. Note that the .snapshot directory will not show up under any version of "ls" and will not tab complete.
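As a sketch of how a lost file might be found in the snapshots, the hypothetical shell function below lists every snapshot copy of a file, newest first. It assumes that each snapshot appears as a subdirectory of .snapshot that mirrors the directory you are in (for example .snapshot/<snapshot name>/myfile.txt). Check the actual layout with ls .snapshot before relying on it.

```shell
#!/bin/bash
# Hypothetical helper. ASSUMPTION: each snapshot is a subdirectory of
# .snapshot that mirrors the current directory,
# e.g. .snapshot/<snapshot name>/myfile.txt
list_snapshot_copies() {
    local name=$1
    # .snapshot is hidden from ls listings and tab completion, but it can
    # still be named explicitly. -t sorts by modification time, newest first.
    ls -1t .snapshot/*/"${name}" 2>/dev/null
}

# Usage: run in the directory that held the file, then copy back the version
# you want under a new name so the current file is not overwritten, e.g.
#   list_snapshot_copies myfile.txt
#   cp .snapshot/<snapshot name>/myfile.txt myfile.txt.restored
```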

These file systems are a good location to keep source code, copies of scripts and compiled binaries. Small amounts of important data can also be copied here for safekeeping, though the file systems are not fast enough to manipulate large datasets effectively.

"},{"location":"user-guide/data/#quotas-on-home-file-systems","title":"Quotas on home file systems","text":"

All projects are assigned a quota on the home file systems. The project PI or manager can split this quota up between users or groups of users if they wish.

You can view any home file system quotas that apply to your account by logging into SAFE and navigating to the page for your ARCHER2 login account.

  1. Log into SAFE
  2. Use the \"Login accounts\" menu and select your ARCHER2 login account
  3. The "Login account details" table lists any user or group quotas that are linked with your account. (If there is no quota shown for a row then you have an unlimited quota for that item, but you may still be limited by another quota.)

Tip

Quota and usage data on SAFE is updated twice daily so may not be exactly up to date with the situation on the systems themselves.

"},{"location":"user-guide/data/#work-file-systems","title":"Work file systems","text":"

There are currently three work file systems on the full ARCHER2 service. Each of these file systems is 3.4 PB and a portion of one of these file systems is available to each project. You do not usually need to know which one your project uses, as your project's space can always be accessed via the path /work/[project ID] with your personal directory at /work/[project ID]/[project ID]/[user ID].

All of these are high-performance Lustre parallel file systems, designed to support data in large files. Performance is likely to be poorer for data stored in large numbers of small files.

These file systems are available on the compute nodes and are the default location users should use for data required at runtime on the compute nodes.

Warning

There are no backups of any data on the work file systems. You should not rely on these file systems for long term storage.

Ideally, these file systems should only contain data that is:

In practice it may be convenient to keep copies of datasets on the work file systems that you know will be needed at a later date. However, make sure that important data is always backed up elsewhere and that your work would not be significantly impacted if the data on the work file systems was lost.

Large data sets can be moved to the RDFaaS storage or transferred off the ARCHER2 service entirely.

If you have data on the work file systems that you are not going to need in the future please delete it.

"},{"location":"user-guide/data/#quotas-on-the-work-file-systems","title":"Quotas on the work file systems","text":"

As for the home file systems, all projects are assigned a quota on the work file systems. The project PI or manager can split this quota up between users or groups of users if they wish.

You can view any work file system quotas that apply to your account by logging into SAFE and navigating to the page for your ARCHER2 login account.

  1. Log into SAFE
  2. Use the \"Login accounts\" menu and select your ARCHER2 login account
  3. The "Login account details" table lists any user or group quotas that are linked with your account. (If there is no quota shown for a row then you have an unlimited quota for that item, but you may still be limited by another quota.)

Tip

Quota and usage data on SAFE is updated twice daily so may not be exactly up to date with the situation on the systems themselves.

You can also examine up-to-date quotas and usage on the ARCHER2 systems themselves using the lfs quota command. To do this, first change to your directory on the work file system:

cd /work/t01/t01/auser

then query your user quota:

auser@ln03:/work/t01/t01/auser> lfs quota -hu auser .
Disk quotas for usr auser (uid 5496):
  Filesystem    used   quota   limit   grace   files   quota   limit   grace
           .  1.366G      0k      0k       -    5486       0       0       -
uid 5496 is using default block quota setting
uid 5496 is using default file quota setting

The quota and limit of 0k here indicate that no user quota is set for this user. The project quota can be queried in the same way by passing the -p option with your primary group ID, which is used as the project ID:

auser@ln03:/work/t01/t01/auser> lfs quota -hp $(id -g) .
Disk quotas for prj 1009 (pid 1009):
  Filesystem    used   quota   limit   grace   files   quota   limit   grace
           .  2.905G      0k      0k       -   25300       0       0       -
pid 1009 is using default block quota setting
pid 1009 is using default file quota setting
"},{"location":"user-guide/data/#solid-state-nvme-file-system-scratch-storage","title":"Solid state (NVMe) file system - scratch storage","text":"

Important

The solid state storage system is configured as scratch storage with all files that have not been accessed in the last 28 days being automatically deleted. This implementation starts on 28 Feb 2024, i.e. any files not accessed since 1 Feb 2024 will be automatically removed on 28 Feb 2024.

The solid state storage file system is a 1 PB high performance parallel Lustre file system similar to the work file systems. However, unlike the work file systems, all of the disks are based on solid state storage (NVMe) technology. This changes the performance characteristics of the file system compared to the work file systems. Testing by the ARCHER2 CSE team at EPCC has shown that you may see I/O performance improvements from the solid state storage compared to the standard work Lustre file systems on ARCHER2 if your I/O model has the following characteristics or similar:

Data on the solid state (NVMe) file system is visible on the compute nodes.

Important

If you use MPI-IO approaches to reading/writing data (this includes parallel HDF5 and parallel NetCDF), then you are very unlikely to see any performance improvements from using the solid state storage over the standard parallel Lustre file systems on ARCHER2.

Warning

There are no backups of any data on the solid state (NVMe) file system. You should not rely on this file system for long term storage.

"},{"location":"user-guide/data/#access-to-the-solid-state-file-system","title":"Access to the solid state file system","text":"

Projects do not have access to the solid state file system by default. If your project does not yet have access and you want access for your project, please contact the Service Desk to request access.

"},{"location":"user-guide/data/#location-of-directories","title":"Location of directories","text":"

You can find your directory on the file system at:

/mnt/lustre/a2fs-nvme/work/<project code>/<project code>/<username>

For example, if my username is auser and I am in project t01, I could find my solid state storage directory at:

/mnt/lustre/a2fs-nvme/work/t01/t01/auser
"},{"location":"user-guide/data/#quotas-on-solid-state-file-system","title":"Quotas on solid state file system","text":"

Important

All projects have the same, large quota of 250,000 GiB on the solid state file system to allow them to use it as a scratch file system. Remember, any files that have not been accessed in the last 28 days will be automatically deleted.

You can query quotas for the solid state file system in the same way as quotas on the work file systems.

Bug

Usage and quotas of the solid state file system are not yet available in SAFE - you should use commands such as lfs quota -hp $(id -g) . to query quotas on the solid state file system.

"},{"location":"user-guide/data/#identifying-files-that-are-candidates-for-deletion","title":"Identifying files that are candidates for deletion","text":"

You can identify which files you own that are candidates for deletion at the next scratch file system purge using the find command in the following format:

find /mnt/lustre/a2fs-nvme/work/<project code> -user $USER -atime +28 -type f -print

For example, if my account is in project t01, I would use:

find /mnt/lustre/a2fs-nvme/work/t01 -user $USER -atime +28 -type f -print
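To see which files the -atime +28 test selects, the self-contained sketch below creates two files in a throwaway directory, back-dates the access time of one of them with GNU touch -a -d, and runs the same -atime +28 -type f predicates used above. It does not access the NVMe file system. The file names are illustrative.

```shell
#!/bin/bash
# Self-contained demo of the purge-candidate test, using a throwaway
# directory rather than the NVMe scratch file system.
demo_dir=$(mktemp -d)
touch "${demo_dir}/recent.dat" "${demo_dir}/stale.dat"

# GNU touch: -a changes only the access time, -d sets it to the given date
touch -a -d '40 days ago' "${demo_dir}/stale.dat"

# Same predicates as above: regular files last accessed more than 28 days ago
candidates=$(find "${demo_dir}" -atime +28 -type f -print)
echo "Purge candidates: ${candidates}"

rm -rf "${demo_dir}"
```

Because find discards fractional days when comparing access times, -atime +28 only matches files last accessed at least 29 whole days ago. The list is therefore a close approximation of the purge criterion rather than an exact match.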
"},{"location":"user-guide/data/#rdfaas-file-systems","title":"RDFaaS file systems","text":"

The RDFaaS file systems provide additional capacity for projects to store data that is not currently required on the compute nodes but which is too large for the Home file systems.

Warning

The RDFaaS file systems are backed up for disaster recovery purposes only (e.g. loss of the whole file system) so it is not possible to recover individual files if they are deleted by mistake or otherwise lost.

Tip

Not all projects on ARCHER2 have access to RDFaaS. If you do have access, this will show up in the login account page on SAFE for your ARCHER2 login account.

If you have access to RDFaaS, you will have a directory in one of two file systems: either /epsrc or /general.

For example, if your username is auser and you are in the e05 project, then your RDFaaS directory will be at:

/epsrc/e05/e05/auser

The RDFaaS file systems are not available on the ARCHER2 compute nodes.

Tip

If you are having issues accessing data on the RDFaaS file system then please contact the ARCHER2 Service Desk.

"},{"location":"user-guide/data/#copying-data-from-rdfaas-to-work-file-systems","title":"Copying data from RDFaaS to Work file systems","text":"

You should use the standard Linux cp command to copy data from the RDFaaS file system to other ARCHER2 file systems (usually /work). For example, to transfer the file important-data.tar.gz from the RDFaaS file system to /work you would use the following command (assuming you are user auser in project e05):

cp /epsrc/e05/e05/auser/important-data.tar.gz /work/e05/e05/auser/

(Remember to replace the project code and username with your own. If your data is in /general rather than /epsrc, use that path instead.)

"},{"location":"user-guide/data/#subprojects","title":"Subprojects","text":"

Some large projects may choose to split their resources into multiple subprojects. These subprojects will have identifiers appended to the main project ID. For example, the rse subgroup of the z19 project would have the ID z19-rse. If the main project has allocated storage quotas to the subproject the directories for this storage will be found at, for example:

/home/z19/z19-rse/auser

Your Linux home directory will generally not be changed when you are made a member of a subproject, so you must change directories manually (or change the ownership of files) to make use of this different storage quota allocation.

"},{"location":"user-guide/data/#sharing-data-with-other-archer2-users","title":"Sharing data with other ARCHER2 users","text":"

How you share data with other ARCHER2 users depends on whether or not they belong to the same project as you. Each project has two shared folders that can be used for sharing data.

"},{"location":"user-guide/data/#sharing-data-with-archer2-users-in-your-project","title":"Sharing data with ARCHER2 users in your project","text":"

Each project has an inner shared folder:

/work/[project code]/[project code]/shared

This folder has read/write permissions for all project members. You can place any data you wish to share with other project members in this directory. For example, if your project code is x01 the inner shared folder would be located at /work/x01/x01/shared.

"},{"location":"user-guide/data/#sharing-data-with-archer2-users-within-the-same-project-group","title":"Sharing data with ARCHER2 users within the same project group","text":"

Some projects have subprojects (also often referred to as 'project groups' or sub-budgets), e.g. project e123 might have a project group e123-fred for a sub-group of researchers working with Fred.

Often project groups do not have a disk quota set, but if the project PI does set up a group disk quota (e.g. for /work), then additional directories are created:

/work/e123/e123-fred
/work/e123/e123-fred/shared
/work/e123/e123-fred/<user> (for every user in the group)

All members of the e123-fred group will then be able to use the /work/e123/e123-fred/shared directory to share their files.

Note

If files are copied or moved from their usual directories, they may keep their original group ownership. To grant ownership to the group:

chown -R $USER:e123-fred /work/e123/e123-fred/ ...

"},{"location":"user-guide/data/#sharing-data-with-all-archer2-users","title":"Sharing data with all ARCHER2 users","text":"

Each project also has an outer shared folder:

/work/[project code]/shared

It is writable by all project members and readable by any user on the system. You can place any data you wish to share with other ARCHER2 users who are not members of your project in this directory. For example, if your project code is x01 the outer shared folder would be located at /work/x01/shared.

"},{"location":"user-guide/data/#permissions","title":"Permissions","text":"

You should check the permissions of any files that you place in the shared area, especially if those files were created in your own ARCHER2 account. Files of the latter type are likely to be readable by you only.

The chmod command below shows how to make sure that a file placed in the outer shared folder is also readable by all ARCHER2 users.

chmod a+r /work/x01/shared/your-shared-file.txt

Similarly, for the inner shared folder, chmod can be called such that read permission is granted to all users within the x01 project.

chmod g+r /work/x01/x01/shared/your-shared-file.txt

If you're sharing a set of files stored within a folder hierarchy, the chmod is slightly more complicated.

chmod -R a+Xr /work/x01/shared/my-shared-folder
chmod -R g+Xr /work/x01/x01/shared/my-shared-folder

The -R option ensures that the read permission is enabled recursively and the +X guarantees that the user(s) you're sharing the folder with can access the subdirectories below my-shared-folder.
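To see the effect of the capital X, the sketch below applies the same chmod -R a+Xr to a small throwaway hierarchy and prints the resulting permissions. The folder and file names are illustrative, and stat -c assumes GNU coreutils.

```shell
#!/bin/bash
# Demo of `chmod -R a+Xr` on a throwaway copy of a shared-folder hierarchy.
demo=$(mktemp -d)
mkdir -p "${demo}/my-shared-folder/subdir"
echo "data" > "${demo}/my-shared-folder/subdir/results.txt"

# Start from owner-only permissions, as files in a personal directory often are
chmod 700 "${demo}/my-shared-folder" "${demo}/my-shared-folder/subdir"
chmod 600 "${demo}/my-shared-folder/subdir/results.txt"

chmod -R a+Xr "${demo}/my-shared-folder"

# Directories gain read and search (x) for everyone. The plain file gains
# read only, because X adds execute only to directories (or to files that
# are already executable by someone).
dir_mode=$(stat -c '%A' "${demo}/my-shared-folder/subdir")
file_mode=$(stat -c '%A' "${demo}/my-shared-folder/subdir/results.txt")
echo "subdir: ${dir_mode}  results.txt: ${file_mode}"

rm -rf "${demo}"
```

This prints subdir: drwxr-xr-x and results.txt: -rw-r--r--. Using a lowercase x instead would also have made results.txt executable.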

"},{"location":"user-guide/data/#sharing-data-between-projects-and-subprojects","title":"Sharing data between projects and subprojects","text":"

Every file has an owner group that specifies access permissions for users belonging to that group. It's usually the case that the group id is synonymous with the project code. Somewhat confusingly however, projects can contain groups of their own, called subprojects, which can be assigned disk space quotas distinct from the project.

chown -R $USER:x01-subproject /work/x01/x01-subproject/$USER/my-folder

The chown command above changes the owning group for all the files within my-folder to the x01-subproject group. This might be necessary if previously those files were owned by the x01 group and thereby using some of the x01 disk quota.
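Before running chown, it can help to see which files are still owned by another group. The hypothetical function below wraps find's standard ! -group test to list them. The path and group in the usage comment are the illustrative ones from the text above.

```shell
#!/bin/bash
# Hypothetical helper: list files under a directory whose owning group is
# NOT the given group, e.g. files in a subproject folder that are still
# charged to the parent project's quota.
files_not_in_group() {
    local dir=$1 group=$2
    find "${dir}" ! -group "${group}" -print
}

# Example (replace the path and group with your own):
#   files_not_in_group /work/x01/x01-subproject/$USER/my-folder x01-subproject
```

An empty result means every file under the directory already belongs to the given group, so the chown step is not needed.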

"},{"location":"user-guide/data/#archiving-and-data-transfer","title":"Archiving and data transfer","text":"

Data transfer speed may be limited by many different factors so the best data transfer mechanism to use depends on the type of data being transferred and where the data is going.

The method you use to transfer data to/from ARCHER2 will depend on how much you want to transfer and where to. The methods we cover in this guide are:

Before discussing specific data transfer methods, we cover archiving which is an essential process for transferring data efficiently.

"},{"location":"user-guide/data/#archiving","title":"Archiving","text":"

If you have related data that consists of a large number of small files, it is strongly recommended to pack the files into a larger "archive" file for ease of transfer and manipulation. A single large file makes more efficient use of the file system and is easier to move, copy and transfer because significantly fewer metadata operations are required. Archive files can be created using tools like tar and zip.

"},{"location":"user-guide/data/#tar","title":"tar","text":"

The tar command packs files into a \"tape archive\" format. The command has general form:

tar [options] [file(s)]

Common options include:

Putting these together:

tar -cvWlf mydata.tar mydata

will create and verify an archive.

To extract files from a tar file, the option -x is used. For example:

tar -b 2048 -xf mydata.tar

will recover the contents of mydata.tar to the current working directory (using a block size of 1 MiB to improve Lustre performance and reduce contention).
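The 1 MiB figure follows from the fact that tar's -b blocking factor counts 512-byte blocks per record:

$$2048 \times 512\ \text{bytes} = 2^{11} \times 2^{9}\ \text{bytes} = 2^{20}\ \text{bytes} = 1\,048\,576\ \text{bytes} = 1\ \text{MiB}$$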

To verify an existing tar file against a set of data, the -d (diff) option can be used. By default, no output is given if a verification succeeds. An example of a failed verification follows:

$> tar -df mydata.tar mydata/*
mydata/damaged_file: Mod time differs
mydata/damaged_file: Size differs

Note

tar files do not store checksums with their data, requiring the original data to be present during verification.
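The self-contained sketch below walks through the whole cycle in a throwaway directory with GNU tar. It creates and verifies an archive, compares it with the original files using -d (which is why the originals must still be present), and then extracts a copy. The file names and contents are illustrative. The -v, -l and -b options from the examples above are left out to keep the output quiet.

```shell
#!/bin/bash
# Round trip with GNU tar in a throwaway directory: create and verify an
# archive, compare it with the files on disk, then extract a copy.
workdir=$(mktemp -d)
cd "${workdir}" || exit 1
mkdir mydata
printf 'alpha\n' > mydata/file1.txt
printf 'beta\n'  > mydata/file2.txt

# -c create, -W verify the archive after writing it, -f archive file name
tar -cWf mydata.tar mydata

# -d compares archive members with the files on disk. No output and
# exit status 0 mean everything matches.
tar -df mydata.tar mydata && echo "archive matches mydata/"

# Extract into a separate directory (-C) so the originals are untouched
mkdir restore
tar -xf mydata.tar -C restore
cmp mydata/file1.txt restore/mydata/file1.txt && echo "extracted copy is identical"
```

If a file is later changed, tar -d reports the difference and exits with status 1, which makes it convenient to use in scripts.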

Tip

Further information on using tar can be found in the tar manual (accessed via man tar or online).

"},{"location":"user-guide/data/#zip","title":"zip","text":"

The zip file format is widely used for archiving files and is supported by most major operating systems. The utility to create zip files can be run from the command line as:

zip [options] mydata.zip [file(s)]

Common options are:

Together:

zip -0r mydata.zip mydata

will create an archive.

Note

Unlike tar, zip files do not preserve hard links. File data will be copied on archive creation, e.g. an uncompressed zip archive of a 100MB file and a hard link to that file will be approximately 200MB in size. This makes zip an unsuitable format if you wish to precisely reproduce the file system layout.

The corresponding unzip command is used to extract data from the archive. The simplest use case is:

unzip mydata.zip

which recovers the contents of the archive to the current working directory.

Files in a zip archive are stored with a CRC checksum to help detect data loss. unzip provides options for verifying this checksum against the stored files. The relevant flag is -t and is used as follows:

$> unzip -t mydata.zip
Archive:  mydata.zip
    testing: mydata/                 OK
    testing: mydata/file             OK
No errors detected in compressed data of mydata.zip.

Tip

Further information on using zip can be found in the zip manual (accessed via man zip or online).

"},{"location":"user-guide/data/#data-transfer-via-ssh","title":"Data transfer via SSH","text":"

The easiest way of transferring data to/from ARCHER2 is to use one of the standard programs based on the SSH protocol such as scp, sftp or rsync. These all use the same underlying mechanism (SSH) as you normally use to log in to ARCHER2. So, once the command has been executed via the command line, you will be prompted for your password for the specified account on the remote machine (ARCHER2 in this case).

To avoid having to type in your password multiple times, you can set up an SSH key pair and use an SSH agent, as documented in the Connecting section of the User Guide.

"},{"location":"user-guide/data/#ssh-data-transfer-performance-considerations","title":"SSH data transfer performance considerations","text":"

The SSH protocol encrypts all traffic it sends. This means that file transfer using SSH consumes a relatively large amount of CPU time at both ends of the transfer (for encryption and decryption). The ARCHER2 login nodes have fairly fast processors that can sustain about 100 MB/s transfer. The encryption algorithm used is negotiated between the SSH client and the SSH server. There are command line flags that allow you to specify a preference for which encryption algorithm should be used. You may be able to improve transfer speeds by requesting a different algorithm than the default. The aes128-ctr or aes256-ctr algorithms are well supported and fast as they are implemented in hardware. These are not usually the default choice when using scp so you will need to manually specify them.

A single SSH based transfer will usually not be able to saturate the available network bandwidth or the available disk bandwidth so you may see an overall improvement by running several data transfer operations in parallel. To reduce metadata interactions it is a good idea to overlap transfers of files from different directories.

In addition, you should consider the following when transferring data:

"},{"location":"user-guide/data/#scp","title":"scp","text":"

The scp command creates a copy of a file (or, if given the -r flag, a directory) either from a local machine onto a remote machine or from a remote machine onto a local machine.

For example, to transfer files to ARCHER2 from a local machine:

scp [options] source user@login.archer2.ac.uk:[destination]

(Remember to replace user with your ARCHER2 username in the example above.)

In the above example, the [destination] is optional, as when left out scp will copy the source into your home directory. Also, the source should be the absolute path of the file/directory being copied or the command should be executed in the directory containing the source file/directory.

If you want to request a different encryption algorithm add the -c [algorithm-name] flag to the scp options. For example, to use the (usually faster) aes128-ctr encryption algorithm you would use:

scp [options] -c aes128-ctr source user@login.archer2.ac.uk:[destination]

(Remember to replace user with your ARCHER2 username in the example above.)

"},{"location":"user-guide/data/#rsync","title":"rsync","text":"

The rsync command can also transfer data between hosts using an SSH connection. It creates a copy of a file or, if given the -r flag, a directory at the given destination, similar to scp above.

Given the -a option, rsync can also make exact copies (including permissions); this is referred to as mirroring. In this case the rsync command is executed with ssh to create the copy on a remote machine.

To transfer files to ARCHER2 using rsync with ssh the command has the form:

rsync [options] -e ssh source user@login.archer2.ac.uk:[destination]

(Remember to replace user with your ARCHER2 username in the example above.)

In the above example, the [destination] is optional, as when left out rsync will copy the source into your home directory. Also the source should be the absolute path of the file/directory being copied or the command should be executed in the directory containing the source file/directory.

Additional flags can be specified for the underlying ssh command by using a quoted string as the argument of the -e flag. e.g.

rsync [options] -e "ssh -c aes128-ctr" source user@login.archer2.ac.uk:[destination]

(Remember to replace user with your ARCHER2 username in the example above.)

Tip

Further information on using rsync can be found in the rsync manual (accessed via man rsync or online).

"},{"location":"user-guide/data/#data-transfer-via-globus","title":"Data transfer via Globus","text":"

The ARCHER2 file systems have a Globus Collection (formerly known as an endpoint) with the name "Archer2 file systems". See the full step-by-step guide for using Globus to transfer files to/from ARCHER2.

"},{"location":"user-guide/data/#data-transfer-via-gridftp","title":"Data transfer via GridFTP","text":"

ARCHER2 provides a module for grid computing, gct/6.2, otherwise known as the Globus Grid Community Toolkit v6.2.20201212. This toolkit provides a command line interface for moving data to and from GridFTP servers.

Data transfers are managed by the globus-url-copy command. Full details concerning this command's use can be found in the GCT 6.2 GridFTP User's Guide.

Info

Further information on using GridFTP on ARCHER2 to transfer data to the JASMIN facility can be found in the JASMIN user documentation.

"},{"location":"user-guide/data/#data-transfer-using-rclone","title":"Data transfer using rclone","text":"

Rclone is a command-line program to manage files on cloud storage. You can transfer files directly to/from cloud storage services, such as MS OneDrive and Dropbox. The program preserves timestamps and verifies checksums at all times.

First of all, you must download and unzip rclone on ARCHER2:

wget https://downloads.rclone.org/v1.62.2/rclone-v1.62.2-linux-amd64.zip
unzip rclone-v1.62.2-linux-amd64.zip
cd rclone-v1.62.2-linux-amd64/

The previous code snippet uses rclone v1.62.2, which was the latest version when these instructions were written.

Configure rclone using ./rclone config. This will guide you through an interactive setup process where you can make a new remote (called remote). See the following for detailed instructions for:

Please note that a token is required to connect from ARCHER2 to the cloud service, and you need a web browser to get the token. We recommend running rclone authorize on your laptop to get the token, and then copying the token from your laptop to ARCHER2. The rclone website contains further instructions on configuring rclone on a remote machine without a web browser.

Once all the above is done, you're ready to go. If you want to copy a directory, please use:

rclone copy <archer2_directory> remote:<cloud_directory>

Please note that "remote" is the name that you have chosen when running rclone config. To copy files, please use:

rclone copyto <archer2_file> remote:<cloud_file>

Note

If the session times out while the data transfer takes place, adding the -vv flag to an rclone transfer forces rclone to output to the terminal and therefore avoids triggering the timeout process.

"},{"location":"user-guide/data/#ssh-data-transfer-example-laptopworkstation-to-archer2","title":"SSH data transfer example: laptop/workstation to ARCHER2","text":"

Here we have a short example demonstrating transfer of data directly from a laptop/workstation to ARCHER2.

Note

This guide assumes you are using a command line interface to transfer data. This means the terminal on Linux or macOS, or the MobaXterm local terminal or PowerShell on Windows.

Before we can transfer data to ARCHER2, we need to make sure we have an SSH key set up to access ARCHER2 from the system we are transferring data from. If you are using the same system that you use to log into ARCHER2 then you should be all set. If you want to use a different system you will need to generate a new SSH key there (or use SSH key forwarding) to allow you to connect to ARCHER2.

Tip

Remember that you will need to use both a key and your password to transfer data to ARCHER2.

Once we know our keys are set up correctly, we are ready to transfer data directly between the two machines. We begin by combining our important research data into a single archive file using the following command:

tar -czf all_my_files.tar.gz file1.txt file2.txt file3.txt\n
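Before transferring, it can be worth checking that the archive is readable and recording a checksum, so you can confirm that the copy on ARCHER2 is intact. The following is a sketch: the helper name make_archive is hypothetical, and it assumes sha256sum from GNU coreutils is available (on macOS, shasum -a 256 produces a similar checksum file).

```shell
#!/usr/bin/env bash
# Hypothetical helper: make_archive ARCHIVE FILE...
# Creates a gzipped tarball, lists it back to confirm it is readable,
# and writes ARCHIVE.sha256 so the copy can be verified after transfer.
make_archive() {
  local archive="$1"
  shift
  tar -czf "$archive" "$@" || return 1
  tar -tzf "$archive" > /dev/null || return 1   # fails on a corrupt archive
  sha256sum "$archive" > "$archive.sha256"
}
```

For example, run make_archive all_my_files.tar.gz file1.txt file2.txt file3.txt, transfer both all_my_files.tar.gz and all_my_files.tar.gz.sha256, and then run sha256sum -c all_my_files.tar.gz.sha256 in the destination directory on ARCHER2.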

We then initiate the data transfer from our system to ARCHER2, here using rsync to allow the transfer to be recommenced without needing to start again, in the event of a loss of connection or other failure. For example, using the SSH key in the file ~/.ssh/id_RSA_A2 on our local system:

rsync -Pv -e\"ssh -c aes128-ctr -i $HOME/.ssh/id_RSA_A2\" ./all_my_files.tar.gz otbz19@login.archer2.ac.uk:/work/z19/z19/otbz19/\n

Note the use of the -P flag, which keeps partially transferred files and shows progress -- the same command can be used to resume the transfer after a loss of connection. The -e flag specifies the remote shell command; we have used it to add the location of the identity file. The -c option selects aes128-ctr as the cipher, which has been found to increase performance. Because the argument to -e is enclosed in quotes, the shell does not expand the ~ shortcut, so we have used $HOME instead. The archive is copied to our project work directory on ARCHER2.
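Since re-running the same rsync command resumes the transfer, you can wrap it in a retry loop. This sketch uses a hypothetical helper, retry, which re-runs any command until it succeeds or a maximum number of attempts is reached; the attempt count and delay are arbitrary example values.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it exits with status 0 or MAX_ATTEMPTS is reached.
retry() {
  local max="$1" delay="$2" attempt=1
  shift 2
  until "$@"; do
    if (( attempt >= max )); then
      echo "retry: giving up after $attempt attempts" >&2
      return 1
    fi
    echo "retry: attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$(( attempt + 1 ))
  done
}
```

For example: retry 5 60 rsync -Pv -e "ssh -c aes128-ctr -i $HOME/.ssh/id_RSA_A2" ./all_my_files.tar.gz otbz19@login.archer2.ac.uk:/work/z19/z19/otbz19/. ARCHER2 requires your password as well as your key, so each attempt will still prompt for it; the loop saves retyping the command rather than making the transfer fully unattended.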

Note

Remember to replace otbz19 with your username on ARCHER2.

If we were unconcerned about being able to restart an interrupted transfer, we could instead use the scp command,

scp -c aes128-ctr -i ~/.ssh/id_RSA_A2 all_my_files.tar.gz otbz19@login.archer2.ac.uk:/work/z19/z19/otbz19/\n

but rsync is recommended for larger transfers.

"},{"location":"user-guide/debug/","title":"Debugging","text":"

The following debugging tools are available on ARCHER2:

"},{"location":"user-guide/debug/#linaro-forge","title":"Linaro Forge","text":"

The Linaro Forge tool provides the DDT parallel debugger. See:

"},{"location":"user-guide/debug/#gdb4hpc","title":"gdb4hpc","text":"

The GNU Debugger for HPC (gdb4hpc) is a GDB-based debugger used to debug applications compiled with CCE, PGI, GNU, and Intel Fortran, C and C++ compilers. It allows programmers to either launch an application within it or to attach to an already-running application. Attaching to an already-running and hanging application is a quick way of understanding why the application is hanging, whereas launching an application through gdb4hpc will allow you to see your application running step-by-step, output the values of variables, and check whether the application runs as expected.

Tip

For your executable to be compatible with gdb4hpc, it will need to be coded with MPI. You will also need to compile your code with the debugging flag -g (e.g. cc -g my_program.c -o my_exe).

"},{"location":"user-guide/debug/#launching-through-gdb4hpc","title":"Launching through gdb4hpc","text":"

Launch gdb4hpc:

module load gdb4hpc\ngdb4hpc\n

You will get some information about this version of the program and, eventually, you will get a command prompt:

gdb4hpc 4.5 - Cray Line Mode Parallel Debugger\nWith Cray Comparative Debugging Technology.\nCopyright 2007-2019 Cray Inc. All Rights Reserved.\nCopyright 1996-2016 University of Queensland. All Rights Reserved.\nType \"help\" for a list of commands.\nType \"help <cmd>\" for detailed help about a command.\ndbg all>\n

We will use launch to begin a multi-process application within gdb4hpc. Suppose we want to test an application called my_exe, launched as 256 processes across two nodes. We would launch this in gdb4hpc by running:

dbg all> launch --launcher-args=\"--account=[budget code] --partition=standard --qos=standard --nodes=2 --ntasks-per-node=128 --cpus-per-task=1 --exclusive --export=ALL\" $my_prog{256} ./my_exe\n

Make sure to set the --account option to your budget code (e.g. if you are using budget t01, that part should look like --account=t01).

The default launcher is srun, and --launcher-args=\"...\" lets you pass flags to srun. The variable $my_prog is a dummy name for the program being launched; you can use any name you like, and it becomes the name of the srun job. The number in braces, {256}, is the number of processes over which the program will be executed. It is 256 here, but you could use any number. Try to use as few processes as possible: the more you use, the longer it will take gdb4hpc to load the program.

Once the program is launched, gdb4hpc will load it and begin to run it. You will see output similar to:

Starting application, please wait...\nCreating MRNet communication network...\nWaiting for debug servers to attach to MRNet communications network...\nTimeout in 400 seconds. Please wait for the attach to complete.\nNumber of dbgsrvs connected: [0];  Timeout Counter: [1]\nNumber of dbgsrvs connected: [0];  Timeout Counter: [2]\nNumber of dbgsrvs connected: [0];  Timeout Counter: [3]\nNumber of dbgsrvs connected: [1];  Timeout Counter: [0]\nNumber of dbgsrvs connected: [1];  Timeout Counter: [1]\nNumber of dbgsrvs connected: [2];  Timeout Counter: [0]\nFinalizing setup...\nLaunch complete.\nmy_prog{0..255}: Initial breakpoint, main at /PATH/TO/my_program.c:34\n

The line number of the initial breakpoint (line 34 in the above example) corresponds to the line at which MPI is initialised. gdb4hpc cannot show any part of the code outside the MPI region.

Once the code is loaded, you can use various commands to move through your code. The following lists and describes some of the most useful ones:

Remember to exit the interactive session once you are done debugging.

"},{"location":"user-guide/debug/#attaching-with-gdb4hpc","title":"Attaching with gdb4hpc","text":"

Attaching to a hanging job using gdb4hpc is a great way of seeing which state each process is in. However, the output is not the easiest to read; for a more readable overview, take a look at the STAT tool.

In your interactive session, launch your executable as a background task (by adding an & at the end of the command). For example, if you are running an executable called my_exe using 256 processes, you would run:

srun -n 256 --nodes=2 --ntasks-per-node=128 --cpus-per-task=1 --time=01:00:00 --export=ALL \\\n            --account=[budget code] --partition=standard --qos=standard ./my_exe &\n

Make sure to set the --account option to your budget code (e.g. if you are using budget t01, that part should look like --account=t01).

You will need to get the full job ID of the job you have just launched. To do this, run:

squeue -u $USER\n

and find the job ID associated with this interactive session -- this will be the one with the jobname bash. In this example:

JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)\n1050     workq my_mpi_j   jsindt  R       0:16      1 nid000001\n1051     workq     bash   jsindt  R       0:12      1 nid000002\n

the appropriate job id is 1051. Next, you will need to run sstat on this job id:

sstat 1051\n

This will output a large amount of information about this specific job. We are looking for the first entry of this output, which should look like JOB_ID.## -- the number after the dot is the job step ID; steps are numbered from 0 in the order they are launched within the job. For our example (where srun is the first job step launched in this interactive session), the full ID is 1051.0.
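If you prefer not to pick the job ID out by eye, the squeue output can be filtered. The sketch below assumes the default squeue column layout shown above (job ID in the first column, job name in the third); job_id_by_name is a hypothetical helper, not a Slurm command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: job_id_by_name NAME
# Reads squeue output on stdin, skips the header line and prints the
# job ID (column 1) of each job whose NAME (column 3) matches.
job_id_by_name() {
  awk -v name="$1" 'NR > 1 && $3 == name { print $1 }'
}

# Example use on ARCHER2, building the first job step ID (.0):
#   jobid=$(squeue -u "$USER" | job_id_by_name bash)
#   echo "${jobid}.0"
```

The default squeue format truncates job names to 8 characters, and if you have several jobs called bash, every matching ID is printed, so check the result before using it.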

Launch gdb4hpc:

module load gdb4hpc\ngdb4hpc\n

You will get some information about this version of the program and, eventually, you will get a command prompt:

gdb4hpc 4.5 - Cray Line Mode Parallel Debugger\nWith Cray Comparative Debugging Technology.\nCopyright 2007-2019 Cray Inc. All Rights Reserved.\nCopyright 1996-2016 University of Queensland. All Rights Reserved.\nType \"help\" for a list of commands.\nType \"help <cmd>\" for detailed help about a command.\ndbg all>\n

We will use the attach command to attach to our hanging program:

dbg all> attach $my_prog JOB_ID.##\n

where JOB_ID.## is the full job ID found using sstat (in our example, this would be 1051.0). The name $my_prog is a dummy-name -- it could be whatever name you like.

As it is attaching, gdb4hpc will output text to screen that looks like:

Attaching to application, please wait...\nCreating MRNet communication network...\nWaiting for debug servers to attach to MRNet communications network...\nTimeout in 400 seconds. Please wait for the attach to complete.\nNumber of dbgsrvs connected: [0];  Timeout Counter: [1]\n\n...\n\nFinalizing setup...\nAttach complete.\nCurrent rank location:\n

After this, you will get an output that, among other things, tells you which line of your code each process is on, and what each process is doing. This can be helpful to see where the hang-up is.

If you accidentally attached to the wrong job, you can detach by running:

dbg all> release $my_prog\n

and re-attach with the correct job ID. You will need to change your dummy name from $my_prog to something else.

When you are finished using gdb4hpc, simply run:

dbg all> quit\n

Do not forget to exit your interactive session.

"},{"location":"user-guide/debug/#valgrind4hpc","title":"valgrind4hpc","text":"

valgrind4hpc is a Valgrind-based debugging tool to aid in the detection of memory leaks and errors in parallel applications. Valgrind4hpc aggregates any duplicate messages across ranks to help provide an understandable picture of program behavior. Valgrind4hpc manages starting and redirecting output from many copies of Valgrind, as well as recombining and filtering Valgrind messages. If your program can be debugged with Valgrind, it can be debugged with valgrind4hpc.

The valgrind4hpc module enables the use of standard valgrind as well as the valgrind4hpc version more suitable to parallel programs.

"},{"location":"user-guide/debug/#using-valgrind-with-serial-programs","title":"Using Valgrind with serial programs","text":"

Load the valgrind4hpc module:

module load valgrind4hpc\n

Next, run your executable through valgrind:

valgrind --tool=memcheck --leak-check=yes my_executable\n

The log is written to the screen. The ERROR SUMMARY tells you whether your program has any memory errors and, if so, how many. If you compile your code with the -g debugging flag (e.g. cc -g my_program.c -o my_executable), the log will also point out the lines of code where the errors occur.
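To use Memcheck in a script (for example, a regression test), it can help to turn reported errors into a non-zero exit status. The sketch below relies on Valgrind's --error-exitcode option; the helper name memcheck_run and the value 99 are arbitrary choices for illustration, and a program that itself exits with status 99 would be indistinguishable from one with memory errors.

```shell
#!/usr/bin/env bash
# Hypothetical helper: memcheck_run PROGRAM [ARGS...]
# Runs PROGRAM under Memcheck. Valgrind exits with status 99 if it
# reported any errors, otherwise with the program's own exit status.
memcheck_run() {
  local status=0
  valgrind --tool=memcheck --leak-check=yes --error-exitcode=99 "$@" || status=$?
  if (( status == 99 )); then
    echo "memcheck_run: Valgrind reported memory errors in $1" >&2
  fi
  return "$status"
}
```

For example, memcheck_run ./my_executable && echo clean prints clean only if Valgrind found no errors and the program exited successfully.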

Valgrind also includes a tool called Massif that can be used to give insight into the memory usage of your program. It takes regular snapshots and outputs this data into a single file, which can be visualised to show the total amount of memory used as a function of time. This shows when peaks and bottlenecks occur and allows you to identify which data structures in your code are responsible for the largest memory usage of your program.

Documentation explaining how to use Massif is available at the official Massif manual. In short, you should run your executable as follows:

valgrind --tool=massif my_executable\n

The memory profiling data will be output into a file called massif.out.pid, where pid is the runtime process ID of your program. A custom filename can be chosen using the --massif-out-file option, as follows:

valgrind --tool=massif --massif-out-file=optional_filename.out my_executable\n

The output file contains raw profiling statistics. To view a summary including a graphical plot of memory usage over time, use the ms_print command as follows:

ms_print massif.out.12345\n

or, to save to a file:

ms_print massif.out.12345 > massif.analysis.12345\n

This will show total memory usage over time as well as a breakdown of the top data structures contributing to memory usage at each snapshot where there has been a significant allocation or deallocation of memory.
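For a quick, scriptable summary without ms_print, you can scan the raw output file directly. Each snapshot in a Massif output file records its useful heap size in a line of the form mem_heap_B=<bytes>; the sketch below prints the largest such value. It assumes that file format, ignores the mem_heap_extra_B and mem_stacks_B figures, and the helper name massif_peak_heap is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: massif_peak_heap FILE
# Prints the largest mem_heap_B value (useful heap bytes) found in a
# Massif output file, i.e. the peak heap usage across all snapshots.
massif_peak_heap() {
  awk -F= 'BEGIN { max = 0 }
           $1 == "mem_heap_B" && $2 + 0 > max { max = $2 + 0 }
           END { printf "%.0f", max; print "" }' "$1"
}
```

For example, massif_peak_heap massif.out.12345 prints a single number of bytes, which is convenient for comparing runs; use ms_print when you need the full breakdown.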

"},{"location":"user-guide/debug/#using-valgrind4hpc-with-parallel-programs","title":"Using Valgrind4hpc with parallel programs","text":"

First, load valgrind4hpc:

module load valgrind4hpc\n

To run valgrind4hpc, first reserve the resources you will use with salloc. The following reservation request is for 2 nodes (256 physical cores) for 20 minutes on the short queue:

auser@uan01:> salloc --nodes=2 --ntasks-per-node=128 --cpus-per-task=1 \\\n              --time=00:20:00 --partition=standard --qos=short \\\n              --hint=nomultithread \\\n              --distribution=block:block --account=[budget code]\n

Once your allocation is ready, use valgrind4hpc to run and profile your executable. To test an executable called my_executable that requires two arguments, arg1 and arg2, on 2 nodes with 256 processes, run:

valgrind4hpc --tool=memcheck --num-ranks=256 my_executable -- arg1 arg2\n

In particular, note the -- separating the executable from the arguments (this is not necessary if your executable takes no arguments).

Valgrind4hpc supports only some of the tools found in Valgrind: memcheck, helgrind, exp-sgcheck and drd. The --valgrind-args=\"arguments\" option lets you pass Valgrind options that valgrind4hpc does not support directly (e.g. --leak-check); note, however, that some of these options might interfere with valgrind4hpc.

More information on valgrind4hpc can be found in the manual (man valgrind4hpc).

"},{"location":"user-guide/debug/#stat","title":"STAT","text":"

The Stack Trace Analysis Tool (STAT) is a cross-platform debugging tool from the University of Wisconsin-Madison. ATP is based on the same technology as STAT; both are designed to gather and merge stack traces from a running application's parallel processes. STAT can be useful when an application seems to be deadlocked or stuck, i.e. it does not crash but does not progress as expected, and it has been designed to scale to a very large number of processes. Full information on STAT, including use cases, is available at the STAT website.

STAT will attach to a running program and query that program to find out where all the processes in that program currently are. It will then process that data and produce a graph displaying the unique process locations (i.e. where all the processes in the running program currently are). To make this easily understandable it collates together all processes that are in the same place providing only unique program locations for display.

"},{"location":"user-guide/debug/#using-stat-on-archer2","title":"Using STAT on ARCHER2","text":"

On the login node, load the cray-stat module:

module load cray-stat\n

Then, launch your job using srun as a background task (by adding an & at the end of the command). For example, if you are running an executable called my_exe using 256 processes, you would run:

srun -n 256 --nodes=2 --ntasks-per-node=128 --cpus-per-task=1 --time=01:00:00 --export=ALL \\\n            --account=[budget code] --partition=standard --qos=standard ./my_exe &\n

Note

This example sets the job time limit to 1 hour -- if you need longer, change the --time option.

You will need the process ID (PID) of the job you have just launched -- the PID is printed to the screen upon launch, or you can get it by running:

ps -u $USER\n

This will present you with a set of text that looks like this:

PID TTY          TIME CMD\n154296 ?     00:00:00 systemd\n154297 ?     00:00:00 (sd-pam)\n154302 ?     00:00:00 sshd\n154303 pts/8 00:00:00 bash\n157150 pts/8 00:00:00 salloc\n157152 pts/8 00:00:00 bash\n157183 pts/8 00:00:00 srun\n157185 pts/8 00:00:00 srun\n157191 pts/8 00:00:00 ps\n
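The PID of the first srun can also be picked out of this listing automatically. The sketch below assumes the ps output format shown above (PID in the first column, command name in the last); first_pid_of is a hypothetical helper. Alternatively, bash stores the PID of the most recent background command in $!, so capturing it straight after launching srun ... & avoids the lookup altogether.

```shell
#!/usr/bin/env bash
# Hypothetical helper: first_pid_of COMMAND
# Reads ps output on stdin, skips the header line and prints the PID
# (first column) of the first listed process whose command (last column)
# matches COMMAND.
first_pid_of() {
  awk -v cmd="$1" 'NR > 1 && $NF == cmd { print $1; exit }'
}

# Example use on ARCHER2:
#   ps -u "$USER" | first_pid_of srun
```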

Once your application has reached the point where it hangs, issue the following command (replacing PID with the ID of the first srun process -- in the above example, you would replace PID with 157183):

stat-cl -i PID\n

You will get an output that looks like this:

STAT started at 2020-07-22-13:31:35\nAttaching to job launcher (null):157565 and launching tool daemons...\nTool daemons launched and connected!\nAttaching to application...\nAttached!\nApplication already paused... ignoring request to pause\nSampling traces...\nTraces sampled!\nResuming the application...\nResumed!\nPausing the application...\nPaused!\n\n...\n\nDetaching from application...\nDetached!\n\nResults written to $PATH_TO_RUN_DIRECTORY/stat_results/my_exe.0000\n

Once STAT is finished, you can kill the srun job using scancel (replacing JID with the job ID of the job you just launched):

scancel JID\n

You can view the results that STAT has produced using the following command (note that \"my_exe\" will need to be replaced with the name of the executable you ran):

stat-view stat_results/my_exe.0000/00_my_exe.0000.3D.dot\n

This produces a graph displaying all the different places within the program that the parallel processes were when you queried them.

Note

To see the graph, you will need to have exported your X display when logging in.

Larger jobs may spend significant time queueing, requiring submission as a batch job. In this case, a slightly different invocation is illustrated as follows:

#!/bin/bash --login\n\n#SBATCH --nodes=1\n#SBATCH --ntasks-per-node=128\n#SBATCH --cpus-per-task=1\n#SBATCH --time=02:00:00\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Load additional modules\nmodule load cray-stat\n\nexport OMP_NUM_THREADS=1\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\n# This environment variable is required\nexport CTI_SLURM_OVERRIDE_MC=1\n\n# Request that stat sleeps for 3600 seconds before attaching\n# to our executable which we launch with command introduced\n# with -C:\n\nstat-cl -s 3600 -C srun --unbuffered ./my_exe\n

If the job is hanging it will continue to run until the wall clock exceeds the requested time. Use the stat-view utility to inspect the results, as discussed above.

"},{"location":"user-guide/debug/#atp","title":"ATP","text":"

To enable ATP you should load the atp module and set the ATP_ENABLED environment variable to 1 on the login node:

module load atp\nexport ATP_ENABLED=1\n# Fix for a known issue:\nexport HOME=${HOME/home/work}\n
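The HOME fix above uses bash pattern substitution: ${HOME/home/work} replaces the first occurrence of home in $HOME with work, so a home directory under /home becomes the corresponding path under /work. The short illustration below uses a hypothetical path.

```shell
#!/usr/bin/env bash
# ${var/pattern/replacement} replaces the first match of pattern in var.
example_home=/home/z19/z19/auser     # hypothetical home directory
work_home=${example_home/home/work}
echo "$work_home"                    # prints /work/z19/z19/auser
```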

Then, launch your job using srun as a background task (by adding an & at the end of the command). For example, if you are running an executable called my_exe using 256 processes, you would run:

srun -n 256 --nodes=2 --ntasks-per-node=128 --cpus-per-task=1 --time=01:00:00 --export=ALL \\\n            --account=[budget code] --partition=standard --qos=standard ./my_exe &\n

Note

This example sets the job time limit to 1 hour -- if you need longer, change the --time option.

Once the job has finished running, load the stat module to view the results:

module load cray-stat\n

and view the merged stack trace using:

stat-view atpMergedBT.dot\n

Note

To see the graph, you will need to have exported your X display when logging in.

"},{"location":"user-guide/dev-environment-4cab/","title":"Application development environment: 4-cabinet system","text":"

Important

This section covers the application development environment on the initial, 4-cabinet ARCHER2 system. For documentation on the application development environment on the full ARCHER2 system, please see Application development environment: full system.

"},{"location":"user-guide/dev-environment-4cab/#whats-available","title":"What's available","text":"

ARCHER2 runs on the Cray Linux Environment (a version of SUSE Linux), and provides a development environment which includes:

Access to particular software, and particular versions, is managed by a standard TCL module framework. Most software is available via standard software modules and the different programming environments are available via module collections.

You can see what programming environments are available with:

auser@uan01:~> module savelist\nNamed collection list:\n 1) PrgEnv-aocc   2) PrgEnv-cray   3) PrgEnv-gnu\n

Other software modules can be listed with

auser@uan01:~> module avail\n------------------------------- /opt/cray/pe/perftools/20.09.0/modulefiles --------------------------------\nperftools       perftools-lite-events  perftools-lite-hbm    perftools-nwpc     \nperftools-lite  perftools-lite-gpu     perftools-lite-loops  perftools-preload  \n\n---------------------------------- /opt/cray/pe/craype/2.7.0/modulefiles ----------------------------------\ncraype-hugepages1G  craype-hugepages8M   craype-hugepages128M  craype-network-ofi          \ncraype-hugepages2G  craype-hugepages16M  craype-hugepages256M  craype-network-slingshot10  \ncraype-hugepages2M  craype-hugepages32M  craype-hugepages512M  craype-x86-rome             \ncraype-hugepages4M  craype-hugepages64M  craype-network-none   \n\n------------------------------------- /usr/local/Modules/modulefiles --------------------------------------\ndot  module-git  module-info  modules  null  use.own  \n\n-------------------------------------- /opt/cray/pe/cpe-prgenv/7.0.0 --------------------------------------\ncpe-aocc  cpe-cray  cpe-gnu  \n\n-------------------------------------------- /opt/modulefiles ---------------------------------------------\naocc/2.1.0.3(default)  cray-R/4.0.2.0(default)  gcc/8.1.0  gcc/9.3.0  gcc/10.1.0(default)  \n\n\n---------------------------------------- /opt/cray/pe/modulefiles -----------------------------------------\natp/3.7.4(default)              cray-mpich-abi/8.0.15             craype-dl-plugin-py3/20.06.1(default)  \ncce/10.0.3(default)             cray-mpich-ucx/8.0.15             craype/2.7.0(default)                  \ncray-ccdb/4.7.1(default)        cray-mpich/8.0.15(default)        craypkg-gen/1.3.10(default)            \ncray-cti/2.7.3(default)         cray-netcdf-hdf5parallel/4.7.4.0  gdb4hpc/4.7.3(default)                 \ncray-dsmml/0.1.2(default)       cray-netcdf/4.7.4.0               iobuf/2.0.10(default)                  \ncray-fftw/3.3.8.7(default)      cray-openshmemx/11.1.1(default)   
papi/6.0.0.2(default)                  \ncray-ga/5.7.0.3                 cray-parallel-netcdf/1.12.1.0     perftools-base/20.09.0(default)        \ncray-hdf5-parallel/1.12.0.0     cray-pmi-lib/6.0.6(default)       valgrind4hpc/2.7.2(default)            \ncray-hdf5/1.12.0.0              cray-pmi/6.0.6(default)           \ncray-libsci/20.08.1.2(default)  cray-python/3.8.5.0(default)      \n

A full discussion of the module system is available in the Software environment section.

A consistent set of modules is loaded on login to the machine (currently PrgEnv-cray, see below). Developing applications then means selecting and loading the appropriate set of modules before starting work.

This section is aimed at code developers and will concentrate on the compilation environment and building libraries and executables, and specifically parallel executables. Other topics such as Python and Containers are covered in more detail in separate sections of the documentation.

"},{"location":"user-guide/dev-environment-4cab/#managing-development","title":"Managing development","text":"

ARCHER2 supports common revision control software such as git.

Standard GNU autoconf tools are available, along with make (which is GNU Make). Versions of cmake are available.

Note

Some of these tools are part of the system software, and typically reside in /usr/bin, while others are provided as part of the module system. Some tools may be available in different versions via both /usr/bin and via the module system.

"},{"location":"user-guide/dev-environment-4cab/#compilation-environment","title":"Compilation environment","text":"

There are three different compiler environments available on ARCHER2: AMD (AOCC), Cray (CCE), and GNU (GCC). The current compiler suite is selected via the programming environment, while the specific compiler versions are determined by the relevant compiler module. A summary is:

Suite name | Module | Programming environment collection
CCE | cce | PrgEnv-cray
GCC | gcc | PrgEnv-gnu
AOCC | aocc | PrgEnv-aocc

For example, at login, the default set of modules are:

Currently Loaded Modulefiles:\n1) cpe-cray                          7) cray-dsmml/0.1.2(default)                           \n2) cce/10.0.3(default)               8) perftools-base/20.09.0(default)                     \n3) craype/2.7.0(default)             9) xpmem/2.2.35-7.0.1.0_1.3__gd50fabf.shasta(default)  \n4) craype-x86-rome                  10) cray-mpich/8.0.15(default)                          \n5) libfabric/1.11.0.0.233(default)  11) cray-libsci/20.08.1.2(default)                      \n6) craype-network-ofi  \n

from which we see that the default programming environment is Cray (indicated by cpe-cray, item 1 in the list above) and the default compiler module is cce/10.0.3 (item 2). The programming environment gives access to a consistent set of compiler, MPI library (cray-mpich, item 10) and other libraries, e.g. cray-libsci (item 11).

Within a given programming environment, it is possible to swap to a different compiler version by swapping the relevant compiler module.

To ensure consistent behaviour, compilation of C, C++, and Fortran source code should then take place using the appropriate compiler wrapper: cc, CC, and ftn, respectively. The wrapper will automatically call the relevant underlying compiler and add the appropriate include directories and library locations to the invocation. This typically eliminates the need to specify this additional information explicitly in the configuration stage. To see the details of the exact compiler invocation use the -craype-verbose flag to the compiler wrapper.

The default link time behaviour is also related to the current programming environment. See the section below on Linking and libraries.

Users should not, in general, invoke specific compilers at compile/link stages. In particular, gcc, which may default to /usr/bin/gcc, should not be used. The compiler wrappers cc, CC, and ftn should be used via the appropriate module. Other common MPI compiler wrappers, e.g. mpicc, should also be replaced by the relevant wrapper, e.g. cc (mpicc and similar wrappers are not available).

Important

Always use the compiler wrappers cc, CC, and/or ftn and not a specific compiler invocation. This will ensure consistent compile/link time behaviour.

"},{"location":"user-guide/dev-environment-4cab/#compiler-man-pages-and-help","title":"Compiler man pages and help","text":"

Further information on both the compiler wrappers, and the individual compilers themselves are available via the command line, and via standard man pages. The man page for the compiler wrappers is common to all programming environments, while the man page for individual compilers depends on the currently loaded programming environment. The following table summarises options for obtaining information on the compiler and compile options:

Compiler suite | C | C++ | Fortran
Cray | man craycc | man crayCC | man crayftn
GNU | man gcc | man g++ | man gfortran
Wrappers | man cc | man CC | man ftn

Tip

You can also pass the --help option to any of the compilers or wrappers to get a summary of how to use them. The Cray Fortran compiler uses ftn --craype-help to access the help options.

Tip

There are no man pages for the AOCC compilers at the moment.

Tip

Cray C/C++ is based on Clang and therefore supports similar options to clang/gcc (man clang is in fact equivalent to man craycc). clang --help will produce a full summary of options with Cray-specific options marked \"Cray\". The craycc man page concentrates on these Cray extensions to the clang front end and does not provide an exhaustive description of all clang options. Cray Fortran is not based on Flang and so takes different options from flang/gfortran.

"},{"location":"user-guide/dev-environment-4cab/#dynamic-linking","title":"Dynamic Linking","text":"

Executables on ARCHER2 link dynamically, and the Cray Programming Environment does not currently support static linking. This is in contrast to ARCHER where the default was to build statically.

If you attempt to link statically, you will see errors similar to:

/usr/bin/ld: cannot find -lpmi\n/usr/bin/ld: cannot find -lpmi2\ncollect2: error: ld returned 1 exit status\n

The compiler wrapper scripts on ARCHER2 link runtime libraries using the runpath by default. This means that the paths to the runtime libraries are encoded into the executable, so you do not need to load the compiler environment in your job submission scripts.

"},{"location":"user-guide/dev-environment-4cab/#which-compiler-environment","title":"Which compiler environment?","text":"

If you are unsure which compiler you should choose, we suggest the starting point should be the GNU compiler collection (GCC, PrgEnv-gnu); this is perhaps the most commonly used by code developers, particularly in the open source software domain. A portable, standard-conforming code should (in principle) compile in any of the three programming environments.

For users requiring specific compiler features, such as co-array Fortran, the recommended starting point would be Cray. The following sections provide further details of the different programming environments.

Warning

Intel compilers are not available on ARCHER2.

"},{"location":"user-guide/dev-environment-4cab/#amd-optimizing-cc-compiler-aocc","title":"AMD Optimizing C/C++ Compiler (AOCC)","text":"

The AMD Optimizing C/C++ Compiler (AOCC) is a clang-based optimising compiler. Despite its name, AOCC also includes a flang-based Fortran compiler.

Switch to the AOCC programming environment via

$ module restore PrgEnv-aocc\n

Note

Further details on AOCC will appear here as they become available.

"},{"location":"user-guide/dev-environment-4cab/#aocc-reference-material","title":"AOCC reference material","text":""},{"location":"user-guide/dev-environment-4cab/#cray-compiler-environment-cce","title":"Cray compiler environment (CCE)","text":"

The Cray compiler environment (CCE) is the default compiler at the point of login. CCE supports C/C++ (along with Unified Parallel C, UPC) and Fortran (including co-array Fortran). Support for OpenMP parallelism is available for both C/C++ and Fortran (currently OpenMP 4.5, with a number of exceptions).

The Cray C/C++ compiler is based on a clang front end, and so compiler options are similar to those for gcc/clang. However, the Fortran compiler remains based around Cray-specific options. Be sure to separate C/C++ compiler options and Fortran compiler options (typically CFLAGS and FFLAGS) if compiling mixed C/Fortran applications.

Switch to the Cray programming environment via

$ module restore PrgEnv-cray\n
"},{"location":"user-guide/dev-environment-4cab/#useful-cce-cc-options","title":"Useful CCE C/C++ options","text":"

When using the compiler wrappers cc or CC, some of the following options may be useful:

Language, warning and debugging options:

Option | Comment
-std=<standard> | Default is -std=gnu11 (gnu++14 for C++) [1]

Performance options:

Option | Comment
-Ofast | Optimisation levels: -O0, -O1, -O2, -O3, -Ofast
-ffp=level | Floating point maths optimisation levels 0-4 [2]
-flto | Link time optimisation

Miscellaneous options:

Option | Comment
-fopenmp | Compile OpenMP (default is off)
-v | Display verbose output from compiler stages

Notes

  1. Option -std=gnu11 gives c11 plus GNU extensions (likewise c++14 plus GNU extensions). See https://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/C-Extensions.html
  2. Option -ffp=3 is implied by -Ofast or -ffast-math
"},{"location":"user-guide/dev-environment-4cab/#useful-cce-fortran-options","title":"Useful CCE Fortran options","text":"

Language, warning and debugging options:

Option | Comment
-m <level> | Message level (default -m 3, errors and warnings)

Performance options:

Option | Comment
-O <level> | Optimisation levels: -O0 to -O3 (default -O2)
-h fp<level> | Floating point maths optimisation levels 0-3
-h ipa | Inter-procedural analysis

Miscellaneous options:

Option | Comment
-h omp | Compile OpenMP (default is -hnoomp)
-v | Display verbose output from compiler stages"},{"location":"user-guide/dev-environment-4cab/#gnu-compiler-collection-gcc","title":"GNU compiler collection (GCC)","text":"

The commonly used open source GNU compiler collection is available and provides C/C++ and Fortran compilers.

The GNU compiler collection is loaded by switching to the GNU programming environment:

$ module restore PrgEnv-gnu

Bug

The gcc/8.1.0 module is available on ARCHER2 but cannot be used as the supporting scientific and system libraries are not available. You should not use this version of GCC.

Warning

If you want to use GCC version 10 or greater to compile Fortran code that uses the old MPI interfaces (i.e. use mpi or INCLUDE 'mpif.h'), you must add the -fallow-argument-mismatch option (or equivalent) when compiling; otherwise you will see compile errors associated with MPI functions. The reason is that past versions of gfortran allowed mismatched arguments to external procedures (e.g., where an explicit interface is not available). This is often the case for MPI routines called through the old MPI interfaces, where arrays of different types are passed to, for example, MPI_Send(). GCC 10 and later report this as an error because it does not conform to the Fortran standard. The -fallow-argument-mismatch option reduces the error to a warning. The same effect may be achieved via -std=legacy.

If you use the Fortran 2008 MPI interface (i.e. use mpi_f08) then you should not need to add this option.

Fortran language MPI bindings are described in more detail in the MPI Standard documentation.
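If a build script has to work with several GCC versions, one option is to add the flag only when it is needed. The sketch below assumes you pass it the GCC version string, for example from gfortran -dumpversion, which may print either a full version such as 11.2.0 or just the major number, depending on how GCC was configured. The function name mpi_legacy_fflags is hypothetical.

```shell
# Sketch: print -fallow-argument-mismatch only for GCC 10 or later
# (hypothetical helper). $1 is a GCC version string, e.g. "11.2.0" or "11".
mpi_legacy_fflags() {
    local major="${1%%.*}"              # keep everything before the first "."
    case "$major" in
        ''|*[!0-9]*) echo "not a version: $1" >&2; return 1 ;;
    esac
    if [ "$major" -ge 10 ]; then
        echo "-fallow-argument-mismatch"
    fi
}
```

For example, with PrgEnv-gnu loaded you might use FFLAGS="$FFLAGS $(mpi_legacy_fflags "$(gfortran -dumpversion)")". Code that uses mpi_f08 does not need the flag at all.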

"},{"location":"user-guide/dev-environment-4cab/#useful-gnu-fortran-options","title":"Useful Gnu Fortran options","text":"
| Option | Comment |
| --- | --- |
| -std=<standard> | Default is gnu |
| -fallow-argument-mismatch | Allow mismatched procedure arguments. This argument is required for compiling MPI Fortran code with GCC version 10 or greater if you are using the older MPI interfaces (see warning above) |
| -fbounds-check | Use runtime checking of array indices |
| -fopenmp | Compile OpenMP (default is no OpenMP) |
| -v | Display verbose output from compiler stages |

Tip

The standard in -std may be one of f95, f2003, f2008 or f2018. The default option -std=gnu is the latest Fortran standard plus GNU extensions.

Warning

Past versions of gfortran allowed mismatched arguments to external procedures (e.g., where an explicit interface is not available). This is often the case for MPI routines where arrays of different types are passed to MPI_Send() and similar routines. GCC 10 and later report this as an error because it does not conform to the Fortran standard. Use -fallow-argument-mismatch to reduce the error to a warning. The same effect may be achieved via -std=legacy.

"},{"location":"user-guide/dev-environment-4cab/#reference-material","title":"Reference material","text":""},{"location":"user-guide/dev-environment-4cab/#message-passing-interface-mpi","title":"Message passing interface (MPI)","text":""},{"location":"user-guide/dev-environment-4cab/#hpe-cray-mpich","title":"HPE Cray MPICH","text":"

HPE Cray provide, as standard, an MPICH implementation of the message passing interface which is specifically optimised for the ARCHER2 network. The current implementation supports MPI standard version 3.1.

The HPE Cray MPICH implementation is linked into software by default when compiling using the standard wrapper scripts: cc, CC and ftn.

"},{"location":"user-guide/dev-environment-4cab/#mpi-reference-material","title":"MPI reference material","text":"

MPI standard documents: https://www.mpi-forum.org/docs/

"},{"location":"user-guide/dev-environment-4cab/#linking-and-libraries","title":"Linking and libraries","text":"

Linking to libraries is performed dynamically on ARCHER2. You can pass the -craype-verbose flag to the compiler wrapper to check exactly which linker arguments are used. The compiler wrapper scripts encode the paths to the programming environment system libraries using RUNPATH. This ensures that the executable can find the correct runtime libraries without the matching software modules loaded.

The library RUNPATH associated with an executable can be inspected via, e.g.,

$ readelf -d ./a.out

(swap a.out for the name of the executable you are querying).
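The raw readelf -d output lists many dynamic-section entries. A small filter like the one below keeps only the RUNPATH entry and prints one directory per line. It is a sketch with a hypothetical function name, and it relies on GNU readelf printing the entry in the form (RUNPATH) Library runpath: [dir1:dir2].

```shell
# Sketch: list the RUNPATH directories of an executable, one per line
# (hypothetical helper). Reads "readelf -d" output on standard input.
show_runpath() {
    # Extract the text between [ and ] on the (RUNPATH) line, then split on ":"
    sed -n 's/.*(RUNPATH).*\[\(.*\)\].*/\1/p' | tr ':' '\n'
}
```

Usage: readelf -d ./a.out | show_runpath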

"},{"location":"user-guide/dev-environment-4cab/#commonly-used-libraries","title":"Commonly used libraries","text":"

Modules with names prefixed by cray- are provided by HPE Cray and are supported consistently across all of the programming environments and their associated compilers. These modules should be the first choice for access to software libraries if available.

Tip

More information on the different software libraries on ARCHER2 can be found in the Software libraries section of the user guide.

"},{"location":"user-guide/dev-environment-4cab/#switching-to-a-different-hpe-cray-programming-environment-release","title":"Switching to a different HPE Cray Programming Environment release","text":"

Important

See the section below on using non-default versions of HPE Cray libraries, as this process will generally need to be followed when using software from non-default PE installs.

Access to non-default PE environments is controlled by the use of the cpe modules. These modules are typically loaded after you have restored a PrgEnv and loaded all the other modules you need and will set your compile environment to match that in the other PE release. This means:

For example, if you have a code that uses the GNU programming environment, FFTW and NetCDF parallel libraries and you want to compile in the (non-default) 21.03 programming environment, you would do the following:

First, restore the GNU programming environment and load the required library modules (FFTW and NetCDF HDF5 parallel). The loaded module list shows that they are the versions from the default (20.10) programming environment:

auser@uan02:/work/t01/t01/auser> module restore -s PrgEnv-gnu
auser@uan02:/work/t01/t01/auser> module load cray-fftw
auser@uan02:/work/t01/t01/auser> module load cray-netcdf
auser@uan02:/work/t01/t01/auser> module load cray-netcdf-hdf5parallel
auser@uan02:/work/t01/t01/auser> module list
Currently Loaded Modulefiles:
 1) cpe-gnu                           9) xpmem/2.2.35-7.0.1.0_1.9__gd50fabf.shasta(default)
 2) gcc/10.1.0(default)              10) cray-mpich/8.0.16(default)
 3) craype/2.7.2(default)            11) cray-libsci/20.10.1.2(default)
 4) craype-x86-rome                  12) bolt/0.7
 5) libfabric/1.11.0.0.233(default)  13) /work/y07/shared/archer2-modules/modulefiles-cse/epcc-setup-env
 6) craype-network-ofi               14) /usr/local/share/epcc-module/epcc-module-loader
 7) cray-dsmml/0.1.2(default)        15) cray-fftw/3.3.8.8(default)
 8) perftools-base/20.10.0(default)  16) cray-netcdf-hdf5parallel/4.7.4.2(default)

Now, load the cpe/21.03 programming environment module to switch all the currently loaded HPE Cray modules from the default (20.10) programming environment version to the 21.03 programming environment versions:

auser@uan02:/work/t01/t01/auser> module load cpe/21.03
Switching to cray-dsmml/0.1.3.
Switching to cray-fftw/3.3.8.9.
Switching to cray-libsci/21.03.1.1.
Switching to cray-mpich/8.1.3.
Switching to cray-netcdf-hdf5parallel/4.7.4.3.
Switching to craype/2.7.5.
Switching to gcc/9.3.0.
Switching to perftools-base/21.02.0.

Loading cpe/21.03
  Unloading conflict: cray-dsmml/0.1.2 cray-fftw/3.3.8.8 cray-libsci/20.10.1.2 cray-mpich/8.0.16 cray-netcdf-hdf5parallel/4.7.4.2
    craype/2.7.2 gcc/10.1.0 perftools-base/20.10.0
  Loading requirement: cray-dsmml/0.1.3 cray-fftw/3.3.8.9 cray-libsci/21.03.1.1 cray-mpich/8.1.3 cray-netcdf-hdf5parallel/4.7.4.3
    craype/2.7.5 gcc/9.3.0 perftools-base/21.02.0
auser@uan02:/work/t01/t01/auser> module list
Currently Loaded Modulefiles:
 1) cpe-gnu                                                           9) cray-dsmml/0.1.3                  17) cpe/21.03(default)
 2) craype-x86-rome                                                  10) cray-fftw/3.3.8.9
 3) libfabric/1.11.0.0.233(default)                                  11) cray-libsci/21.03.1.1
 4) craype-network-ofi                                               12) cray-mpich/8.1.3
 5) xpmem/2.2.35-7.0.1.0_1.9__gd50fabf.shasta(default)               13) cray-netcdf-hdf5parallel/4.7.4.3
 6) bolt/0.7                                                         14) craype/2.7.5
 7) /work/y07/shared/archer2-modules/modulefiles-cse/epcc-setup-env  15) gcc/9.3.0
 8) /usr/local/share/epcc-module/epcc-module-loader                  16) perftools-base/21.02.0

Finally (as noted above), you will need to modify the value of LD_LIBRARY_PATH before you compile your software to ensure it picks up the non-default versions of libraries:

auser@uan02:/work/t01/t01/auser> export LD_LIBRARY_PATH=$CRAY_LD_LIBRARY_PATH:$LD_LIBRARY_PATH
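One caveat with the export command above: if LD_LIBRARY_PATH was empty beforehand, the result ends in a trailing colon. That is an empty entry, which the glibc dynamic loader typically treats as the current directory. The sketch below uses a hypothetical function name, prepend_cray_ld_path. It only adds the separator when there is something to append, and it refuses to run if the modules have not set CRAY_LD_LIBRARY_PATH.

```shell
# Sketch: put $CRAY_LD_LIBRARY_PATH first in LD_LIBRARY_PATH without leaving
# an empty (trailing ":") entry (hypothetical helper).
prepend_cray_ld_path() {
    if [ -z "${CRAY_LD_LIBRARY_PATH:-}" ]; then
        echo "CRAY_LD_LIBRARY_PATH is empty: load the cpe or library modules first" >&2
        return 1
    fi
    # ${VAR:+...} expands to ":$LD_LIBRARY_PATH" only when LD_LIBRARY_PATH is non-empty
    export LD_LIBRARY_PATH="${CRAY_LD_LIBRARY_PATH}${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}
```

Run it after your last module command, as described above. Calling it again after a later module change puts the new $CRAY_LD_LIBRARY_PATH in front, and older entries stay further down the search order.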

Now you can go ahead and compile your software with the new programming environment.

Important

The cpe modules only change the versions of software modules provided as part of the HPE Cray programming environments. Any modules provided by the ARCHER2 service will need to be loaded manually after you have completed the process described above.

Note

Unloading the cpe module does not restore the original programming environment release. To restore the default programming environment release you should log out and then log back in to ARCHER2.

Bug

The cpe/21.03 module has a known issue with PrgEnv-gnu where it loads an old version of GCC (9.3.0) rather than the correct, newer version (10.2.0). You can resolve this by using the sequence:

module restore -s PrgEnv-gnu
...load any other modules you need...
module load cpe/21.03
module unload cpe/21.03
module swap gcc gcc/10.2.0

"},{"location":"user-guide/dev-environment-4cab/#available-hpe-cray-programming-environment-releases-on-archer2","title":"Available HPE Cray Programming Environment releases on ARCHER2","text":"

ARCHER2 currently has the following HPE Cray Programming Environment releases available:

Tip

You can see which programming environment release you currently have loaded by using module list and looking at the version number of the cray-libsci module you have loaded. The first two numbers indicate the version of the PE you have loaded. For example, if you have cray-libsci/20.10.1.2 loaded then you are using the 20.10 PE release.
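You can also script the same check. The sketch below uses a hypothetical function name, pe_release. It reads module list output on standard input and prints the first two numbers of the cray-libsci version. Module commands usually write their listing to standard error, hence the 2>&1 in the usage line.

```shell
# Sketch: report the PE release from the cray-libsci version in "module list"
# output (hypothetical helper). cray-libsci/20.10.1.2 -> 20.10
pe_release() {
    grep -o 'cray-libsci/[0-9]*\.[0-9]*' | head -n 1 | cut -d/ -f2
}
```

Usage: module list 2>&1 | pe_release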

"},{"location":"user-guide/dev-environment-4cab/#using-non-default-versions-of-hpe-cray-libraries-on-archer2","title":"Using non-default versions of HPE Cray libraries on ARCHER2","text":"

If you wish to make use of non-default versions of libraries provided by HPE Cray (usually because they are part of a non-default PE release: either old or new) then you need to make changes at both compile and runtime. In summary, you need to load the correct module and also make changes to the LD_LIBRARY_PATH environment variable.

At compile time you need to load the version of the library module before you compile and set the LD_LIBRARY_PATH environment variable to include the contents of $CRAY_LD_LIBRARY_PATH as the first entry. For example, to use the non-default 20.08.1.2 version of HPE Cray LibSci in the default programming environment (Cray Compiling Environment, CCE), you would first set up the environment to compile with:

auser@uan01:~/test/libsci> module swap cray-libsci cray-libsci/20.08.1.2
auser@uan01:~/test/libsci> export LD_LIBRARY_PATH=$CRAY_LD_LIBRARY_PATH:$LD_LIBRARY_PATH

The order is important here: every time you change a module, you will need to reset the value of LD_LIBRARY_PATH for the process to work (it will not be updated automatically).

Now you can compile your code. You can check that the executable is using the correct version of LibSci with the ldd command: look for the line beginning libsci_cray.so.5, where you should see the version in the path to the library file:

auser@uan01:~/test/libsci> ldd dgemv.x
    linux-vdso.so.1 (0x00007ffe4a7d2000)
    libsci_cray.so.5 => /opt/cray/pe/libsci/20.08.1.2/CRAY/9.0/x86_64/lib/libsci_cray.so.5 (0x00007fafd6a43000)
    libdl.so.2 => /lib64/libdl.so.2 (0x00007fafd683f000)
    libxpmem.so.0 => /opt/cray/xpmem/default/lib64/libxpmem.so.0 (0x00007fafd663c000)
    libquadmath.so.0 => /opt/cray/pe/cce/10.0.4/cce/x86_64/lib/libquadmath.so.0 (0x00007fafd63fc000)
    libmodules.so.1 => /opt/cray/pe/cce/10.0.4/cce/x86_64/lib/libmodules.so.1 (0x00007fafd61e0000)
    libfi.so.1 => /opt/cray/pe/cce/10.0.4/cce/x86_64/lib/libfi.so.1 (0x00007fafd5abe000)
    libcraymath.so.1 => /opt/cray/pe/cce/10.0.4/cce/x86_64/lib/libcraymath.so.1 (0x00007fafd57e2000)
    libf.so.1 => /opt/cray/pe/cce/10.0.4/cce/x86_64/lib/libf.so.1 (0x00007fafd554f000)
    libu.so.1 => /opt/cray/pe/cce/10.0.4/cce/x86_64/lib/libu.so.1 (0x00007fafd523b000)
    libcsup.so.1 => /opt/cray/pe/cce/10.0.4/cce/x86_64/lib/libcsup.so.1 (0x00007fafd5035000)
    libstdc++.so.6 => /opt/cray/pe/gcc-libs/libstdc++.so.6 (0x00007fafd4c62000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fafd4a43000)
    libc.so.6 => /lib64/libc.so.6 (0x00007fafd4688000)
    libm.so.6 => /lib64/libm.so.6 (0x00007fafd4350000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fafda988000)
    librt.so.1 => /lib64/librt.so.1 (0x00007fafd4148000)
    libgfortran.so.5 => /opt/cray/pe/gcc-libs/libgfortran.so.5 (0x00007fafd3c92000)
    libgcc_s.so.1 => /opt/cray/pe/gcc-libs/libgcc_s.so.1 (0x00007fafd3a7a000)

Tip

If any of the libraries point to versions in the /opt/cray/pe/lib64 directory, then they are using the default versions of the libraries rather than the specific versions. This happens at compile time if you have forgotten to load the right module and set $LD_LIBRARY_PATH afterwards.
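You can automate the check described in this tip in a build or job script. The sketch below is a hypothetical helper; its name and messages are illustrative. It reads ldd output on standard input and reports whether the LibSci line points at the default /opt/cray/pe/lib64 location or at the version you expect.

```shell
# Sketch: check which LibSci an executable resolves to (hypothetical helper).
# Reads "ldd <executable>" output on stdin; $1 is the expected LibSci version.
# Returns 0 if the expected version is used, 1 if not, 2 if no LibSci is found.
libsci_path_check() {
    local line
    line="$(grep 'libsci' | head -n 1)"
    if [ -z "$line" ]; then
        echo "no LibSci library found in ldd output"
        return 2
    elif printf '%s\n' "$line" | grep -qF '/opt/cray/pe/lib64/'; then
        echo "default LibSci in use: check the module and LD_LIBRARY_PATH order"
        return 1
    elif printf '%s\n' "$line" | grep -qF "/libsci/$1/"; then
        echo "ok: LibSci $1"
        return 0
    else
        echo "unexpected LibSci: $line"
        return 1
    fi
}
```

Usage, for example in a job script: ldd dgemv.x | libsci_path_check 20.08.1.2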

At run time (typically in your job script) you need to repeat the environment setup steps (you can also use the ldd command in your job submission script to check the library is pointing to the correct version). For example, a job submission script to run our dgemv.x executable with the non-default version of LibSci could look like:

#!/bin/bash
#SBATCH --job-name=dgemv
#SBATCH --time=0:20:0
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1

# Replace the account code, partition and QoS with those you wish to use
#SBATCH --account=t01
#SBATCH --partition=standard
#SBATCH --qos=short
#SBATCH --reservation=shortqos

# Load the standard environment module
module load epcc-job-env

# Set up the environment to use the non-default version of LibSci
#   We use "module swap" as the "cray-libsci" module is loaded by default.
#   This must be done after loading the "epcc-job-env" module
module swap cray-libsci cray-libsci/20.08.1.2
export LD_LIBRARY_PATH=$CRAY_LD_LIBRARY_PATH:$LD_LIBRARY_PATH

# Check which library versions the executable is pointing to
ldd dgemv.x

export OMP_NUM_THREADS=1

srun --hint=nomultithread --distribution=block:block dgemv.x

Tip

As when compiling, the order of commands matters. Setting the value of LD_LIBRARY_PATH must happen after you have finished all your module commands for it to have the correct effect.

Important

You must set up the environment at both compile and run time; otherwise you will end up using the default version of the library.

"},{"location":"user-guide/dev-environment-4cab/#compiling-in-compute-nodes","title":"Compiling in compute nodes","text":"

Sometimes you may wish to compile in a batch job. For example, the compile process may take a long time, or the compile process may be part of the research workflow and can be coupled to the production job. Unlike the login nodes, the compute nodes do not have access to the /home file system.

An example job submission script for a compile job using make (assuming the Makefile is in the same directory as the job submission script) would be:

#!/bin/bash

#SBATCH --job-name=compile
#SBATCH --time=00:20:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1

# Replace the account code, partition and QoS with those you wish to use
#SBATCH --account=t01
#SBATCH --partition=standard
#SBATCH --qos=standard

# Load the compilation environment (cray, gnu or aocc)
module restore /etc/cray-pe.d/PrgEnv-cray

make clean

make

Warning

Do not forget to include the full path when the compilation environment is restored. For instance:

module restore /etc/cray-pe.d/PrgEnv-cray

You can also use a compute node in an interactive way using salloc. Please see Section Using salloc to reserve resources for further details. Once your interactive session is ready, you can load the compilation environment and compile the code.

"},{"location":"user-guide/dev-environment-4cab/#build-instructions-for-software-on-archer2","title":"Build instructions for software on ARCHER2","text":"

The ARCHER2 CSE team at EPCC and other contributors provide build configurations and instructions for a range of research software, software libraries and tools on a variety of HPC systems (including ARCHER2) in a public GitHub repository. See:

The repository always welcomes contributions from the ARCHER2 user community.

"},{"location":"user-guide/dev-environment-4cab/#support-for-building-software-on-archer2","title":"Support for building software on ARCHER2","text":"

If you run into issues building software on ARCHER2 or the software you require is not available then please contact the ARCHER2 Service Desk with any questions you have.

"},{"location":"user-guide/dev-environment/","title":"Application development environment","text":""},{"location":"user-guide/dev-environment/#whats-available","title":"What's available","text":"

ARCHER2 runs the HPE Cray Linux Environment (a version of SUSE Linux), and provides a development environment which includes:

Access to particular software, and particular versions, is managed by an Lmod module framework. Most software is available by loading modules, including the different compiler environments.

You can see what compiler environments are available with:

auser@uan01:~> module avail PrgEnv

--------------------------------------- /opt/cray/pe/lmod/modulefiles/core ----------------------------------------
   PrgEnv-aocc/8.3.3    PrgEnv-cray/8.3.3 (L)    PrgEnv-gnu/8.3.3

  Where:
   L:  Module is loaded

Module defaults are chosen based on Find First Rules due to Name/Version/Version modules found in the module tree.
See https://lmod.readthedocs.io/en/latest/060_locating.html for details.

Use "module spider" to find all possible modules and extensions.
Use "module keyword key1 key2 ..." to search for all possible modules matching any of the "keys".

Other software modules can be searched using the module spider command:

auser@uan01:~> module spider

---------------------------------------------------------------------------------------------------------------
The following is a list of the modules and extensions currently available:
---------------------------------------------------------------------------------------------------------------
  PrgEnv-aocc: PrgEnv-aocc/8.3.3

  PrgEnv-cray: PrgEnv-cray/8.3.3

  PrgEnv-gnu: PrgEnv-gnu/8.3.3

  amd-uprof: amd-uprof/3.6.449

  aocc: aocc/3.2.0

  aocc-mixed: aocc-mixed/3.2.0

  aocl: aocl/3.1, aocl/4.0

  forge: forge/24.0

  atp: atp/3.14.16

  bolt: bolt/0.7, bolt/0.8

  boost: boost/1.72.0, boost/1.81.0

  castep: castep/22.11

  cce: cce/15.0.0

...output trimmed...

A full discussion of the module system is available in the Software environment section.

A consistent set of modules is loaded on login to the machine (currently PrgEnv-cray, see below). Developing applications then means selecting and loading the appropriate set of modules before starting work.

This section is aimed at code developers and will concentrate on the compilation environment, building libraries and executables, specifically parallel executables. Other topics such as Python and Containers are covered in more detail in separate sections of the documentation.

Tip

If you want to get back to the login module state without having to log out and back in again, you can just use:

module restore
This is also handy for build scripts to ensure you are starting from a known state.

"},{"location":"user-guide/dev-environment/#compiler-environments","title":"Compiler environments","text":"

There are three different compiler environments available on ARCHER2:

The current compiler suite is selected via the PrgEnv module, while the specific compiler version is determined by the relevant compiler module. A summary is:

| Suite name | Compiler Environment Module | Compiler Version Module |
| --- | --- | --- |
| CCE | PrgEnv-cray | cce |
| GCC | PrgEnv-gnu | gcc |
| AOCC | PrgEnv-aocc | aocc |

For example, at login, the default set of modules are:

auser@ln03:~> module list

  1) craype-x86-rome                         6) cce/15.0.0             11) PrgEnv-cray/8.3.3
  2) libfabric/1.12.1.2.2.0.0                7) craype/2.7.19          12) bolt/0.8
  3) craype-network-ofi                      8) cray-dsmml/0.2.2       13) epcc-setup-env
  4) perftools-base/22.12.0                  9) cray-mpich/8.1.23      14) load-epcc-module
  5) xpmem/2.5.2-2.4_3.30__gd0f7936.shasta  10) cray-libsci/22.12.1.1

from which we see that the default compiler environment is Cray (indicated by PrgEnv-cray, at 11 in the list above) and the default compiler module is cce/15.0.0 (at 6 in the list above). The compiler environment gives access to a consistent set of compiler, MPI library (via cray-mpich, at 9) and other libraries, e.g., cray-libsci (at 10 in the list above).
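If a script needs to know which environment is active, it can read the same information from module list. The sketch below uses a hypothetical helper, current_compiler. It finds the loaded PrgEnv module, maps it to its compiler module using the suite table above, and prints both. It assumes the module names and output layout shown in these examples.

```shell
# Sketch: print the loaded PrgEnv module and its compiler module version
# from "module list" output on stdin (hypothetical helper).
current_compiler() {
    local listing prgenv compiler
    listing="$(cat)"
    prgenv="$(printf '%s\n' "$listing" | grep -o 'PrgEnv-[a-z]*' | head -n 1)"
    # Map PrgEnv module -> compiler module, as in the table above
    case "$prgenv" in
        PrgEnv-cray) compiler=cce ;;
        PrgEnv-gnu)  compiler=gcc ;;
        PrgEnv-aocc) compiler=aocc ;;
        *) echo "no PrgEnv module found" >&2; return 1 ;;
    esac
    printf '%s %s\n' "$prgenv" \
        "$(printf '%s\n' "$listing" | grep -o "$compiler/[0-9][0-9.]*" | head -n 1)"
}
```

Usage: module list 2>&1 | current_compiler. With the default login modules listed above, this would print PrgEnv-cray cce/15.0.0.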

"},{"location":"user-guide/dev-environment/#switching-between-compiler-environments","title":"Switching between compiler environments","text":"

Switching between different compiler environments is achieved using the module load command. For example, to switch from the default HPE Cray (CCE) compiler environment to the GCC environment, you would use:

auser@ln03:~> module load PrgEnv-gnu

Lmod is automatically replacing "cce/15.0.0" with "gcc/11.2.0".


Lmod is automatically replacing "PrgEnv-cray/8.3.3" with "PrgEnv-gnu/8.3.3".


Due to MODULEPATH changes, the following have been reloaded:
  1) cray-mpich/8.1.23

If you then use the module list command, you will see that your environment has been changed to the GCC environment:

auser@ln03:~> module list

Currently Loaded Modules:
  1) craype-x86-rome                         6) bolt/0.8          11) cray-dsmml/0.2.2
  2) libfabric/1.12.1.2.2.0.0                7) epcc-setup-env    12) cray-mpich/8.1.23
  3) craype-network-ofi                      8) load-epcc-module  13) cray-libsci/22.12.1.1
  4) perftools-base/22.12.0                  9) gcc/11.2.0        14) PrgEnv-gnu/8.3.3
  5) xpmem/2.5.2-2.4_3.30__gd0f7936.shasta  10) craype/2.7.19
"},{"location":"user-guide/dev-environment/#switching-between-compiler-versions","title":"Switching between compiler versions","text":"

Within a given compiler environment, it is possible to swap to a different compiler version by swapping the relevant compiler module. To switch to the GNU compiler environment from the default HPE Cray compiler environment and then swap the version of GCC from the 11.2.0 default to the older 10.3.0 version, you would use:

auser@ln03:~> module load PrgEnv-gnu

Lmod is automatically replacing "cce/15.0.0" with "gcc/11.2.0".


Lmod is automatically replacing "PrgEnv-cray/8.3.3" with "PrgEnv-gnu/8.3.3".


Due to MODULEPATH changes, the following have been reloaded:
  1) cray-mpich/8.1.23

auser@ln03:~> module load gcc/10.3.0

The following have been reloaded with a version change:
  1) gcc/11.2.0 => gcc/10.3.0

The first module load command moves to the GNU compiler environment and the second moves to the older version of GCC. As before, module list will show that your environment has been changed:

auser@ln03:~> module list

Currently Loaded Modules:
  1) craype-x86-rome                         6) bolt/0.8          11) cray-libsci/22.12.1.1
  2) libfabric/1.12.1.2.2.0.0                7) epcc-setup-env    12) PrgEnv-gnu/8.3.3
  3) craype-network-ofi                      8) load-epcc-module  13) gcc/10.3.0
  4) perftools-base/22.12.0                  9) craype/2.7.19     14) cray-mpich/8.1.23
  5) xpmem/2.5.2-2.4_3.30__gd0f7936.shasta  10) cray-dsmml/0.2.2
"},{"location":"user-guide/dev-environment/#compiler-wrapper-scripts-cc-cc-ftn","title":"Compiler wrapper scripts: cc, CC, ftn","text":"

To ensure consistent behaviour, compilation of C, C++, and Fortran source code should then take place using the appropriate compiler wrapper: cc, CC, and ftn, respectively. The wrapper will automatically call the relevant underlying compiler and add the appropriate include directories and library locations to the invocation. This typically eliminates the need to specify this additional information explicitly in the configuration stage. To see the details of the exact compiler invocation use the -craype-verbose flag to the compiler wrapper.

The default link time behaviour is also related to the current programming environment. See the section below on Linking and libraries.

Users should not, in general, invoke specific compilers at compile/link stages. In particular, gcc, which may default to /usr/bin/gcc, should not be used. The compiler wrappers cc, CC, and ftn should be used (with the underlying compiler type and version set by the module system). Other common MPI compiler wrappers e.g., mpicc, should also be replaced by the relevant wrapper, e.g. cc (commands such as mpicc are not available on ARCHER2).

Important

Always use the compiler wrappers cc, CC, and/or ftn and not a specific compiler invocation. This will ensure consistent compile/link time behaviour.

Tip

If you are using a build system such as Make or CMake then you will need to replace all occurrences of mpicc with cc, mpicxx/mpic++ with CC and mpif90 with ftn.
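For a Makefile, you can make this substitution with sed. The sketch below assumes GNU sed, which provides the \b word-boundary escape and in-place editing with a backup suffix. The function name use_cray_wrappers is hypothetical. Review the changes against the .bak copy before building.

```shell
# Sketch: replace MPI compiler wrapper names in a build file in place,
# keeping a .bak backup (hypothetical helper; requires GNU sed).
# mpicxx and mpic++ -> CC, mpicc -> cc, mpif90 -> ftn
use_cray_wrappers() {
    sed -i.bak \
        -e 's/\bmpicxx\b/CC/g' \
        -e 's/\bmpic++/CC/g' \
        -e 's/\bmpicc\b/cc/g' \
        -e 's/\bmpif90\b/ftn/g' \
        "$1"
}
```

Usage: use_cray_wrappers Makefile. For CMake projects, a common alternative to editing files is to set the CC, CXX and FC environment variables to cc, CC and ftn before the first configure step.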

"},{"location":"user-guide/dev-environment/#compiler-man-pages-and-help","title":"Compiler man pages and help","text":"

Further information on both the compiler wrappers and the individual compilers themselves is available via the command line and via standard man pages. The man page for the compiler wrappers is common to all programming environments, while the man page for individual compilers depends on the currently loaded programming environment. The following table summarises options for obtaining information on the compiler and compile options:

| Compiler suite | C | C++ | Fortran |
| --- | --- | --- | --- |
| Cray | man clang | man clang++ | man crayftn |
| GNU | man gcc | man g++ | man gfortran |
| Wrappers | man cc | man CC | man ftn |

Tip

You can also pass the --help option to any of the compilers or wrappers to get a summary of how to use them. The Cray Fortran compiler uses ftn --craype-help to access the help options.

Tip

There are no man pages for the AOCC compilers at the moment.

Tip

Cray C/C++ is based on Clang and therefore supports similar options to clang/gcc. clang --help will produce a full summary of options with Cray-specific options marked "Cray". The clang man page on ARCHER2 concentrates on these Cray extensions to the clang front end and does not provide an exhaustive description of all clang options. Cray Fortran is not based on Flang and so takes different options from flang/gfortran.

"},{"location":"user-guide/dev-environment/#which-compiler-environment","title":"Which compiler environment?","text":"

If you are unsure which compiler you should choose, we suggest the starting point should be the GNU compiler collection (GCC, PrgEnv-gnu); this is perhaps the most commonly used by code developers, particularly in the open source software domain. A portable, standard-conforming code should (in principle) compile in any of the three compiler environments.

For users requiring specific compiler features, such as coarray Fortran, the recommended starting point would be Cray. The following sections provide further details of the different compiler environments.

Warning

Intel compilers are not currently available on ARCHER2.

"},{"location":"user-guide/dev-environment/#gnu-compiler-collection-gcc","title":"GNU compiler collection (GCC)","text":"

The commonly used open source GNU compiler collection is available and provides C/C++ and Fortran compilers.

Switch to the GCC compiler environment from the default CCE (cray) compiler environment via:

auser@ln03:~> module load PrgEnv-gnu

Lmod is automatically replacing "cce/15.0.0" with "gcc/11.2.0".


Lmod is automatically replacing "PrgEnv-cray/8.3.3" with "PrgEnv-gnu/8.3.3".


Due to MODULEPATH changes, the following have been reloaded:
  1) cray-mpich/8.1.23

Warning

If you want to use GCC version 10 or greater to compile Fortran code that uses the old MPI interfaces (i.e. use mpi or INCLUDE 'mpif.h'), you must add the -fallow-argument-mismatch option (or equivalent) when compiling; otherwise you will see compile errors associated with MPI functions. The reason is that past versions of gfortran allowed mismatched arguments to external procedures (e.g., where an explicit interface is not available). This is often the case for MPI routines called through the old MPI interfaces, where arrays of different types are passed to, for example, MPI_Send(). GCC 10 and later report this as an error because it does not conform to the Fortran standard. The -fallow-argument-mismatch option reduces the error to a warning. The same effect may be achieved via -std=legacy.

If you use the Fortran 2008 MPI interface (i.e. use mpi_f08) then you should not need to add this option.

Fortran language MPI bindings are described in more detail in the MPI Standard documentation.
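To decide whether a particular source file needs the option, you can look for the old interfaces in it. The sketch below is a rough heuristic with a hypothetical function name. It succeeds if a line starts with use mpi (but not use mpi_f08) or includes mpif.h, ignoring case. It does not understand continuation lines, preprocessor logic or forms such as use :: mpi, so treat its answer as a hint only.

```shell
# Sketch: succeed (exit 0) if a Fortran source file appears to use the old MPI
# interfaces, i.e. "use mpi" or "include 'mpif.h'" (hypothetical helper).
# "use mpi_f08" does not match because "_" follows "mpi".
needs_argument_mismatch() {
    grep -Eiq '^[[:space:]]*(use[[:space:]]+mpi([[:space:]]|,|$)|#?include[[:space:]]+.mpif\.h.)' "$1"
}
```

Usage: needs_argument_mismatch solver.f90 && echo "add -fallow-argument-mismatch", where solver.f90 is a hypothetical file name.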

"},{"location":"user-guide/dev-environment/#useful-gnu-fortran-options","title":"Useful Gnu Fortran options","text":"
| Option | Comment |
| --- | --- |
| -O<level> | Optimisation levels: -O0, -O1, -O2, -O3, -Ofast. -Ofast is not recommended without careful regression testing on numerical output. |
| -std=<standard> | Default is gnu |
| -fallow-argument-mismatch | Allow mismatched procedure arguments. This argument is required for compiling MPI Fortran code with GCC version 10 or greater if you are using the older MPI interfaces (see warning above) |
| -fbounds-check | Use runtime checking of array indices |
| -fopenmp | Compile OpenMP (default is no OpenMP) |
| -v | Display verbose output from compiler stages |

Tip

The standard in -std may be one of f95, f2003, f2008 or f2018. The default option -std=gnu is the latest Fortran standard plus GNU extensions.
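The sketch below shows one way to combine these options. It builds a gfortran flag set for a debug or release build using options from the table above, plus -g for debugging symbols. The function name gnu_fflags and the chosen levels (-O0 for debug, -O2 for release) are assumptions, not ARCHER2 recommendations.

```shell
# Sketch: build gfortran flags for a debug or release build (hypothetical helper).
# $1: "debug" or "release"; $2 (optional): "omp" to enable OpenMP.
gnu_fflags() {
    local flags
    case "$1" in
        debug)   flags="-O0 -g -fbounds-check" ;;  # no optimisation, symbols, array bounds checks
        release) flags="-O2" ;;
        *)       echo "usage: gnu_fflags debug|release [omp]" >&2; return 1 ;;
    esac
    if [ "${2:-}" = "omp" ]; then
        flags="$flags -fopenmp"                    # OpenMP is off by default
    fi
    echo "$flags"
}
```

With PrgEnv-gnu loaded you might then use, for example, ftn $(gnu_fflags debug omp) -c mycode.f90, where mycode.f90 is a hypothetical file name.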

Warning

Past versions of gfortran allowed mismatched arguments to external procedures (e.g., where an explicit interface is not available). This is often the case for MPI routines where arrays of different types are passed to MPI_Send() and similar routines. GCC 10 and later report this as an error because it does not conform to the Fortran standard. Use -fallow-argument-mismatch to reduce the error to a warning. The same effect may be achieved via -std=legacy.

"},{"location":"user-guide/dev-environment/#using-gcc-12x-on-archer2","title":"Using GCC 12.x on ARCHER2","text":"

GCC 12.x compilers are available on ARCHER2 for users who wish to access newer features (particularly C++ features).

Testing by the CSE service has identified that some software regression tests produce results that differ from the reference values when using software compiled with gfortran from GCC 12.x, so we do not recommend it for general use. Users should carefully check results from software built using compilers from GCC 12.x before using it for their research projects.

You can access GCC 12.x by using the commands:

module load extra-compilers
module load PrgEnv-gnu
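One way to carry out the checking recommended above is to compare the output of a run built with GCC 12.x against a reference run, token by token. The sketch below is a hypothetical helper, not a CSE service tool. It compares numeric tokens with a relative tolerance you choose and requires all other tokens to match exactly. It compares Fortran D-exponent numbers such as 1.0D+00 as plain text.

```shell
# Sketch: compare two output files token by token (hypothetical helper).
# $1: reference output, $2: new output, $3: relative tolerance (default 1e-8).
# Prints each differing token; exits non-zero if any difference is found.
compare_outputs() {
    awk -v tol="${3:-1e-8}" '
        function isnum(x) { return x ~ /^[-+]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][-+]?[0-9]+)?$/ }
        NR == FNR { for (i = 1; i <= NF; i++) refv[++n] = $i; next }
        { for (i = 1; i <= NF; i++) newv[++m] = $i }
        END {
            if (n != m) { printf "token count differs: %d vs %d\n", n, m; exit 1 }
            bad = 0
            for (k = 1; k <= n; k++) {
                x = refv[k]; y = newv[k]
                if (isnum(x) && isnum(y)) {
                    a = x + 0; b = y + 0
                    d = a - b; if (d < 0) d = -d
                    s = (a < 0) ? -a : a
                    if (s == 0) s = 1        # fall back to an absolute test at zero
                    if (d > tol * s) { printf "token %d: %s vs %s\n", k, x, y; bad++ }
                } else if (x != y) {
                    printf "token %d: %s vs %s\n", k, x, y; bad++
                }
            }
            exit (bad > 0)
        }' "$1" "$2"
}
```

Usage: compare_outputs reference.out gcc12.out 1e-8, where the file names are hypothetical. A non-zero exit status means at least one difference was reported. Choose the tolerance to suit the numerical sensitivity of your application.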
"},{"location":"user-guide/dev-environment/#reference-material","title":"Reference material","text":""},{"location":"user-guide/dev-environment/#cray-compiling-environment-cce","title":"Cray Compiling Environment (CCE)","text":"

The Cray Compiling Environment (CCE) is the default compiler at the point of login. CCE supports C/C++ (along with Unified Parallel C, UPC) and Fortran (including Coarray Fortran). Support for OpenMP parallelism is available for both C/C++ and Fortran (currently OpenMP 4.5, with a number of exceptions).

The Cray C/C++ compiler is based on a clang front end, and so compiler options are similar to those for gcc/clang. However, the Fortran compiler remains based around Cray-specific options. Be sure to separate C/C++ compiler options and Fortran compiler options (typically CFLAGS and FFLAGS) if compiling mixed C/Fortran applications.

As CCE is the default compiler environment on ARCHER2, you do not usually need to issue any commands to enable CCE.

Note

The CCE Clang compiler uses a GCC 8 toolchain so only C++ standard library features available in GCC 8 will be available in CCE Clang. You can add the compile option --gcc-toolchain=/opt/gcc/11.2.0/snos to use a more recent version of the C++ standard library if you wish.

"},{"location":"user-guide/dev-environment/#useful-cce-cc-options","title":"Useful CCE C/C++ options","text":"

When using the compiler wrappers cc or CC, some of the following options may be useful:

Language, warning and debugging options:

Option | Comment
-std=<standard> | Default is -std=gnu11 (gnu++14 for C++) [1]
--gcc-toolchain=/opt/cray/pe/gcc/12.2.0/snos | Use the GCC 12.2.0 toolchain instead of the default 11.2.0 version packaged with CCE

Performance options:

Option | Comment
-O<level> | Optimisation levels: -O0, -O1, -O2, -O3, -Ofast. -Ofast is not recommended without careful regression testing on numerical output.
-ffp=<level> | Floating point maths optimisation levels 0-4 [2]
-flto | Link time optimisation

Miscellaneous options:

Option | Comment
-fopenmp | Compile OpenMP (default is off)
-v | Display verbose output from compiler stages

Notes

  1. Option -std=gnu11 gives C11 plus GNU extensions (likewise, gnu++14 gives C++14 plus GNU extensions). See https://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/C-Extensions.html
  2. Option -ffp=3 is implied by -Ofast or -ffast-math
"},{"location":"user-guide/dev-environment/#useful-cce-fortran-options","title":"Useful CCE Fortran options","text":"

Language, warning and debugging options:

Option | Comment
-m <level> | Message level (default -m 3: errors and warnings)

Performance options:

Option | Comment
-O <level> | Optimisation levels: -O0 to -O3 (default -O2)
-h fp<level> | Floating point maths optimisation levels 0-3
-h ipa | Inter-procedural analysis

Miscellaneous options:

Option | Comment
-h omp | Compile OpenMP (default is -hnoomp)
-v | Display verbose output from compiler stages"},{"location":"user-guide/dev-environment/#cce-reference-documentation","title":"CCE Reference Documentation","text":""},{"location":"user-guide/dev-environment/#amd-optimizing-compiler-collection-aocc","title":"AMD Optimizing Compiler Collection (AOCC)","text":"

The AMD Optimizing Compiler Collection (AOCC) is a clang-based optimising compiler. AOCC also includes a flang-based Fortran compiler.

Load the AOCC compiler environment from the default CCE (cray) compiler environment via:

auser@ln03:~> module load PrgEnv-aocc\n\nLmod is automatically replacing \"cce/15.0.0\" with \"aocc/3.2.0\".\n\n\nLmod is automatically replacing \"PrgEnv-cray/8.3.3\" with \"PrgEnv-aocc/8.3.3\".\n\n\nDue to MODULEPATH changes, the following have been reloaded:\n  1) cray-mpich/8.1.23\n
"},{"location":"user-guide/dev-environment/#aocc-reference-material","title":"AOCC reference material","text":""},{"location":"user-guide/dev-environment/#message-passing-interface-mpi","title":"Message passing interface (MPI)","text":""},{"location":"user-guide/dev-environment/#hpe-cray-mpich","title":"HPE Cray MPICH","text":"

HPE Cray provide, as standard, an MPICH implementation of the message passing interface which is specifically optimised for the ARCHER2 interconnect. The current implementation supports MPI standard version 3.1.

The HPE Cray MPICH implementation is linked into software by default when compiling using the standard wrapper scripts: cc, CC and ftn.

You do not need to do anything to make HPE Cray MPICH available when you log into ARCHER2, it is available by default to all users.

"},{"location":"user-guide/dev-environment/#switching-to-alternative-ucx-mpi-implementation","title":"Switching to alternative UCX MPI implementation","text":"

HPE Cray MPICH can use two different low-level protocols to transfer data across the network. The default is the Open Fabrics Interface (OFI), but you can switch to the UCX protocol from Mellanox.

Which performs better will be application-dependent, but our experience is that UCX is often faster for programs that send a lot of data collectively between many processes, e.g. all-to-all communications patterns such as occur in parallel FFTs.

Note

You do not need to recompile your program - you simply load different modules in your Slurm script.

module load craype-network-ucx \nmodule load cray-mpich-ucx \n

Important

If your software was compiled using a compiler environment other than CCE, you will also need to load that compiler environment as well as the UCX modules. For example, if you compiled using PrgEnv-gnu you would need to:

module load PrgEnv-gnu\nmodule load craype-network-ucx \nmodule load cray-mpich-ucx \n

The performance benefits will also vary depending on the number of processes, so it is important to benchmark your application at the scale used in full production runs.

"},{"location":"user-guide/dev-environment/#mpi-reference-material","title":"MPI reference material","text":"

MPI standard documents: https://www.mpi-forum.org/docs/

"},{"location":"user-guide/dev-environment/#linking-and-libraries","title":"Linking and libraries","text":"

Linking to libraries is performed dynamically on ARCHER2.

Important

Static linking is not supported on ARCHER2. If you attempt to link statically, you will see errors similar to:

/usr/bin/ld: cannot find -lpmi\n/usr/bin/ld: cannot find -lpmi2\ncollect2: error: ld returned 1 exit status\n

One can use the -craype-verbose flag to the compiler wrapper to check exactly what linker arguments are invoked. The compiler wrapper scripts encode the paths to the programming environment system libraries using RUNPATH. This ensures that the executable can find the correct runtime libraries without the matching software modules loaded.

The library RUNPATH associated with an executable can be inspected via, e.g.,

$ readelf -d ./a.out\n

(swap a.out for the name of the executable you are querying).
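The full readelf -d listing is long, so it can help to filter out just the library search path. The sketch below assumes the usual GNU readelf wording for these entries (Library runpath: [...] or Library rpath: [...]); show_runpath is a hypothetical helper name.

```shell
#!/bin/bash
# show_runpath: hypothetical helper that reads `readelf -d` output on stdin
# and prints each RUNPATH/RPATH directory on its own line.
show_runpath() {
  sed -n 's/.*Library r[un]*path: \[\(.*\)].*/\1/p' | tr ':' '\n'
}

# Example:
# readelf -d ./a.out | show_runpath
```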

"},{"location":"user-guide/dev-environment/#commonly-used-libraries","title":"Commonly used libraries","text":"

Modules with names prefixed by cray- are provided by HPE Cray and work with any of the compiler environments. These modules should be the first choice for access to software libraries if available.

Tip

More information on the different software libraries on ARCHER2 can be found in the Software libraries section of the user guide.

"},{"location":"user-guide/dev-environment/#hpe-cray-programming-environment-cpe-releases","title":"HPE Cray Programming Environment (CPE) releases","text":""},{"location":"user-guide/dev-environment/#available-hpe-cray-programming-environment-cpe-releases","title":"Available HPE Cray Programming Environment (CPE) releases","text":"

ARCHER2 currently has the following HPE Cray Programming Environment (CPE) releases available:

You can find information, notes, and lists of changes for current and upcoming ARCHER2 HPE Cray programming environments in the HPE Cray Programming Environment GitHub repository.

Tip

We recommend that users use the most recent version of the PE available to get the latest improvements and bug fixes.

Later PE releases may sometimes be available in containerised form. This allows developers to check that their code compiles and runs using CPE releases that have not yet been installed on ARCHER2.

CPE 23.12 is currently available as a Singularity container, see Using Containerised HPE Cray Programming Environments for further details.

"},{"location":"user-guide/dev-environment/#switching-to-a-different-hpe-cray-programming-environment-cpe-release","title":"Switching to a different HPE Cray Programming Environment (CPE) release","text":"

Important

See the section below on using non-default versions of HPE Cray libraries as this process will generally need to be followed when using software from non-default PE installs.

Access to non-default PE environments is controlled by the use of the cpe modules. Loading a cpe module will do the following:

For example, if you have a code that uses the GNU compiler environment and the FFTW and NetCDF parallel libraries, and you want to compile in the (non-default) 23.09 programming environment, you would do the following:

First, load the cpe/23.09 module to switch all the defaults to the versions from the 23.09 PE. Then, swap to the GNU compiler environment and load the required library modules (FFTW, HDF5 parallel and NetCDF HDF5 parallel). The loaded module list shows they are the versions from the 23.09 PE:

module load cpe/23.09\n

Output:

The following have been reloaded with a version change:\n  1) PrgEnv-cray/8.3.3 => PrgEnv-cray/8.4.0             4) cray-mpich/8.1.23 => cray-mpich/8.1.27\n  2) cce/15.0.0 => cce/16.0.1                           5) craype/2.7.19 => craype/2.7.23\n  3) cray-libsci/22.12.1.1 => cray-libsci/23.09.1.1     6) perftools-base/22.12.0 => perftools-base/23.09.0\n

module load PrgEnv-gnu\n
Output:
Lmod is automatically replacing \"cce/16.0.1\" with \"gcc/11.2.0\".\n\n\nLmod is automatically replacing \"PrgEnv-cray/8.4.0\" with \"PrgEnv-gnu/8.4.0\".\n\n\nDue to MODULEPATH changes, the following have been reloaded:\n  1) cray-mpich/8.1.27\n

module load cray-fftw\nmodule load cray-hdf5-parallel\nmodule load cray-netcdf-hdf5parallel\nmodule list\n

Output:

Currently Loaded Modules:\n  1) craype-x86-rome                         6) epcc-setup-env          11) craype/2.7.23          16) cray-fftw/3.3.10.5\n  2) libfabric/1.12.1.2.2.0.0                7) load-epcc-module        12) cray-dsmml/0.2.2       17) cray-hdf5-parallel/1.12.2.7\n  3) craype-network-ofi                      8) perftools-base/23.09.0  13) cray-mpich/8.1.27      18) cray-netcdf-hdf5parallel/4.9.0.7\n  4) xpmem/2.5.2-2.4_3.30__gd0f7936.shasta   9) cpe/23.09               14) cray-libsci/23.09.1.1\n  5) bolt/0.8                               10) gcc/11.2.0              15) PrgEnv-gnu/8.4.0\n

Now you can go ahead and compile your software with the new programming environment.

Important

The cpe modules only change the versions of software modules provided as part of the HPE Cray programming environments. Any modules provided by the ARCHER2 service will need to be loaded manually after you have completed the process described above.

Note

Unloading the cpe module does not restore the original programming environment release. To restore the default programming environment release you should log out and then log back in to ARCHER2.

"},{"location":"user-guide/dev-environment/#using-non-default-versions-of-hpe-cray-libraries","title":"Using non-default versions of HPE Cray libraries","text":"

If you wish to make use of non-default versions of libraries provided by HPE Cray (usually because they are part of a non-default PE release: either old or new) then you need to make changes at both compile and runtime. In summary, you need to load the correct module and also make changes to the LD_LIBRARY_PATH environment variable.

At compile time you need to load the version of the library module before you compile and set the LD_LIBRARY_PATH environment variable to include the contents of $CRAY_LD_LIBRARY_PATH as the first entry. For example, to use the non-default 23.09.1.1 version of HPE Cray LibSci in the default programming environment (Cray Compiling Environment, CCE), you would first set up the environment to compile with:

module load cray-libsci/23.09.1.1\nexport LD_LIBRARY_PATH=$CRAY_LD_LIBRARY_PATH:$LD_LIBRARY_PATH\n

The order is important here: every time you change a module, you will need to reset the value of LD_LIBRARY_PATH for the process to work (it will not be updated automatically).
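Because this step has to be repeated after every module change, it can be convenient to wrap it in a small shell function. The sketch below (prepend_cray_ld_path is a hypothetical name) also avoids leaving a trailing colon when LD_LIBRARY_PATH was previously empty: an empty entry in LD_LIBRARY_PATH makes the dynamic loader search the current directory, which is rarely what you want.

```shell
#!/bin/bash
# prepend_cray_ld_path: hypothetical helper. Run it after your last module
# command; it puts $CRAY_LD_LIBRARY_PATH in front of any existing entries.
prepend_cray_ld_path() {
  if [ -z "${CRAY_LD_LIBRARY_PATH:-}" ]; then
    echo "CRAY_LD_LIBRARY_PATH is not set; load the library module first" >&2
    return 1
  fi
  if [ -n "${LD_LIBRARY_PATH:-}" ]; then
    LD_LIBRARY_PATH="${CRAY_LD_LIBRARY_PATH}:${LD_LIBRARY_PATH}"
  else
    # Avoid a trailing empty entry, which would mean the current directory
    LD_LIBRARY_PATH="${CRAY_LD_LIBRARY_PATH}"
  fi
  export LD_LIBRARY_PATH
}

# Example:
# module load cray-libsci/23.09.1.1
# prepend_cray_ld_path
```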

Now you can compile your code. You can check that the executable is using the correct version of LibSci with the ldd command: look for the line beginning libsci_cray.so.5, where you should see the version number in the path to the library file:

ldd dgemv.x \n

Output:

    linux-vdso.so.1 (0x00007ffc7fff5000)\n    libm.so.6 => /lib64/libm.so.6 (0x00007fd6a6361000)\n    libsci_cray.so.5 => /opt/cray/pe/libsci/23.09.1.1/CRAY/12.0/x86_64/lib/libsci_cray.so.5 (0x00007fd6a2419000)\n    libdl.so.2 => /lib64/libdl.so.2 (0x00007fd6a2215000)\n    libxpmem.so.0 => /opt/cray/xpmem/default/lib64/libxpmem.so.0 (0x00007fd6a68b3000)\n    libquadmath.so.0 => /opt/cray/pe/gcc-libs/libquadmath.so.0 (0x00007fd6a1fce000)\n    libmodules.so.1 => /opt/cray/pe/cce/15.0.0/cce/x86_64/lib/libmodules.so.1 (0x00007fd6a689a000)\n    libfi.so.1 => /opt/cray/pe/cce/15.0.0/cce/x86_64/lib/libfi.so.1 (0x00007fd6a1a29000)\n    libcraymath.so.1 => /opt/cray/pe/cce/15.0.0/cce/x86_64/lib/libcraymath.so.1 (0x00007fd6a67b3000)\n    libf.so.1 => /opt/cray/pe/cce/15.0.0/cce/x86_64/lib/libf.so.1 (0x00007fd6a6720000)\n    libu.so.1 => /opt/cray/pe/cce/15.0.0/cce/x86_64/lib/libu.so.1 (0x00007fd6a1920000)\n    libcsup.so.1 => /opt/cray/pe/cce/15.0.0/cce/x86_64/lib/libcsup.so.1 (0x00007fd6a6715000)\n    libc.so.6 => /lib64/libc.so.6 (0x00007fd6a152b000)\n    /lib64/ld-linux-x86-64.so.2 (0x00007fd6a66ac000)\n    libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fd6a1308000)\n    librt.so.1 => /lib64/librt.so.1 (0x00007fd6a10ff000)\n    libgfortran.so.5 => /opt/cray/pe/gcc-libs/libgfortran.so.5 (0x00007fd6a0c53000)\n    libstdc++.so.6 => /opt/cray/pe/gcc-libs/libstdc++.so.6 (0x00007fd6a0841000)\n    libgcc_s.so.1 => /opt/cray/pe/gcc-libs/libgcc_s.so.1 (0x00007fd6a0628000)\n

Tip

If any of the libraries point to versions in the /opt/cray/pe/lib64 directory then these are using the default versions of the libraries rather than the specific versions. This happens at compile time if you have forgotten to load the right module or to set $LD_LIBRARY_PATH afterwards.
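The check described in this tip can be automated so that a build or job script stops early when the default library is picked up. The function below is a sketch that assumes the ldd output format shown above (library name, =>, resolved path); check_libsci_path is a hypothetical name, and it only checks LibSci.

```shell
#!/bin/bash
# check_libsci_path: hypothetical helper. Reads `ldd` output on stdin and
# prints the resolved libsci_cray path.
# Returns 1 if LibSci resolves to the default /opt/cray/pe/lib64 location,
# or 2 if no resolved libsci_cray entry is found.
check_libsci_path() {
  local path
  path=$(awk '/libsci_cray/ && $2 == "=>" { print $3 }')
  case "$path" in
    "" | not)
      # No libsci_cray line, or ldd reported "=> not found"
      echo "libsci_cray not resolved in ldd output" >&2
      return 2
      ;;
    /opt/cray/pe/lib64/*)
      echo "WARNING: using default LibSci: $path" >&2
      return 1
      ;;
    *)
      echo "$path"
      ;;
  esac
}

# Example (after the module and LD_LIBRARY_PATH setup):
# ldd dgemv.x | check_libsci_path || exit 1
```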

At run time (typically in your job script) you need to repeat the environment setup steps (you can also use the ldd command in your job submission script to check the library is pointing to the correct version). For example, a job submission script to run our dgemv.x executable with the non-default version of LibSci could look like:

#!/bin/bash\n#SBATCH --job-name=dgemv\n#SBATCH --time=0:20:0\n#SBATCH --nodes=1\n#SBATCH --ntasks-per-node=1\n#SBATCH --cpus-per-task=1\n\n# Replace the account code, partition and QoS with those you wish to use\n#SBATCH --account=t01        \n#SBATCH --partition=standard\n#SBATCH --qos=short\n#SBATCH --reservation=shortqos\n\n# Set up the environment to use the non-default version of LibSci\nmodule load cray-libsci/23.09.1.1\nexport LD_LIBRARY_PATH=$CRAY_LD_LIBRARY_PATH:$LD_LIBRARY_PATH\n\n# Check which library versions the executable is pointing to\nldd dgemv.x\n\nexport OMP_NUM_THREADS=1\n\nsrun --hint=nomultithread --distribution=block:block dgemv.x\n

Tip

As when compiling, the order of commands matters. Setting the value of LD_LIBRARY_PATH must happen after you have finished all your module commands for it to have the correct effect.

Important

You must set up the environment at both compile and run time, otherwise you will end up using the default version of the library.

"},{"location":"user-guide/dev-environment/#compiling-on-compute-nodes","title":"Compiling on compute nodes","text":"

Sometimes you may wish to compile in a batch job: for example, when the compile process takes a long time, or when it is part of the research workflow and can be coupled to the production job. Note that, unlike on the login nodes, the /home file system is not available on the compute nodes.

An example job submission script for a compile job using make (assuming the Makefile is in the same directory as the job submission script) would be:

#!/bin/bash\n\n#SBATCH --job-name=compile\n#SBATCH --time=00:20:00\n#SBATCH --nodes=1\n#SBATCH --ntasks-per-node=1\n#SBATCH --cpus-per-task=1\n\n# Replace the account code, partition and QoS with those you wish to use\n#SBATCH --account=t01        \n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n\nmake clean\n\nmake\n

Note

If you want to use a compiler environment other than the default, you will need to add the module load command before the make command, e.g. to use the GCC compiler environment:

module load PrgEnv-gnu\n

You can also use a compute node in an interactive way using salloc. Please see Section Using salloc to reserve resources for further details. Once your interactive session is ready, you can load the compilation environment and compile the code.

"},{"location":"user-guide/dev-environment/#using-the-compiler-wrappers-for-serial-compilations","title":"Using the compiler wrappers for serial compilations","text":"

The compiler wrappers link with a number of HPE-provided libraries automatically. It is possible to compile codes in serial with the compiler wrappers to take advantage of the HPE libraries.

To set up your environment for serial compilation, you will need to run:

  module load craype-network-none\n  module remove cray-mpich\n

Once this is done, you can use the compiler wrappers (cc for C, CC for C++, and ftn for Fortran) to compile your code in serial.

"},{"location":"user-guide/dev-environment/#managing-development","title":"Managing development","text":"

ARCHER2 supports common revision control software such as git.

Standard GNU autoconf tools are available, along with make (which is GNU Make). Versions of cmake are available.

Tip

Some of these tools are part of the system software, and typically reside in /usr/bin, while others are provided as part of the module system. Some tools may be available in different versions via both /usr/bin and via the module system. If you find the default version is too old, then look in the module system for a more recent version.

"},{"location":"user-guide/dev-environment/#build-instructions-for-software-on-archer2","title":"Build instructions for software on ARCHER2","text":"

The ARCHER2 CSE team at EPCC and other contributors provide build configurations and instructions for a range of research software, software libraries and tools on a variety of HPC systems (including ARCHER2) in a public GitHub repository. See:

The repository always welcomes contributions from the ARCHER2 user community.

"},{"location":"user-guide/dev-environment/#support-for-building-software-on-archer2","title":"Support for building software on ARCHER2","text":"

If you run into issues building software on ARCHER2 or the software you require is not available then please contact the ARCHER2 Service Desk with any questions you have.

"},{"location":"user-guide/energy/","title":"Energy use","text":"

This section covers how to monitor energy use for your jobs on ARCHER2 and how to control the CPU frequency which allows some control over how much energy is consumed by jobs.

Important

The default CPU frequency cap on ARCHER2 compute nodes for jobs launched using srun is currently set to 2.0 GHz. Information below describes how to control the CPU frequency cap using Slurm.

"},{"location":"user-guide/energy/#monitoring-energy-use","title":"Monitoring energy use","text":"

The Slurm accounting database stores the total energy consumed by a job and you can also directly access the counters on compute nodes which capture instantaneous power and energy data broken down by different hardware components.

"},{"location":"user-guide/energy/#using-sacct-to-get-energy-usage-for-individual-jobs","title":"Using sacct to get energy usage for individual jobs","text":"

Energy usage for a particular job may be obtained using the sacct command. For instance

sacct -j 2658300 --format=JobID,Elapsed,ReqCPUFreq,ConsumedEnergy\n

will provide the elapsed time and consumed energy in joules for the job(s) specified with -j. The output of this command is:

JobID           Elapsed ReqCPUFreq ConsumedEnergy \n------------ ---------- ---------- -------------- \n2658300        02:19:48    Unknown          4.58M \n2658300.bat+   02:19:48          0          4.58M \n2658300.ext+   02:19:48          0          4.58M \n2658300.0      02:19:09    Unknown          4.57M \n

In this case we can see that the job consumed 4.58 MJ for a run lasting 2 hours, 19 minutes and 48 seconds with the CPU frequency unset. To convert the energy to kWh, divide the energy in joules by 3.6e6 (1 kWh = 3.6 MJ), which is the same as multiplying by 2.78e-7; in this case the result is 1.27 kWh.
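If you convert energies for many jobs, a one-line awk helper can do the arithmetic. This is a sketch: joules_to_kwh is a hypothetical name, and it expects a plain number of joules. The usage example assumes that sacct --noconvert prints ConsumedEnergy as an unabbreviated value in joules rather than with a suffix such as M, which is our understanding of that option rather than something stated above.

```shell
#!/bin/bash
# joules_to_kwh JOULES: hypothetical helper converting joules to kWh.
# 1 kWh = 3.6e6 J, equivalent to multiplying by ~2.78e-7.
joules_to_kwh() {
  awk -v joules="$1" 'BEGIN { printf "%.2f\n", joules / 3.6e6 }'
}

# Example (assumes --noconvert gives plain joules):
# energy=$(sacct -j 2658300 --noconvert --noheader --format=ConsumedEnergy | head -n 1)
# joules_to_kwh "$energy"
```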

The Slurm database may be cleaned without notice so you should gather any data you want as soon as possible after the job completes - you can even add the sacct command to the end of your job script to ensure this data is captured.

In addition to energy statistics sacct provides a number of other statistics that can be specified to the --format option, the full list of which can be viewed with

sacct --helpformat\n

or using the man pages.

"},{"location":"user-guide/energy/#accessing-the-node-energypower-counters","title":"Accessing the node energy/power counters","text":"

Note

The counters are available on each compute node and record data only for that compute node. If you are running multi-node jobs, you will need to combine data from multiple nodes to get data for the whole job.

On compute nodes, the raw energy counters and instantaneous power draw data are available at:

/sys/cray/pm_counters\n

There are a number of files in this directory; all the counter files include the current value and a timestamp.

This documentation is from the official HPE documentation:

Tip

The overall power and energy counters include all on-node systems. The major components are the CPU (processor), memory and Slingshot network interface controller (NIC).
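For a single-node job, one way to use these counters is to read the node energy counter before and after the application runs. The sketch below assumes the node-level counter file is named energy and that the first whitespace-separated field in it is the value in joules. read_pm_counter and measure_node_energy are hypothetical helpers, and PM_COUNTERS_DIR is a hypothetical override used only to point the functions at a different directory for testing. Because the counter covers the whole node, the difference includes everything running on that node during the measurement.

```shell
#!/bin/bash
# read_pm_counter NAME [DIR]: hypothetical helper that prints the first field
# (the counter value) of a pm_counters file.
read_pm_counter() {
  local dir=${2:-/sys/cray/pm_counters}
  awk '{ print $1; exit }' "$dir/$1"
}

# measure_node_energy CMD...: hypothetical helper that runs CMD and prints the
# energy (J) recorded by this node's energy counter while it ran.
# Returns CMD's exit status.
measure_node_energy() {
  local dir=${PM_COUNTERS_DIR:-/sys/cray/pm_counters}
  local before after status
  before=$(read_pm_counter energy "$dir")
  "$@"
  status=$?
  after=$(read_pm_counter energy "$dir")
  echo "$((after - before))"
  return "$status"
}

# Example in a single-node job script (my_app is a placeholder):
# measure_node_energy srun --hint=nomultithread ./my_app
```

For multi-node jobs the counters on each node have to be read and combined separately, as described in the note above.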

Note

There exists an MPI-based wrapper library that can gather the pm counter values at runtime via a simple set of function calls. See the link below for details.

"},{"location":"user-guide/energy/#controlling-cpu-frequency","title":"Controlling CPU frequency","text":"

You can request specific CPU frequency caps (in kHz) for compute nodes through srun options or environment variables. The available frequency caps on the ARCHER2 processors, along with the corresponding options and environment variables, are:

Frequency | srun option | Slurm environment variable | Turbo boost enabled?
2.25 GHz | --cpu-freq=2250000 | export SLURM_CPU_FREQ_REQ=2250000 | Yes
2.00 GHz | --cpu-freq=2000000 | export SLURM_CPU_FREQ_REQ=2000000 | No
1.50 GHz | --cpu-freq=1500000 | export SLURM_CPU_FREQ_REQ=1500000 | No

The only frequency caps available on the ARCHER2 processors are 1.5 GHz, 2.0 GHz and 2.25 GHz (with turbo boost).
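Since only these three values are accepted, a job script that takes the cap as a parameter may want to check it before calling srun. A minimal sketch (cpu_freq_khz is a hypothetical helper) that maps a value in GHz to the kHz figure used by --cpu-freq and SLURM_CPU_FREQ_REQ:

```shell
#!/bin/bash
# cpu_freq_khz GHZ: hypothetical helper mapping a cap in GHz to the kHz value
# used by srun --cpu-freq / SLURM_CPU_FREQ_REQ on ARCHER2.
cpu_freq_khz() {
  case "$1" in
    2.25)       echo 2250000 ;;  # also enables turbo boost
    2|2.0|2.00) echo 2000000 ;;
    1.5|1.50)   echo 1500000 ;;
    *)
      echo "Unsupported CPU frequency cap: $1 GHz (use 1.5, 2.0 or 2.25)" >&2
      return 1
      ;;
  esac
}

# Example in a job script:
# freq=$(cpu_freq_khz 2.25) || exit 1
# srun --cpu-freq="$freq" ...usual srun options and arguments...
```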

Important

Setting the CPU frequency cap in this way sets the maximum frequency that the processors can use. In practice, the individual cores may select different frequencies up to the value you have set depending on the workload on the processor.

Important

When you select the highest frequency value (2.25 GHz), you also enable turbo boost and so the processor is free to set the CPU frequency to values above 2.25 GHz if possible within the power and thermal limits of the processor. We see that, with turbo boost enabled, the processors typically boost to around 2.8 GHz even when performing compute-intensive work.

For example, you can add the following option to srun commands in your job submission scripts to set the CPU frequency to 2.25 GHz (and also enable turbo boost):

srun --cpu-freq=2250000 ...usual srun options and arguments...\n

Alternatively, you could add the following line to your job submission script before you use srun to launch the application:

export SLURM_CPU_FREQ_REQ=2250000\n

Tip

Testing by the ARCHER2 CSE team has shown that most software is most energy efficient when 2.0 GHz is selected as the CPU frequency.

Important

The CPU frequency settings only affect applications launched using the srun command.

Priority of frequency settings:

Tip

Adding the --cpu-freq=<freq in kHz> option to sbatch (e.g. using #SBATCH --cpu-freq=<freq in kHz>) will not change the CPU frequency of srun commands used in the job, as the default setting for ARCHER2 overrides the sbatch option when the script runs.

"},{"location":"user-guide/energy/#default-cpu-frequency","title":"Default CPU frequency","text":"

If you do not specify a CPU frequency then you will get the default setting for the ARCHER2 service when you launch an application using srun. The table below lists the history of default CPU frequency settings on the ARCHER2 service.

Date range | Default CPU frequency
12 Dec 2022 - current date | 2.0 GHz
Nov 2021 - 11 Dec 2022 | Unspecified (defaults to 2.25 GHz)"},{"location":"user-guide/energy/#slurm-cpu-frequency-settings-for-centrally-installed-software","title":"Slurm CPU frequency settings for centrally-installed software","text":"

Most centrally installed research software (available via module load commands) uses the same default Slurm CPU frequency as set globally for all ARCHER2 users (see above for this value). However, a small number of packages perform significantly worse at lower frequency settings, so the modules for these packages reset the CPU frequency to the highest value (2.25 GHz). The packages that currently do this are:

Important

If you specify the Slurm CPU frequency in your job scripts using one of the mechanisms described above after you have loaded the module, you will override the setting from the module.

"},{"location":"user-guide/functional-accounts/","title":"Functional accounts on ARCHER2","text":"

Functional accounts are used to run persistent services, controlled by users, on ARCHER2. For example, they can be used to run a licence server that allows jobs on compute nodes to check out a licence for restricted software.

There are a number of steps involved in setting up functional accounts:

  1. Submit a request to service desk for review and award of functional account entitlement
  2. Creation of the functional account and associating authorisation for your standard ARCHER2 account to access it
  3. Test that you can access the persistent service node (dvn04) and the functional account
  4. Setup of the persistent service on the persistent service node (dvn04)

We cover these steps in detail below using the concrete example of setting up a licence server with the FlexLM software, but the process should generalise to other persistent services.

Note

If you have any questions about functional accounts and persistent services on ARCHER2 please contact the ARCHER2 Service Desk.

"},{"location":"user-guide/functional-accounts/#submit-a-request-to-service-desk","title":"Submit a request to service desk","text":"

If you wish to have access to a functional account for persistent services on ARCHER2 you should email the ARCHER2 Service Desk with a case for why you want to have this functionality. You should include the following information in your email:

"},{"location":"user-guide/functional-accounts/#creation-of-the-functional-account","title":"Creation of the functional account","text":"

If your request for a functional account is approved then the ARCHER2 user administration team will set up the account and enable access for the standard user accounts named in the application. They will then inform you of the functional account name.

"},{"location":"user-guide/functional-accounts/#test-access-to-functional-account","title":"Test access to functional account","text":"

The process for accessing the functional account is:

  1. Log into ARCHER2 using normal user account
  2. Set up an SSH key pair for login access to the persistent service node (dvn04)
  3. Log into persistent service node (dvn04)
  4. Use sudo to access the functional account
"},{"location":"user-guide/functional-accounts/#login-to-archer2","title":"Login to ARCHER2","text":"

Log into ARCHER2 in the usual way using a normal user account that has been given access to manage the functional account.

"},{"location":"user-guide/functional-accounts/#setup-ssh-key-pair-for-dvn04-access","title":"Setup SSH key pair for dvn04 access","text":"

You can create a passphrase-less SSH key pair to use for access to the persistent service node using the ssh-keygen command. As long as you place the public and private key parts in the default location, you will not need any additional SSH options to access dvn04 from the ARCHER2 login nodes. Just hit enter when prompted for a passphrase to create a key with no passphrase.

Once the key pair has been created, you add the public part to the $HOME/.ssh/authorized_keys file on ARCHER2 to make it valid for login to dvn04 using the command cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys.

Example commands to setup SSH key pair:

auser@ln04:~> ssh-keygen -t rsa\n\nGenerating public/private rsa key pair.\nEnter file in which to save the key (/home/t01/t01/auser/.ssh/id_rsa): \nEnter passphrase (empty for no passphrase): \nEnter same passphrase again: \nYour identification has been saved in /home/t01/t01/auser/.ssh/id_rsa\nYour public key has been saved in /home/t01/t01/auser/.ssh/id_rsa.pub\nThe key fingerprint is:\nSHA256:wX2bgNElbsPaT8HXKIflNmqnjSfg7a8BPM1R56b4/60 auser@ln02\nThe key's randomart image is:\n+---[RSA 3072]----+\n|        ..... o .|\n|       . *.o = = |\n|        + B B B +|\n|         * * % + |\n|        S * X o  |\n|         . O *   |\n|          . B +  |\n|           . + ..|\n|            ooE.=|\n+----[SHA256]-----+\n\nauser@ln04:~> cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys\n
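Running the cat command above more than once appends duplicate copies of the key. A sketch of an idempotent alternative follows; add_key_once is a hypothetical helper name. The ssh-keygen options in the example comment (-N with an empty passphrase, -f for the output file) are the standard OpenSSH ones for creating the same passphrase-less key without prompts.

```shell
#!/bin/bash
# add_key_once [PUBKEY] [AUTHKEYS]: hypothetical helper that appends the
# public key to authorized_keys only if that exact line is not already there.
add_key_once() {
  local pub=${1:-$HOME/.ssh/id_rsa.pub}
  local auth=${2:-$HOME/.ssh/authorized_keys}
  touch "$auth"
  chmod 600 "$auth"  # sshd (StrictModes) may ignore a file writable by others
  if ! grep -qxF -f "$pub" "$auth"; then
    cat "$pub" >> "$auth"
  fi
}

# Example:
# ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa"
# add_key_once
```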
"},{"location":"user-guide/functional-accounts/#login-to-the-persistent-service-node-dvn04","title":"Login to the persistent service node (dvn04)","text":"

Once you are logged into an ARCHER2 login node, and assuming the SSH key is in the default location, you can now login to dvn04:

auser@ln04:~> ssh dvn04\n

Note

You will need to enter the TOTP for your ARCHER2 account to login to dvn04 unless you have logged in to the node recently.

"},{"location":"user-guide/functional-accounts/#access-the-functional-account","title":"Access the functional account","text":"

Once you are logged into dvn04, you use sudo to access the functional account.

Important

You must use the password of your normal user account to use the sudo command. This password was set on your first ever login to ARCHER2 (and not used subsequently). If you have forgotten this password, you can reset it in SAFE.

For example, if the functional account is called testlm, you would access it (on dvn04) with:

auser@dvn04:~> sudo -iu testlm\n

To exit the functional account, you use the exit command which will return you to your normal user account on dvn04.

"},{"location":"user-guide/functional-accounts/#setup-the-persistent-service","title":"Setup the persistent service","text":"

You should use systemctl to manage your persistent service on dvn04. In order to use the systemctl command, you need to add the following lines to the ~/.bashrc for the functional account:

export XDG_RUNTIME_DIR=/run/user/$UID\nexport DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/$UID/bus\n

Next, create a service definition file for the persistent service and save it to a plain text file. Here is the example used for the QChem licence server:

[Unit]\nDescription=Licence manger for QChem\nAfter=network.target\nConditionHost=dvn04\n\n[Service]\nType=forking\nExecStart=/work/y07/shared/apps/core/qchem/6.1/bin/flexnet/lmgrd -l +/work/y07/shared/apps/core/qchem/6.1/var/log/qchemlm.log -c /work/y07/shared/apps/core/qchem/6.1/etc/flexnet/\nExecStop=/work/y07/shared/apps/core/qchem/6.1/bin/flexnet/lmutil lmdown -all -c /work/y07/shared/apps/core/qchem/6.1/etc/flexnet/\nSuccessExitStatus=15\nRestart=always\nRestartSec=30\n\n[Install]\nWantedBy=default.target\n

Enable the licence server service, e.g. for the QChem licence server service:

testlm@dvn04:~> systemctl --user enable /work/y07/shared/apps/core/qchem/6.1/etc/flexnet/qchem-lm.service\n\nCreated symlink /home/y07/y07/testlm/.config/systemd/user/default.target.wants/qchem-lm.service \u2192 /work/y07/shared/apps/core/qchem/6.1/etc/flexnet/qchem-lm.service.\nCreated symlink /home/y07/y07/testlm/.config/systemd/user/qchem-lm.service \u2192 /work/y07/shared/apps/core/qchem/6.1/etc/flexnet/qchem-lm.service.\n

Once it has been enabled, you can start the licence server service, e.g. for the QChem licence server service:

testlm@dvn04:~> systemctl --user start qchem-lm.service\n

Check the status to make sure it is running:

testlm@dvn04:~> systemctl --user status qchem-lm\n\u25cf qchem-lm.service - Licence manger for QChem\n     Loaded: loaded (/home/y07/y07/testlm/.config/systemd/user/qchem-lm.service; enabled; vendor preset: disabled)\n     Active: active (running) since Thu 2024-05-16 15:33:59 BST; 8s ago\n    Process: 174248 ExecStart=/work/y07/shared/apps/core/qchem/6.1/bin/flexnet/lmgrd -l +/work/y07/shared/apps/core/qchem/6.1/var/log/qchemlm.log -c /work/y07/shared/apps/core/qchem/6.1/etc/flexnet/ (code=exited, status=0/SUCCESS)\n   Main PID: 174249 (lmgrd)\n      Tasks: 8 (limit: 39321)\n     Memory: 5.6M\n        CPU: 18ms\n     CGroup: /user.slice/user-35153.slice/user@35153.service/app.slice/qchem-lm.service\n             \u251c\u2500 174249 /work/y07/shared/apps/core/qchem/6.1/bin/flexnet/lmgrd -l +/work/y07/shared/apps/core/qchem/6.1/var/log/qchemlm.log -c /work/y07/shared/apps/core/qchem/6.1/etc/flexnet/\n             \u2514\u2500 174253 qchemlm -T 10.252.1.77 11.19 10 -c :/work/y07/shared/apps/core/qchem/6.1/etc/flexnet/: -lmgrd_port 6979 -srv mdSVdgushTnAjHX1s1PTj0ppCjHJw1Uk9ylvs1j13zkaUzhDBFlbv4thnqEIAXV --lmgrd_start 66461957 -vdrestart 0 -l /work/y07/shar>\n
"},{"location":"user-guide/gpu/","title":"AMD GPU Development Platform","text":"

In early 2024, ARCHER2 users gained access to a small GPU system, integrated into ARCHER2, that is designed to allow users to test and develop software using AMD GPUs.

Important

The GPU component is very small and so is aimed at software development and testing rather than production research.

"},{"location":"user-guide/gpu/#hardware-available","title":"Hardware available","text":"

The GPU Development Platform consists of 4 compute nodes each with:

"},{"location":"user-guide/gpu/#accessing-the-gpu-compute-nodes","title":"Accessing the GPU compute nodes","text":"

The GPU nodes can be accessed through the Slurm job submission system from the standard ARCHER2 login nodes. Details of the scheduler limits and configuration and example job submission scripts are provided below.

"},{"location":"user-guide/gpu/#compiling-software-for-the-gpu-compute-nodes","title":"Compiling software for the GPU compute nodes","text":""},{"location":"user-guide/gpu/#overview","title":"Overview","text":"

As a quick summary, the recommended procedure for compiling code that offloads to the AMD GPUs is as follows:

For details and alternative approaches, see below.

"},{"location":"user-guide/gpu/#programming-environments","title":"Programming Environments","text":"

The following programming environments and compilers are available to compile code for the AMD GPUs on ARCHER2 using the usual compiler wrappers (ftn, cc, CC), which is the recommended approach:

Programming Environment | Description | Actual compilers called by ftn, cc, CC
PrgEnv-amd | AMD LLVM compilers | amdflang, amdclang, amdclang++
PrgEnv-cray | Cray compilers | crayftn, craycc, crayCC
PrgEnv-gnu | GNU compilers | gfortran, gcc, g++
PrgEnv-gnu-amd | hybrid | gfortran, amdclang, amdclang++
PrgEnv-cray-amd | hybrid | crayftn, amdclang, amdclang++

To decide which compiler(s) to use to compile offload code for the AMD GPUs, you may find it useful to consult the Compilation Strategies for GPU Offloading section below.

The hybrid environments PrgEnv-gnu-amd and PrgEnv-cray-amd are provided as a convenient way to work around the less mature OpenMP offload support in the AMD LLVM Fortran compiler: in these hybrid environments, ftn calls gfortran or crayftn instead of amdflang.

Details about the underlying compiler being called by a compiler wrapper can be checked using the --version flag, for example:

> module load PrgEnv-amd\n> cc --version\nAMD clang version 14.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-5.2.3 22324 d6c88e5a78066d5d7a1e8db6c5e3e9884c6ad10e)\nTarget: x86_64-unknown-linux-gnu\nThread model: posix\nInstalledDir: /opt/rocm-5.2.3/llvm/bin\n
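To compare programming environments quickly, a small helper along these lines could print the first line of --version for each wrapper in turn. This is a sketch; show_wrapper_compilers is a hypothetical name, and wrappers not found on PATH are reported rather than run.

```bash
# Hypothetical helper: show which underlying compiler each Cray
# compiler wrapper calls in the currently loaded PrgEnv.
show_wrapper_compilers() {
    local w
    for w in ftn cc CC; do
        if command -v "$w" > /dev/null 2>&1; then
            # Only the first line of --version names the compiler.
            printf '%s: %s\n' "$w" "$("$w" --version 2>&1 | head -n 1)"
        else
            printf '%s: not found on PATH\n' "$w"
        fi
    done
}
```

For example, after module load PrgEnv-cray-amd, running show_wrapper_compilers should show gfortran or crayftn behind ftn and amdclang behind cc and CC, as in the table above.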
"},{"location":"user-guide/gpu/#rocm","title":"ROCm","text":"

Access to AMD's ROCm software stack is provided through the rocm module:

module load rocm\n

With the rocm module loaded the AMD LLVM compilers amdflang, amdclang, and amdclang++ become available to use directly or through AMD's compiler driver utility hipcc. Neither approach is recommended as a first choice for most users, as considerable care needs to be taken to pass suitable flags to the compiler or to hipcc. With PrgEnv-amd loaded the compiler wrappers ftn, cc, CC, which bypass hipcc and call amdflang, amdclang, or amdclang++ directly, take care of passing suitable compilation flags, which is why using these wrappers is the recommended approach for most users, at least initially.

Note: the rocm module should be loaded whenever you are compiling for the AMD GPUs, even if you are not using the AMD LLVM compilers (amdflang, amdclang, amdclang++).

The rocm module also provides access to other AMD tools, such as HIPIFY (hipify-clang or hipify-perl command), which enables translation of CUDA to HIP code. See also the section below on HIPIFY.

"},{"location":"user-guide/gpu/#gpu-target","title":"GPU target","text":"

Regardless of what approach you use, you will need to tell the underlying GPU compiler which GPU hardware to target. When using the compiler wrappers ftn, cc, or CC, as recommended, this can be done by ensuring the appropriate GPU target module is loaded:

module load craype-accel-amd-gfx90a\n
"},{"location":"user-guide/gpu/#cpu-target","title":"CPU target","text":"

The AMD GPU nodes are equipped with AMD EPYC Milan CPUs instead of the AMD EPYC Rome CPUs present on the regular CPU-only ARCHER2 compute nodes. Though the difference between these processors is small, when using the compiler wrappers ftn, cc, or CC, as recommended, you should load the appropriate CPU target module:

module load craype-x86-milan\n
"},{"location":"user-guide/gpu/#compilation-strategies-for-gpu-offloading","title":"Compilation Strategies for GPU Offloading","text":"

Compiler support on ARCHER2 for various programming models that enable offloading to AMD GPUs can be summarised at a glance in the following table:

PrgEnv | Actual compiler | OpenMP Offload | HIP | OpenACC
PrgEnv-amd | amdflang | ✅ | ❌ | ❌
PrgEnv-amd | amdclang | ✅ | ❌ | ❌
PrgEnv-amd | amdclang++ | ✅ | ✅ | ❌
PrgEnv-cray | crayftn | ✅ | ❌ | ✅
PrgEnv-cray | craycc | ✅ | ❌ | ❌
PrgEnv-cray | crayCC | ✅ | ✅ | ❌
PrgEnv-gnu | gfortran | ❌ | ❌ | ❌
PrgEnv-gnu | gcc | ❌ | ❌ | ❌
PrgEnv-gnu | g++ | ❌ | ❌ | ❌
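As a sketch, the support table can also be encoded as a lookup for use in build scripts. The function offload_support and its lower-case model names (openmp, hip, openacc) are hypothetical; the yes/no answers simply restate the table, which reflects the compiler versions installed on ARCHER2 (for example GCC 11.2.0).

```bash
# Hypothetical lookup: offload_support COMPILER MODEL
# MODEL is openmp, hip or openacc. Prints "yes" or "no";
# returns 2 for compilers or models that are not in the table.
offload_support() {
    local compiler model
    compiler=$1
    model=$2
    case $model in
        openmp|hip|openacc) ;;
        *) return 2 ;;
    esac
    case "$compiler:$model" in
        amdflang:openmp|amdclang:openmp|amdclang++:openmp) echo yes ;;
        amdclang++:hip) echo yes ;;
        crayftn:openmp|crayftn:openacc) echo yes ;;
        craycc:openmp|crayCC:openmp|crayCC:hip) echo yes ;;
        # Every other combination for a listed compiler is unsupported.
        amdflang:*|amdclang:*|amdclang++:*|crayftn:*|craycc:*|crayCC:*) echo no ;;
        gfortran:*|gcc:*|g++:*) echo no ;;
        *) return 2 ;;
    esac
}
```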

It is generally recommended to do the following:

module load PrgEnv-xxx\nmodule load rocm\nmodule load craype-accel-amd-gfx90a\nmodule load craype-x86-milan\n
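One way to avoid typos in the PrgEnv name is a helper that prints these module commands for a given environment and rejects names that are not among the programming environments listed earlier. This is a sketch: gpu_module_cmds is a hypothetical name, and it only prints the commands, so it can be checked anywhere.

```bash
# Hypothetical helper: print the recommended module commands for
# PrgEnv-$1, rejecting environments not listed in this guide.
gpu_module_cmds() {
    case $1 in
        amd|cray|gnu|gnu-amd|cray-amd) ;;
        *) echo "unknown programming environment: $1" >&2; return 1 ;;
    esac
    # printf reuses its format for each remaining argument.
    printf 'module load %s\n' "PrgEnv-$1" rocm craype-accel-amd-gfx90a craype-x86-milan
}
```

On ARCHER2 the printed commands could then be applied with eval "$(gpu_module_cmds cray)".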

Then use the ftn, cc and/or CC wrapper to compile, as appropriate for the programming model in question. Specific guidance on how to do this for different programming models is provided in the subsections below.

When deviating from this procedure and using the underlying compilers directly, or when debugging a problematic build that uses the wrappers, it can be useful to check which flags the compiler wrappers pass to the underlying compiler. To do this, add the -craype-verbose option when compiling a file with a wrapper. Optionally, pipe the output through tr \" \" \"\\n\" to put each flag on its own line, which makes it easier to read. For example:

> CC -craype-verbose source.cpp | tr \" \" \"\\n\"\n
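To narrow that output down further, the same tr step can be combined with a filter. In this sketch the function name offload_flags and the grep pattern for what counts as an offload-related flag are assumptions, not a list taken from the Cray documentation.

```bash
# Hypothetical filter: one flag per line, keeping only flags whose text
# mentions offload, gfx, openmp, hip or rocm (an assumed pattern).
offload_flags() {
    tr ' ' '\n' | grep -E -- '(offload|gfx|openmp|hip|rocm)' || true
}
```

For example: CC -craype-verbose source.cpp | offload_flags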
"},{"location":"user-guide/gpu/#openmp-offload","title":"OpenMP Offload","text":"

To use the compiler wrappers to compile code that offloads to GPU with OpenMP directives, first load the desired PrgEnv module and other necessary modules:

module load PrgEnv-xxx\nmodule load rocm\nmodule load craype-accel-amd-gfx90a\nmodule load craype-x86-milan\n

Then use the appropriate compiler wrapper and pass the -fopenmp option to the wrapper when compiling. For example:

ftn -fopenmp source.f90\n

This should work under PrgEnv-amd and PrgEnv-cray, but not under PrgEnv-gnu as GCC 11.2.0 is the most recent version of GCC available on ARCHER2 and OpenMP offload to AMD MI200 series GPUs is only supported by GCC 13 and later.
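If a build script needs to decide whether the GCC in use could offload to these GPUs, the major-version check described above can be expressed as follows. This is a sketch; gcc_can_offload_mi200 is a hypothetical name, and with no argument it queries gcc -dumpversion for the gcc on PATH.

```bash
# Hypothetical check: succeed only if the GCC major version is 13 or
# later, the first series said above to support MI200 offload.
gcc_can_offload_mi200() {
    local version major
    version=${1:-$(gcc -dumpversion 2>/dev/null)}
    major=${version%%.*}        # "11.2.0" -> "11"
    case $major in
        ''|*[!0-9]*) return 1 ;;  # empty or non-numeric
    esac
    [ "$major" -ge 13 ]
}
```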

You may find that offload directives introduced in more recent versions of the OpenMP standard, e.g. versions later than OpenMP 4.5, fail to compile with some compilers. Under PrgEnv-cray an explicit description of supported OpenMP features can be viewed using the command man intro_openmp.

"},{"location":"user-guide/gpu/#hip","title":"HIP","text":"

To compile C or C++ code that uses HIP to offload to AMD GPUs, first load the desired PrgEnv module (either PrgEnv-amd or PrgEnv-cray) and other necessary modules:

module load PrgEnv-xxx\nmodule load rocm\nmodule load craype-accel-amd-gfx90a\nmodule load craype-x86-milan\n

Then compile using the CC compiler wrapper as follows:

CC -x hip -std=c++11 -D__HIP_ROCclr__ --rocm-path=${ROCM_PATH} source.cpp\n

Alternatively, you may use hipcc to drive the AMD LLVM compiler amdclang(++) to compile HIP code. In that case you will need to take care to explicitly pass all required offload flags to hipcc, such as:

-D__HIP_PLATFORM_AMD__ --offload-arch=gfx90a\n

To see what hipcc passes to the compiler, you can pass the --verbose option. If you are compiling MPI-parallel HIP code with hipcc, please see additional guidance under HIPCC and MPI.

hipcc can compile both HIP code for device (GPU) execution and non-HIP code for host (CPU) execution, and by default uses the AMD LLVM compiler amdclang(++) for both. If your software consists of separate compilation units (typically separate files), some containing HIP code and others containing non-HIP code, it is possible to use a compiler other than hipcc to compile the non-HIP code. To do this:

"},{"location":"user-guide/gpu/#openacc","title":"OpenACC","text":"

Offloading using OpenACC directives on ARCHER2 is only supported by the Cray Fortran compiler. You should therefore load the following:

module load PrgEnv-cray\nmodule load rocm\nmodule load craype-accel-amd-gfx90a\nmodule load craype-x86-milan\n

OpenACC Fortran code can then be compiled using the -hacc flag, as follows:

ftn -hacc source.f90\n

Details on what OpenACC standard and features are supported under PrgEnv-cray can be viewed using the command man intro_openacc.

"},{"location":"user-guide/gpu/#advanced-compilation","title":"Advanced Compilation","text":""},{"location":"user-guide/gpu/#openmp-offload-openmp-cpu-threading","title":"OpenMP Offload + OpenMP CPU threading","text":"

Code may use OpenMP for multithreaded execution on the host CPU in combination with target directives to offload work to GPU. Both uses of OpenMP can coexist in a single compilation unit, which should be compiled using the relevant compiler wrapper and the -fopenmp flag.

"},{"location":"user-guide/gpu/#hip-openmp-offload","title":"HIP + OpenMP Offload","text":"

Using both OpenMP and HIP to offload to GPU is possible, but only if the two programming models are not mixed in the same compilation unit. Two or more separate compilation units - typically separate source files - should be compiled as recommended individually for HIP and OpenMP offload code in the respective sections above. The resulting code objects (.o files) should then be linked together using a compiler wrapper with the -fopenmp flag, but without the -x hip flag.

"},{"location":"user-guide/gpu/#hip-openmp-cpu-threading","title":"HIP + OpenMP CPU threading","text":"

Code in a single compilation unit, such as a single source file, can use HIP to offload to GPU as well as OpenMP for multithreaded execution on the host CPU. Compile it using the relevant compiler wrapper with the flags -fopenmp and -x hip, in that order, as well as the flags for HIP compilation specified above:

CC -fopenmp -x hip -std=c++11 -D__HIP_ROCclr__ --rocm-path=${ROCM_PATH} source.cpp\n
"},{"location":"user-guide/gpu/#hipcc-and-mpi","title":"HIPCC and MPI","text":"

When compiling an MPI-parallel code with hipcc instead of a compiler wrapper, the path to the Cray MPI library include directory should be passed explicitly, or set as part of the CXXFLAGS environment variable, as:

-I${CRAY_MPICH_DIR}/include\n

MPI library directories should also be passed to hipcc, or set as part of the LDFLAGS environment variable prior to compiling, as:

-L${CRAY_MPICH_DIR}/lib ${PE_MPICH_GTL_DIR_amd_gfx90a}\n

Finally the MPI library should be linked explicitly, or set as part of the LIBS environment variable prior to linking, as:

-lmpi ${PE_MPICH_GTL_LIBS_amd_gfx90a}\n
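The three settings above can be collected into the CXXFLAGS, LDFLAGS and LIBS variables in one step. This is a sketch: set_hipcc_mpi_flags is a hypothetical helper that only reads CRAY_MPICH_DIR and the PE_MPICH_GTL_* variables (assumed to be set by the loaded Cray modules) and appends to any existing values.

```bash
# Hypothetical helper: export the MPI flags hipcc needs, as described
# above, appending to any existing CXXFLAGS, LDFLAGS and LIBS.
set_hipcc_mpi_flags() {
    if [ -z "${CRAY_MPICH_DIR:-}" ]; then
        echo "CRAY_MPICH_DIR is not set; load cray-mpich first" >&2
        return 1
    fi
    export CXXFLAGS="${CXXFLAGS:+$CXXFLAGS }-I${CRAY_MPICH_DIR}/include"
    export LDFLAGS="${LDFLAGS:+$LDFLAGS }-L${CRAY_MPICH_DIR}/lib ${PE_MPICH_GTL_DIR_amd_gfx90a:-}"
    export LIBS="${LIBS:+$LIBS }-lmpi ${PE_MPICH_GTL_LIBS_amd_gfx90a:-}"
}
```

A build could then use, for example, hipcc $CXXFLAGS -c main.cpp followed by hipcc $LDFLAGS main.o $LIBS -o app, where main.cpp and app are placeholder names and the offload flags from the HIP section are added as needed.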
"},{"location":"user-guide/gpu/#cmake","title":"Cmake","text":"

Documentation about integrating ROCm with CMake can be found here.

"},{"location":"user-guide/gpu/#gpu-aware-mpi","title":"GPU-aware MPI","text":"

To enable GPU support in cray-mpich, set the following environment variable:

export MPICH_GPU_SUPPORT_ENABLED=1

You do not need to load any additional or alternative MPI module in place of the default cray-mpich module.
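Because leaving this variable unset makes MPI calls on GPU buffers crash or hang (see MPICH_GPU_SUPPORT_ENABLED under MPI Environment variables), a job script could check it before launching. The helper require_gpu_mpi below is a hypothetical sketch, not part of cray-mpich.

```bash
# Hypothetical guard for job scripts: fail early if GPU-aware MPI has
# not been enabled for cray-mpich.
require_gpu_mpi() {
    if [ "${MPICH_GPU_SUPPORT_ENABLED:-0}" != "1" ]; then
        echo "MPICH_GPU_SUPPORT_ENABLED is not 1; MPI calls on GPU buffers may crash or hang" >&2
        return 1
    fi
}
```

In a job script, require_gpu_mpi || exit 1 would go just before the srun line.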

This supports GPU-GPU transfers:

Be aware that each of these nodes has only two PCIe network cards, which may not be in the same memory (NUMA) region as a given GPU. NUMA effects are therefore to be expected in multi-node communication. More detail on this is provided below.

"},{"location":"user-guide/gpu/#libraries","title":"Libraries","text":"

In order to access the GPU-accelerated version of Cray's LibSci maths libraries, a new module has been provided:

cray-libsci_acc

With this module loaded, documentation can be viewed using the command man intro_libsci_acc.

Additionally a number of libraries are provided as part of the rocm module.

"},{"location":"user-guide/gpu/#python-environment","title":"Python Environment","text":"

The cray-python module can be used as normal on the GPU partition, including the mpi4py package that is installed by default. mpi4py uses cray-mpich under the hood, in the same way as on the CPU compute nodes.

However, unless they were specifically compiled for GPU-GPU communication, some Python packages and frameworks that try to take advantage of the fast links between GPUs by calling MPI on GPU pointers may have issues. To set up the environment correctly for a given Python program, the following snippet can be added to load the required libmpi_gtl_hsa library before mpi4py is imported:

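from os import environ\n\n# Load the GPU Transport Layer library before importing mpi4py, but only\n# when GPU-aware MPI is enabled (MPICH_GPU_SUPPORT_ENABLED=1)\nif environ.get(\"MPICH_GPU_SUPPORT_ENABLED\") == \"1\":\n    from ctypes import CDLL, RTLD_GLOBAL\n    CDLL(f\"{environ['CRAY_MPICH_ROOTDIR']}/gtl/lib/libmpi_gtl_hsa.so\", mode=RTLD_GLOBAL)\n\nfrom mpi4py import MPI\n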
"},{"location":"user-guide/gpu/#supported-software","title":"Supported software","text":"

The ARCHER2 GPU development platform is intended for code development, testing and experimentation, and will not have supported, centrally installed versions of codes as is the case for the standard ARCHER2 CPU compute nodes. However, some builds are being made available to users by members of CSE on a best-effort basis to support the community.

Codes that have modules targeting GPUs are:

Note

Will be filled out as applications are compiled and made available.

"},{"location":"user-guide/gpu/#running-jobs-on-the-gpu-nodes","title":"Running jobs on the GPU nodes","text":"

To run a GPU job, you must specify a GPU partition and a quality of service (QoS) as well as the number of GPUs required. You specify the number of GPU cards you want per node using the --gpus=N option, where N is typically 1, 2 or 4.

Note

As there are 4 GPUs per node, each GPU is associated with 1/4 of the resources of the node, i.e., 8 of 32 physical cores and roughly 128 GiB of the total 512 GiB host memory.

Allocations of host resources are made pro-rata. For example, if 2 GPUs are requested, sbatch will allocate 16 cores and around 256 GiB of host memory (in addition to 2 GPUs). Any attempt to use more than the allocated resources will result in an error.

This automatic allocation by Slurm for GPU jobs means that the submission script should not request options such as --ntasks and --cpus-per-task as #SBATCH options; such a job submission will be rejected. See below for some examples of how to use host resources and how to launch MPI applications.
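The pro-rata rule amounts to simple arithmetic. The sketch below (host_share is a hypothetical name) assumes 4 GPUs, 32 physical cores and about 512 GiB of host memory per node, as stated in the note above, and reports the nominal share; the exact memory limit Slurm applies may differ slightly.

```bash
# Hypothetical helper: nominal host cores and memory that come with
# N GPUs on one node (N = 1..4).
host_share() {
    local gpus
    gpus=$1
    case $gpus in
        1|2|3|4) ;;
        *) echo "expected 1-4 GPUs on one node, got: $gpus" >&2; return 1 ;;
    esac
    # Per-node totals from the note above: 4 GPUs, 32 cores, ~512 GiB
    echo "cores=$(( gpus * 32 / 4 )) mem_gib=$(( gpus * 512 / 4 ))"
}
```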

Warning

In order to run jobs on the GPU nodes your ARCHER2 budget must have positive CU hours associated with it. However, your budget will not be charged for any GPU jobs you run.

"},{"location":"user-guide/gpu/#slurm-partitions","title":"Slurm Partitions","text":"

Your job script must specify a partition. The following table has a list of relevant GPU partition(s) on ARCHER2.

Partition | Description | Max nodes available
gpu | GPU nodes with AMD EPYC 32-core processor, 512 GB memory, 4×AMD Instinct MI210 GPU | 4"},{"location":"user-guide/gpu/#slurm-quality-of-service-qos","title":"Slurm Quality of Service (QoS)","text":"

Your job script must specify a QoS relevant for the GPU nodes. Available QoS specifications are as follows.

QoS | Max Nodes Per Job | Max Walltime | Jobs Queued | Jobs Running | Partition(s) | Notes
gpu-shd | 1 | 12 hr | 2 | 1 | gpu | Nodes potentially shared with other users
gpu-exc | 2 | 12 hr | 2 | 1 | gpu | Exclusive node access"},{"location":"user-guide/gpu/#example-job-submission-scripts","title":"Example job submission scripts","text":"

Here are a series of example jobs for various patterns of running on the ARCHER2 GPU nodes. They cover the following scenarios:

"},{"location":"user-guide/gpu/#single-gpu","title":"Single GPU","text":"

This example requests a single GPU on a potentially shared node and launches the program using a single CPU process with offload to a single GPU.

#!/bin/bash\n\n#SBATCH --job-name=single-GPU\n#SBATCH --gpus=1\n#SBATCH --time=00:20:00\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=gpu\n#SBATCH --qos=gpu-shd\n\n# Check assigned GPU\nsrun --ntasks=1 rocm-smi\n\nsrun --ntasks=1 --cpus-per-task=1 ./my_gpu_program.x\n
"},{"location":"user-guide/gpu/#multiple-gpu-on-a-single-node-shared-node-access-max-2-gpu","title":"Multiple GPU on a single node - shared node access (max. 2 GPU)","text":"

This example requests two GPUs on a potentially shared node and launches the program using two MPI processes (one per GPU) with one MPI process per CPU NUMA region.

We use the --cpus-per-task=8 option to srun to set the stride between the two MPI processes to 8 physical cores. This places the MPI processes on separate NUMA regions to ensure they are associated with the correct GPU that is closest to them on the compute node architecture.

#!/bin/bash\n\n#SBATCH --job-name=multi-GPU\n#SBATCH --gpus=2\n#SBATCH --time=00:20:00\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=gpu\n#SBATCH --qos=gpu-shd\n\n# Enable GPU-aware MPI\nexport MPICH_GPU_SUPPORT_ENABLED=1\n\n# Check assigned GPU\nsrun --ntasks=1 rocm-smi\n\n# Check process/thread pinning\nmodule load xthi\nsrun --ntasks=2 --cpus-per-task=8 \\\n     --hint=nomultithread --distribution=block:block \\\n     xthi\n\nsrun --ntasks=2 --cpus-per-task=8 \\\n     --hint=nomultithread --distribution=block:block \\\n     ./my_gpu_program.x\n
"},{"location":"user-guide/gpu/#multiple-gpu-on-a-single-node-exclusive-node-access-max-4-gpu","title":"Multiple GPU on a single node - exclusive node access (max. 4 GPU)","text":"

This example requests four GPUs on a single node and launches the program using four MPI processes (one per GPU) with one MPI process per CPU NUMA region.

We use the --cpus-per-task=8 option to srun to set the stride between the MPI processes to 8 physical cores. This places the MPI processes on separate NUMA regions to ensure they are associated with the correct GPU that is closest to them on the compute node architecture.

#!/bin/bash\n\n#SBATCH --job-name=multi-GPU\n#SBATCH --gpus=4\n#SBATCH --nodes=1\n#SBATCH --exclusive\n#SBATCH --time=00:20:00\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=gpu\n#SBATCH --qos=gpu-exc\n\n# Enable GPU-aware MPI\nexport MPICH_GPU_SUPPORT_ENABLED=1\n\n# Check assigned GPU\nsrun --ntasks=1 rocm-smi\n\n# Check process/thread pinning\nmodule load xthi\nsrun --ntasks=4 --cpus-per-task=8 \\\n     --hint=nomultithread --distribution=block:block \\\n     xthi\n\nsrun --ntasks=4 --cpus-per-task=8 \\\n     --hint=nomultithread --distribution=block:block \\\n     ./my_gpu_program.x\n

Note

When you use the --qos=gpu-exc QoS you must also add the --exclusive flag and then specify the number of nodes you want with --nodes=1.

"},{"location":"user-guide/gpu/#multiple-gpu-on-multiple-nodes-exclusive-node-access-max-8-gpu","title":"Multiple GPU on multiple nodes - exclusive node access (max. 8 GPU)","text":"

This example requests eight GPUs across two nodes and launches the program using eight MPI processes (one per GPU) with one MPI process per CPU NUMA region.

We use the --cpus-per-task=8 option to srun to set the stride between the MPI processes to 8 physical cores. This places the MPI processes on separate NUMA regions to ensure they are associated with the correct GPU that is closest to them on the compute node architecture.

#!/bin/bash\n\n#SBATCH --job-name=multi-GPU\n#SBATCH --gpus=4\n#SBATCH --nodes=2\n#SBATCH --exclusive\n#SBATCH --time=00:20:00\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=gpu\n#SBATCH --qos=gpu-exc\n\n# Enable GPU-aware MPI\nexport MPICH_GPU_SUPPORT_ENABLED=1\n\n# Check assigned GPU\nnodelist=$(scontrol show hostname $SLURM_JOB_NODELIST)\nfor nodeid in $nodelist\ndo\n   echo $nodeid\n   srun --ntasks=1 --nodelist=$nodeid rocm-smi\ndone\n\n# Check process/thread pinning\nmodule load xthi\nsrun --ntasks=8 --cpus-per-task=8 \\\n     --hint=nomultithread --distribution=block:block \\\n     xthi\n\nsrun --ntasks=8 --cpus-per-task=8 \\\n     --hint=nomultithread --distribution=block:block \\\n     ./my_gpu_program.x\n

Note

When you use the --qos=gpu-exc QoS you must also add the --exclusive flag and then specify the number of nodes you want with --nodes (--nodes=2 in this example).
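Following the QoS table and the example scripts above, the choice of QoS and node count for a given total number of GPUs can be sketched as below: up to 2 GPUs fit on a shared node (gpu-shd), while larger requests use whole nodes of 4 GPUs each with gpu-exc, up to its 2-node limit. gpu_qos_for is a hypothetical helper that prints the QoS name and the node count.

```bash
# Hypothetical helper: print "<qos> <nodes>" for a total GPU count.
gpu_qos_for() {
    case $1 in
        1|2) echo "gpu-shd 1" ;;
        # gpu-exc gives whole nodes with 4 GPUs each: round up.
        3|4|5|6|7|8) echo "gpu-exc $(( ($1 + 3) / 4 ))" ;;
        *) echo "expected 1-8 GPUs, got: $1" >&2; return 1 ;;
    esac
}
```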

"},{"location":"user-guide/gpu/#interactive-jobs","title":"Interactive jobs","text":""},{"location":"user-guide/gpu/#using-salloc","title":"Using salloc","text":"

Tip

This method does not give you an interactive shell on a GPU compute node. If you want an interactive shell on the GPU compute nodes, see the srun method described below.

If you wish to have a terminal to perform interactive testing, you can use the salloc command to reserve the resources so you can use srun commands interactively. For example, to request 1 GPU for 20 minutes you would use (remember to replace t01 with your budget code):

auser@ln04:/work/t01/t01/auser> salloc --gpus=1 --time=00:20:00 --partition=gpu --qos=gpu-shd --account=t01\nsalloc: Pending job allocation 5335731\nsalloc: job 5335731 queued and waiting for resources\nsalloc: job 5335731 has been allocated resources\nsalloc: Granted job allocation 5335731\nsalloc: Waiting for resource configuration\nsalloc: Nodes nid200001 are ready for job\n\nauser@ln04:/work/t01/t01/auser> export OMP_NUM_THREADS=1\nauser@ln04:/work/t01/t01/auser> srun rocm-smi\n\n\n======================= ROCm System Management Interface =======================\n================================= Concise Info =================================\nGPU  Temp   AvgPwr  SCLK    MCLK     Fan  Perf  PwrCap  VRAM%  GPU%\n0    31.0c  43.0W   800Mhz  1600Mhz  0%   auto  300.0W    0%   0%\n================================================================================\n============================= End of ROCm SMI Log ==============================\n\n\nsrun: error: nid200001: tasks 0: Exited with exit code 2\nsrun: launch/slurm: _step_signal: Terminating StepId=5335731.0\n\nauser@ln04:/work/t01/t01/auser> module load xthi\nauser@ln04:/work/t01/t01/auser> srun --ntasks=1 --cpus-per-task=8 --hint=nomultithread xthi\nNode summary for    1 nodes:\nNode    0, hostname nid200001, mpi   1, omp   1, executable xthi\nMPI summary: 1 ranks\nNode    0, rank    0, thread   0, (affinity =  0-7)\n
"},{"location":"user-guide/gpu/#using-srun","title":"Using srun","text":"

If you want an interactive terminal on a GPU node then you can use the srun command to achieve this. For example, to request 1 GPU for 20 minutes with an interactive terminal on a GPU compute node you would use (remember to replace t01 with your budget code):

auser@ln04:/work/t01/t01/auser> srun --gpus=1 --time=00:20:00 --partition=gpu --qos=gpu-shd --account=t01 --pty /bin/bash\nsrun: job 5335771 queued and waiting for resources\nsrun: job 5335771 has been allocated resources\nauser@nid200001:/work/t01/t01/auser>\n

Note that the command prompt has changed to indicate we are now on a GPU compute node. You can now directly run commands that interact with the GPU devices, e.g.:

auser@nid200001:/work/t01/t01/auser> rocm-smi\n\n======================= ROCm System Management Interface =======================\n================================= Concise Info =================================\nGPU  Temp   AvgPwr  SCLK    MCLK     Fan  Perf  PwrCap  VRAM%  GPU%\n0    29.0c  43.0W   800Mhz  1600Mhz  0%   auto  300.0W    0%   0%\n================================================================================\n============================= End of ROCm SMI Log ==============================\n

Warning

Launching parallel jobs on GPU nodes from an interactive shell on a GPU node is not straightforward so you should either use job submission scripts or the salloc method of interactive use described above.

"},{"location":"user-guide/gpu/#environment-variables","title":"Environment variables","text":""},{"location":"user-guide/gpu/#rocr_visible_devices","title":"ROCR_VISIBLE_DEVICES","text":"

A list of device indices or UUIDs that will be exposed to applications.

Runtime: ROCm Platform Runtime. Applies to all applications using the user-mode ROCm software stack.

export ROCR_VISIBLE_DEVICES=\"0,GPU-DEADBEEFDEADBEEF\"
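As an illustration of ROCR_VISIBLE_DEVICES, the sketch below is a hypothetical per-rank wrapper that gives each MPI task on a node its own GPU. It assumes one task per GPU and that srun sets SLURM_LOCALID (the node-local task index) for each task. If ROCR_VISIBLE_DEVICES is already set when the wrapper starts, it is assumed to hold a comma-separated list and one entry is picked from it; otherwise the wrapper falls back to the four GPU indices 0-3.

```bash
# Hypothetical wrapper: expose one GPU per node-local task, then run
# the given command with that setting.
select_gpu() {
    local devices count idx
    devices=${ROCR_VISIBLE_DEVICES:-0,1,2,3}   # assumption: 4 GPUs per node
    # Number of comma-separated entries in the list
    count=$(printf '%s\n' "$devices" | tr ',' '\n' | wc -l)
    idx=$(( ${SLURM_LOCALID:-0} % count + 1 ))  # cut fields are 1-based
    ROCR_VISIBLE_DEVICES=$(printf '%s\n' "$devices" | cut -d, -f"$idx")
    export ROCR_VISIBLE_DEVICES
    exec "$@"
}
```

To use it, one could save the function in a script ending with select_gpu "$@" and launch it with, for example, srun --ntasks=4 --cpus-per-task=8 ./select_gpu.sh ./my_gpu_program.x, where select_gpu.sh is a placeholder name.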

"},{"location":"user-guide/gpu/#hip-environment-variables","title":"HIP Environment variables","text":"

https://rocm.docs.amd.com/projects/HIP/en/docs-5.2.3/how_to_guides/debugging.html#summary-of-environment-variables-in-hip

"},{"location":"user-guide/gpu/#amd_log_level","title":"AMD_LOG_LEVEL","text":"

Sets the verbosity level of the HIP runtime log; higher values produce more detailed output. For example:

export AMD_LOG_LEVEL=1

"},{"location":"user-guide/gpu/#amd_log_mask","title":"AMD_LOG_MASK","text":"

Selects, as a bitmask, which categories of HIP runtime activity are logged. For example:

export AMD_LOG_MASK=0x1

Default: 0x7FFFFFFF\n\n0x1: Log API calls.\n0x02: Kernel and Copy Commands and Barriers.\n0x4: Synchronization and waiting for commands to finish.\n0x8: Enable log on information and below levels.\n0x20: Queue commands and queue contents.\n0x40: Signal creation, allocation, pool.\n0x80: Locks and thread-safety code.\n0x100: Copy debug.\n0x200: Detailed copy debug.\n0x400: Resource allocation, performance-impacting events.\n0x800: Initialization and shutdown.\n0x1000: Misc debug, not yet classified.\n0x2000: Show raw bytes of AQL packet.\n0x4000: Show code creation debug.\n0x8000: More detailed command info, including barrier commands.\n0x10000: Log message location.\n0xFFFFFFFF: Log always even mask flag is zero.\n
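Mask values can be combined with a bitwise OR. In this sketch the short category names accepted by the hypothetical amd_log_mask function are invented for readability; only the hexadecimal values come from the list above.

```bash
# Hypothetical helper: OR together selected AMD_LOG_MASK bits and
# print the result in hexadecimal.
amd_log_mask() {
    local mask name
    mask=0
    for name in "$@"; do
        case $name in
            api)    mask=$(( mask | 0x1 )) ;;    # API calls
            kernel) mask=$(( mask | 0x2 )) ;;    # kernel/copy commands, barriers
            sync)   mask=$(( mask | 0x4 )) ;;    # synchronization and waiting
            queue)  mask=$(( mask | 0x20 )) ;;   # queue commands and contents
            copy)   mask=$(( mask | 0x100 )) ;;  # copy debug
            init)   mask=$(( mask | 0x800 )) ;;  # initialization and shutdown
            *) echo "unknown log category: $name" >&2; return 1 ;;
        esac
    done
    printf '0x%X\n' "$mask"
}
```

For example, export AMD_LOG_MASK=$(amd_log_mask api kernel) selects API calls plus kernel and copy commands (0x3).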
"},{"location":"user-guide/gpu/#hip_visible_devices","title":"HIP_VISIBLE_DEVICES:","text":"

On a system with multiple devices, you can make only certain devices visible to HIP by setting the environment variable HIP_VISIBLE_DEVICES (or CUDA_VISIBLE_DEVICES on NVIDIA platforms); only devices whose indices appear in the list are visible to HIP.

Runtime: HIP Runtime. Applies only to applications using HIP on the AMD platform.

export HIP_VISIBLE_DEVICES=0,1

"},{"location":"user-guide/gpu/#amd_serialize_kernel","title":"AMD_SERIALIZE_KERNEL","text":"

To serialize kernel enqueuing, set:

export AMD_SERIALIZE_KERNEL=1

"},{"location":"user-guide/gpu/#amd_serialize_copy","title":"AMD_SERIALIZE_COPY","text":"

To serialize copies, set:

export AMD_SERIALIZE_COPY=1
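When debugging, it can be convenient to apply the logging and serialization settings from this section to a single command without changing your shell's environment. hip_debug_run below is a hypothetical sketch; the variable values are the ones used in the examples above.

```bash
# Hypothetical wrapper: run one command with HIP logging enabled and
# kernel launches and copies serialized. The assignments apply only to
# that command's environment.
hip_debug_run() {
    AMD_LOG_LEVEL=1 AMD_SERIALIZE_KERNEL=1 AMD_SERIALIZE_COPY=1 "$@"
}
```

For example, from an interactive shell on a GPU node: hip_debug_run ./my_gpu_program.x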

"},{"location":"user-guide/gpu/#hip_host_coherent","title":"HIP_HOST_COHERENT","text":"

Sets whether memory allocated with hipHostMalloc is coherent:

export HIP_HOST_COHERENT=1

If the value is 1, memory is coherent with host; if 0, memory is not coherent between host and GPU.

"},{"location":"user-guide/gpu/#openmp-environment-variables","title":"OpenMP Environment variables","text":"

https://rocm.docs.amd.com/en/docs-5.2.3/reference/openmp/openmp.html#environment-variables

"},{"location":"user-guide/gpu/#omp_default_device","title":"OMP_DEFAULT_DEVICE","text":"

Default device used for OpenMP target offloading.

Runtime: OpenMP Runtime. Applies only to applications using OpenMP offloading.

export OMP_DEFAULT_DEVICE=\"2\"

This sets the default device to the third device on the node (device numbering starts at 0).

"},{"location":"user-guide/gpu/#omp_num_teams","title":"OMP_NUM_TEAMS","text":"

Users can choose the number of teams used for kernel launch by setting:

export OMP_NUM_TEAMS=N

where N is the number of teams. This can be tuned to optimise performance.
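A simple way to tune this is to run the same program over several team counts and compare the timings the program itself reports. teams_sweep is a hypothetical sketch; the list of counts is up to you, and the sketch does not measure time itself.

```bash
# Hypothetical helper: teams_sweep "COUNTS" COMMAND [ARGS...]
# Runs COMMAND once per team count, with OMP_NUM_TEAMS set for that run.
teams_sweep() {
    local counts n
    counts=$1
    shift
    for n in $counts; do   # word splitting on spaces is intended here
        echo "== OMP_NUM_TEAMS=$n =="
        OMP_NUM_TEAMS=$n "$@" || echo "command failed with status $? for OMP_NUM_TEAMS=$n" >&2
    done
}
```

For example: teams_sweep "104 208 416" ./my_gpu_program.x, where the counts are arbitrary placeholders.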

"},{"location":"user-guide/gpu/#gpu_max_hw_queues","title":"GPU_MAX_HW_QUEUES","text":"

To set the number of HSA queues used by the OpenMP runtime, set:

export GPU_MAX_HW_QUEUES=N

where N is the number of queues.

"},{"location":"user-guide/gpu/#mpi-environment-variables","title":"MPI Environment variables","text":""},{"location":"user-guide/gpu/#mpich_gpu_support_enabled","title":"MPICH_GPU_SUPPORT_ENABLED","text":"

Activates GPU-aware MPI in Cray MPICH:

export MPICH_GPU_SUPPORT_ENABLED=1

If this is not set, MPI calls that attempt to send messages from buffers in GPU-attached memory will crash or hang.

"},{"location":"user-guide/gpu/#hsa_enable_sdma","title":"HSA_ENABLE_SDMA","text":"

export HSA_ENABLE_SDMA=0

Forces host-to-device and device-to-host copies to use compute shader blit kernels rather than the dedicated DMA copy engines.

This reduces copy bandwidth, but is recommended when isolating issues with the hardware copy engines.

"},{"location":"user-guide/gpu/#mpich_ofi_nic_policy","title":"MPICH_OFI_NIC_POLICY","text":"

For GPU-enabled parallel applications whose MPI operations access application arrays resident in GPU-attached memory, you can set:

export MPICH_OFI_NIC_POLICY=GPU

In this case, for each MPI process, Cray MPI aims to select a NIC device that is closest to the GPU device being used.

"},{"location":"user-guide/gpu/#mpich_ofi_nic_verbose","title":"MPICH_OFI_NIC_VERBOSE","text":"

To display information about NIC selection, set:

export MPICH_OFI_NIC_VERBOSE=2
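The MPI settings from this section can be grouped in a small helper for job scripts whose MPI calls use GPU-resident buffers. gpu_mpi_env is a hypothetical name; passing verbose also enables the NIC selection report.

```bash
# Hypothetical helper: export the Cray MPICH settings described above
# for GPU-aware MPI with GPU-local NIC selection.
gpu_mpi_env() {
    export MPICH_GPU_SUPPORT_ENABLED=1
    export MPICH_OFI_NIC_POLICY=GPU
    if [ "${1:-}" = "verbose" ]; then
        export MPICH_OFI_NIC_VERBOSE=2
    fi
}
```

In a job script this would be called before srun, for example gpu_mpi_env verbose while checking which NIC each rank uses.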

"},{"location":"user-guide/gpu/#debugging","title":"Debugging","text":"

Note

Work in progress

Documentation for rocgdb can be found in the following locations:

https://rocm.docs.amd.com/projects/ROCgdb/en/docs-5.2.3/index.html

https://docs.amd.com/projects/HIP/en/docs-5.2.3/how_to_guides/debugging.html#using-rocgdb

"},{"location":"user-guide/gpu/#profiling","title":"Profiling","text":"

An initial profiling capability is provided via rocprof which is part of the rocm module.

For example, in an interactive session where resources have already been allocated, you can run:

srun -n 2 --exclusive --nodes=1 --time=00:20:00 --partition=gpu --qos=gpu-exc --gpus=2 rocprof --stats ./myprog_exe\n

to profile your application. More detail on the use of rocprof can be found here.

"},{"location":"user-guide/gpu/#performance-tuning","title":"Performance tuning","text":"

AMD provides some documentation on performance tuning here. Not all options will be available to users, so be aware that your mileage may vary.

"},{"location":"user-guide/gpu/#hardware-details","title":"Hardware details","text":"

The specifications of the GPU hardware can be found here.

Additionally you can use the command,

rocminfo

in a job on a GPU node to print information about the GPUs and CPU on the node. This command is provided as part of the rocm module.

"},{"location":"user-guide/gpu/#node-topology","title":"Node Topology","text":"

Using rocm-smi --showtopo, we can learn about the connections between the GPUs in a node and how the memory regions of the GPUs and CPU are connected.

======================= ROCm System Management Interface =======================\n=========================== Weight between two GPUs ============================\n       GPU0         GPU1         GPU2         GPU3\nGPU0   0            15           15           15\nGPU1   15           0            15           15\nGPU2   15           15           0            15\nGPU3   15           15           15           0\n\n============================ Hops between two GPUs =============================\n       GPU0         GPU1         GPU2         GPU3\nGPU0   0            1            1            1\nGPU1   1            0            1            1\nGPU2   1            1            0            1\nGPU3   1            1            1            0\n\n========================== Link Type between two GPUs ==========================\n       GPU0         GPU1         GPU2         GPU3\nGPU0   0            XGMI         XGMI         XGMI\nGPU1   XGMI         0            XGMI         XGMI\nGPU2   XGMI         XGMI         0            XGMI\nGPU3   XGMI         XGMI         XGMI         0\n\n================================== Numa Nodes ==================================\nGPU 0          : (Topology) Numa Node: 0\nGPU 0          : (Topology) Numa Affinity: 0\nGPU 1          : (Topology) Numa Node: 1\nGPU 1          : (Topology) Numa Affinity: 1\nGPU 2          : (Topology) Numa Node: 2\nGPU 2          : (Topology) Numa Affinity: 2\nGPU 3          : (Topology) Numa Node: 3\nGPU 3          : (Topology) Numa Affinity: 3\n============================= End of ROCm SMI Log ==============================\n

To quote the rocm documentation:

- The first block of the output shows the distance between the GPUs similar to what the numactl command outputs for the NUMA domains of a system. The weight is a qualitative measure for the \u201cdistance\u201d data must travel to reach one GPU from another one. While the values do not carry a special (physical) meaning, the higher the value the more hops are needed to reach the destination from the source GPU.\n\n- The second block has a matrix named \u201cHops between two GPUs\u201d, where 1 means the two GPUs are directly connected with XGMI, 2 means both GPUs are linked to the same CPU socket and GPU communications will go through the CPU, and 3 means both GPUs are linked to different CPU sockets so communications will go through both CPU sockets. This number is one for all GPUs in this case since they are all connected to each other through the Infinity Fabric links.\n\n- The third block outputs the link types between the GPUs. This can either be \u201cXGMI\u201d for AMD Infinity Fabric links or \u201cPCIE\u201d for PCIe Gen4 links.\n\n- The fourth block reveals the localization of a GPU with respect to the NUMA organization of the shared memory of the AMD EPYC processors.\n
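The fourth block is useful when deciding which NUMA region the CPU cores driving a given GPU should come from. The sketch below extracts GPU-to-NUMA-node pairs from the rocm-smi --showtopo text; the function name gpu_numa_nodes is hypothetical, and the parsing assumes the line format shown above.

```shell
#!/bin/bash
# Hypothetical helper: read rocm-smi --showtopo output on standard input
# and print one GPU index and its NUMA node number per line.
gpu_numa_nodes() {
    # Matches lines such as: GPU 2          : (Topology) Numa Node: 2
    # Field 2 is the GPU index and the last field is the NUMA node.
    # Numa Affinity lines do not contain Numa Node: and are skipped.
    awk '/Numa Node:/ { print $2, $NF }'
}
```

On the node shown above, rocm-smi --showtopo | gpu_numa_nodes would print the pairs 0 0, 1 1, 2 2 and 3 3, i.e. GPU N is attached to NUMA node N.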
"},{"location":"user-guide/gpu/#rocm-bandwidth-test","title":"rocm-bandwidth-test","text":"

The rocm module also provides rocm-bandwidth-test, which can be used to measure the performance of communication between the devices in a node.

Alongside rocm-smi, this bandwidth test can be useful in understanding the composition and performance limitations of a GPU node. Here is example output from a GPU node on ARCHER2.

Device: 0,  AMD EPYC 7543P 32-Core Processor\nDevice: 1,  AMD EPYC 7543P 32-Core Processor\nDevice: 2,  AMD EPYC 7543P 32-Core Processor\nDevice: 3,  AMD EPYC 7543P 32-Core Processor\nDevice: 4,  ,  GPU-ab43b63dec8adaf3,  c9:0.0\nDevice: 5,  ,  GPU-0b953cf8e6d4184a,  87:0.0\nDevice: 6,  ,  GPU-b0266df54d0dd2e1,  49:0.0\nDevice: 7,  ,  GPU-790a09bfbf673859,  09:0.0\n\nInter-Device Access\n\nD/D       0         1         2         3         4         5         6         7\n\n0         1         1         1         1         1         1         1         1\n\n1         1         1         1         1         1         1         1         1\n\n2         1         1         1         1         1         1         1         1\n\n3         1         1         1         1         1         1         1         1\n\n4         1         1         1         1         1         1         1         1\n\n5         1         1         1         1         1         1         1         1\n\n6         1         1         1         1         1         1         1         1\n\n7         1         1         1         1         1         1         1         1\n\n\nInter-Device Numa Distance\n\nD/D       0         1         2         3         4         5         6         7\n\n0         0         12        12        12        20        32        32        32\n\n1         12        0         12        12        32        20        32        32\n\n2         12        12        0         12        32        32        20        32\n\n3         12        12        12        0         32        32        32        20\n\n4         20        32        32        32        0         15        15        15\n\n5         32        20        32        32        15        0         15        15\n\n6         32        32        20        32        15        15        0         15\n\n7         32        32        32        20        15        15        15        0\n\n\nUnidirectional copy peak bandwidth 
GB/s\n\nD/D       0           1           2           3           4           5           6           7\n\n0         N/A         N/A         N/A         N/A         26.977      26.977      26.977      26.977\n\n1         N/A         N/A         N/A         N/A         26.977      26.975      26.975      26.975\n\n2         N/A         N/A         N/A         N/A         26.977      26.977      26.975      26.975\n\n3         N/A         N/A         N/A         N/A         26.975      26.977      26.975      26.977\n\n4         28.169      28.171      28.169      28.169      1033.080    42.239      42.112      42.264\n\n5         28.169      28.169      28.169      28.169      42.243      1033.088    42.294      42.286\n\n6         28.169      28.171      28.167      28.169      42.158      42.281      1043.367    42.277\n\n7         28.171      28.169      28.169      28.169      42.226      42.264      42.264      1051.212\n\n\nBidirectional copy peak bandwidth GB/s\n\nD/D       0           1           2           3           4           5           6           7\n\n0         N/A         N/A         N/A         N/A         40.480      42.528      42.059      42.173\n\n1         N/A         N/A         N/A         N/A         41.604      41.826      41.903      41.417\n\n2         N/A         N/A         N/A         N/A         41.008      41.499      41.258      41.338\n\n3         N/A         N/A         N/A         N/A         40.968      41.273      40.982      41.450\n\n4         40.480      41.604      41.008      40.968      N/A         80.946      80.631      80.888\n\n5         42.528      41.826      41.499      41.273      80.946      N/A         80.944      80.940\n\n6         42.059      41.903      41.258      40.982      80.631      80.944      N/A         80.896\n\n7         42.173      41.417      41.338      41.450      80.888      80.940      80.896      N/A\n
"},{"location":"user-guide/gpu/#tools","title":"Tools","text":""},{"location":"user-guide/gpu/#rocm-smi","title":"rocm-smi","text":"

If you load the rocm module, you will have access to the rocm-smi utility. This utility reports information about the GPUs on a node and can be very useful for understanding the set-up of the hardware you are working with and for monitoring GPU metrics during job execution.

Here are some useful commands to get you started:

Device status:

rocm-smi --alldevices

======================= ROCm System Management Interface =======================\n================================= Concise Info =================================\nGPU  Temp   AvgPwr  SCLK    MCLK     Fan  Perf  PwrCap  VRAM%  GPU%\n0    28.0c  43.0W   800Mhz  1600Mhz  0%   auto  300.0W    0%   0%\n1    30.0c  43.0W   800Mhz  1600Mhz  0%   auto  300.0W    0%   0%\n2    33.0c  43.0W   800Mhz  1600Mhz  0%   auto  300.0W    0%   0%\n3    33.0c  41.0W   800Mhz  1600Mhz  0%   auto  300.0W    0%   0%\n================================================================================\n============================= End of ROCm SMI Log ==============================\n
This shows you the current state of the hardware while an application is running.

Focusing on the GPU activity can be useful to understand when your code is active on the GPUs:

GPU activity:

rocm-smi --showuse

======================= ROCm System Management Interface =======================\n============================== % time GPU is busy ==============================\nGPU[0]          : GPU use (%): 0\nGPU[0]          : GFX Activity: 705759841\nGPU[1]          : GPU use (%): 0\nGPU[1]          : GFX Activity: 664257322\nGPU[2]          : GPU use (%): 0\nGPU[2]          : GFX Activity: 660987914\nGPU[3]          : GPU use (%): 0\nGPU[3]          : GFX Activity: 665049119\n================================================================================\n============================= End of ROCm SMI Log ==============================\n

Additionally you can focus on the memory use of the GPUs:

GPU memory currently consumed:

rocm-smi --showmemuse

======================= ROCm System Management Interface =======================\n============================== Current Memory Use ==============================\nGPU[0]          : GPU memory use (%): 0\nGPU[0]          : Memory Activity: 323631375\nGPU[1]          : GPU memory use (%): 0\nGPU[1]          : Memory Activity: 319196585\nGPU[2]          : GPU memory use (%): 0\nGPU[2]          : Memory Activity: 318641690\nGPU[3]          : GPU memory use (%): 0\nGPU[3]          : Memory Activity: 319854295\n================================================================================\n============================= End of ROCm SMI Log ==============================\n

More commands can be found by running:

rocm-smi --help

This also runs on the login nodes and gives more information about probing the GPUs.

More detail can be found here.
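When monitoring a running job, it can be convenient to reduce the rocm-smi --showuse output to one line per GPU and record it periodically. The following is a sketch that assumes the output keeps the GPU[N] : GPU use (%): value layout shown above; the function names gpu_busy and log_gpu_busy and the default log file name are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: condense rocm-smi --showuse output (on standard input)
# to lines of the form GPU[0] 87, i.e. device label and percent busy.
gpu_busy() {
    # Input lines look like: GPU[0]          : GPU use (%): 87
    awk '/GPU use/ { print $1, $NF }'
}

# Hypothetical helper: every INTERVAL seconds (default 30), append a time
# stamp and the per-GPU busy percentages to LOGFILE (default gpu_busy.log).
# Runs until it is killed.
log_gpu_busy() {
    local interval=${1:-30} logfile=${2:-gpu_busy.log}
    while :; do
        date +%T >> $logfile
        rocm-smi --showuse | gpu_busy >> $logfile
        sleep $interval
    done
}
```

For example, in a job script you could start log_gpu_busy 60 & before the srun line, save its process ID with monitor_pid=$!, and run kill $monitor_pid once srun returns.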

"},{"location":"user-guide/gpu/#hipify","title":"HIPIFY","text":"

HIPIFY is a CUDA to HIP source translator tool that can allow CUDA source code to be translated into HIP source code, easing the transition between the two hardware targets.

The tool is available on ARCHER2 by loading the rocm module.

The github repository for HIPIFY can be found here.

The documentation for HIPIFY is found here.
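As a sketch of a typical workflow, the script below writes a small CUDA source file and, if the hipify-perl script distributed with ROCm is on the PATH (assumed here to be the case after loading the rocm module), translates it to HIP. The file names are hypothetical. hipify-perl writes the translated source to standard output, and the result should have CUDA runtime calls such as cudaMalloc replaced by their HIP equivalents (hipMalloc); the generated code should still be reviewed by hand.

```shell
#!/bin/bash
# Write a minimal CUDA example (hypothetical file name scale.cu).
cat > scale.cu <<'EOF'
#include <cuda_runtime.h>

// Multiply each element of x by a
__global__ void scale(float *x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

int main() {
    const int n = 1024;
    float *d_x;
    cudaMalloc(&d_x, n * sizeof(float));
    cudaMemset(d_x, 0, n * sizeof(float));
    scale<<<(n + 255) / 256, 256>>>(d_x, 2.0f, n);
    cudaDeviceSynchronize();
    cudaFree(d_x);
    return 0;
}
EOF

# Translate to HIP only when hipify-perl is available (e.g. rocm module loaded).
if command -v hipify-perl > /dev/null; then
    hipify-perl scale.cu > scale.hip.cpp
fi
```

The translated file could then be compiled with hipcc, for example hipcc scale.hip.cpp -o scale, assuming the HIP compiler from the rocm module is set up in your environment.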

"},{"location":"user-guide/gpu/#notes-and-useful-links","title":"Notes and useful links","text":"

You should expect the software development environment to be similar to that available on the Frontier exascale system:

"},{"location":"user-guide/hardware/","title":"ARCHER2 hardware","text":"

Note

Some of the material in this section is closely based on information provided by NASA as part of the documentation for the Aitken HPC system.

"},{"location":"user-guide/hardware/#system-overview","title":"System overview","text":"

ARCHER2 is an HPE Cray EX supercomputing system which has a total of 5,860 compute nodes. Each compute node has 128 cores (dual AMD EPYC 7742 64-core 2.25GHz processors) giving a total of 750,080 cores. Compute nodes are connected together by an HPE Slingshot interconnect.

There are additional User Access Nodes (UAN, also called login nodes), which provide access to the system, and data-analysis nodes, which are well-suited for preparation of job inputs and analysis of job outputs.

Compute nodes are only accessible via the Slurm job scheduling system.

There are two storage types: home and work. Home is available on login nodes and data-analysis nodes. Work is available on login nodes, data-analysis nodes and compute nodes (see I/O and file systems).

This is shown in the ARCHER2 architecture diagram:

The home file system is provided by dual NetApp FAS8200A systems (one primary and one disaster recovery) with a capacity of 1 PB each.

The work file system consists of four separate HPE Cray L300 storage systems, each with a capacity of 3.6 PB. The interconnect uses a dragonfly topology, and has a bandwidth of 100 Gbps.

The system also includes 1.1 PB burst buffer NVMe storage, provided by an HPE Cray E1000.

"},{"location":"user-guide/hardware/#compute-node-overview","title":"Compute node overview","text":"

The compute nodes each have 128 cores. They are dual socket nodes with two 64-core AMD EPYC 7742 processors. There are 5,276 standard memory nodes and 584 high memory nodes.

Note

Due to Simultaneous Multi-Threading (SMT), each core has 2 hardware threads, so a node has 128 cores / 256 threads. Most users will not want to use SMT; see Launching parallel jobs.

Processor: 2x AMD Zen2 (Rome) EPYC 7742, 64-core, 2.25 GHz
Cores per node: 128
NUMA structure: 8 NUMA regions per node (16 cores per NUMA region)
Memory per node: 256 GB (standard), 512 GB (high memory)
Memory per core: 2 GB (standard), 4 GB (high memory)
L1 cache: 32 kB/core
L2 cache: 512 kB/core
L3 cache: 16 MB per 4 cores
Vector support: AVX2
Network connection: 2x 100 Gb/s injection ports per node

Each socket contains eight Core Complex Dies (CCDs) and one I/O die (IOD). Each CCD contains two Core Complexes (CCXs). Each CCX has 4 cores and 16 MB of L3 cache. Thus, there are 64 cores per socket and 128 cores per node.
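The per-socket and per-node counts follow directly from this hierarchy. The shell arithmetic below is a worked example of those products; the variable names are illustrative only.

```shell
#!/bin/bash
# Worked arithmetic for the Zen2 hierarchy described above.
ccds_per_socket=8
ccx_per_ccd=2
cores_per_ccx=4
l3_mb_per_ccx=16
numa_regions_per_node=8

cores_per_socket=$(( ccds_per_socket * ccx_per_ccd * cores_per_ccx ))   # 64
cores_per_node=$(( 2 * cores_per_socket ))                              # 128
l3_mb_per_socket=$(( ccds_per_socket * ccx_per_ccd * l3_mb_per_ccx ))   # 256 MB
cores_per_numa=$(( cores_per_node / numa_regions_per_node ))            # 16
total_cores=$(( 5860 * cores_per_node ))                                # 750080

echo $cores_per_socket $cores_per_node $l3_mb_per_socket $cores_per_numa $total_cores
```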

More information on the architecture of the AMD EPYC Zen2 processors:

"},{"location":"user-guide/hardware/#amd-zen2-microarchitecture","title":"AMD Zen2 microarchitecture","text":"

The AMD EPYC 7742 Rome processor has a base CPU clock of 2.25 GHz and a maximum boost clock of 3.4 GHz. There are eight processor dies (CCDs) with a total of 64 cores per socket.

Tip

The processors can only access their boost frequencies if the CPU frequency is set to 2.25 GHz. See the documentation on setting CPU frequency for information on how to select the correct CPU frequency.

Note

When all 128 compute cores on a node are loaded with computationally intensive work, we typically see the processor clock frequency boost to around 2.8 GHz.

Hybrid multi-die design:

Within each socket, the eight processor dies are fabricated on a 7 nanometer (nm) process, while the I/O die is fabricated on a 14 nm process. This design decision was made because the processor dies need the leading edge (and more expensive) 7 nm technology in order to reduce the amount of power and space needed to double the number of cores, and to add more cache, compared to the first-generation EPYC processors. The I/O die retains the less expensive, older 14 nm technology.

2nd-generation Infinity Fabric technology:

Infinity Fabric technology is used for communication among different components throughout the node: within cores, between cores, between core complexes (CCX) in a core complex die (CCD), among CCDs in a socket, to the main memory and PCIe, and between the two sockets. The Rome processors are the first x86 systems to support 4th-generation PCIe, which delivers twice the I/O performance (to the Slingshot interconnect, storage, NVMe SSD, etc.) compared to 3rd-generation PCIe.

"},{"location":"user-guide/hardware/#processor-hierarchy","title":"Processor hierarchy","text":"

The Zen2 processor hierarchy is as follows:

CPU core

AMD 7742 is a 64-bit x86 server microprocessor. A partial list of instructions and features supported in Rome includes SSE, SSE2, SSE3, SSSE3, SSE4a, SSE4.1, SSE4.2, AES, FMA, AVX, AVX2 (256 bit), Integrated x87 FPU (FPU), Multi-Precision Add-Carry (ADX), 16-bit Floating Point Conversion (F16C), and No-eXecute (NX). For a complete list, run cat /proc/cpuinfo on the ARCHER2 login nodes.
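Rather than reading the whole of /proc/cpuinfo, you can test for a single feature. This is a minimal sketch assuming the x86 Linux layout, where features are listed on lines beginning with flags; the function name has_cpu_flag is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the named CPU feature flag (e.g. avx2, fma)
# appears on the first flags line of /proc/cpuinfo (or another file given
# as the second argument).
has_cpu_flag() {
    local flag=$1 file=${2:-/proc/cpuinfo}
    # -w matches whole words only, so avx does not match inside avx2
    grep -m 1 '^flags' $file | grep -qw -- $flag
}
```

For example, has_cpu_flag avx2 && echo AVX2 available on a login node should report AVX2, while has_cpu_flag avx512f should fail, since the Rome processors do not support AVX-512.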

Each core:

"},{"location":"user-guide/hardware/#cache-hierarchy","title":"Cache hierarchy","text":"

The cache hierarchy is as follows:

Note

With the write-back policy, data is updated in the current level cache first. The update in the next level storage is done later when the cache line is ready to be replaced.

Note

If a core misses in its local L2 and also in the L3, the shadow tags are consulted. If the shadow tag indicates that the data resides in another L2 within the CCX, a cache-to-cache transfer is initiated. 1 x 256 bits/cycle load bandwidth to L2 of each core; 1 x 256 bits/cycle store bandwidth from L2 of each core; write-back policy; populated by L2 victims.

"},{"location":"user-guide/hardware/#intra-socket-interconnect","title":"Intra-socket interconnect","text":"

The Infinity Fabric, evolved from AMD's previous generation HyperTransport interconnect, is a software-defined, scalable, coherent, and high-performance fabric. It uses sensors embedded in each die to scale control (Scalable Control Fabric, or SCF) and data flow (Scalable Data Fabric, or SDF).

"},{"location":"user-guide/hardware/#inter-socket-interconnect","title":"Inter-socket interconnect","text":"

Two EPYC 7742 SoCs are interconnected via Socket to Socket Global Memory Interconnect (xGMI) links, part of the Infinity Fabric that connects all the components of the SoC together. On ARCHER2 compute nodes there are 3 xGMI links using a total of 48 PCIe lanes. With the xGMI link speed set at 16 GT/s, the theoretical throughput for each direction is 96 GB/s (3 links x 16 GT/s x 2 bytes/transfer) without factoring in the encoding for xGMI, since there is no publication from AMD available. However, the expected efficiencies are 66\u201375%, so the sustained bandwidth per direction will be 63.5\u201372 GB/s. xGMI Dynamic Link Width Management saves power during periods of low socket-to-socket data traffic by reducing the number of active xGMI lanes per link from 16 to 8.
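The peak figure can be reproduced step by step. The worked example below uses integer shell arithmetic (so the 66% case is truncated to a whole number) and, like the paragraph above, ignores xGMI encoding overhead.

```shell
#!/bin/bash
# Peak one-direction xGMI throughput between the two sockets.
links=3
gt_per_s=16            # billions of transfers per second per lane
bytes_per_transfer=2   # 16 lanes per link x 1 bit = 2 bytes per transfer

peak_gb_s=$(( links * gt_per_s * bytes_per_transfer ))   # 96
# Apply the 66% and 75% efficiency estimates
low_gb_s=$(( peak_gb_s * 66 / 100 ))    # 63 (63.36 before truncation)
high_gb_s=$(( peak_gb_s * 75 / 100 ))   # 72

echo $peak_gb_s $low_gb_s $high_gb_s
```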

"},{"location":"user-guide/hardware/#memory-subsystem","title":"Memory subsystem","text":"

The Zen 2 microarchitecture places eight unified memory controllers in the centralized I/O die. The memory channels can be split into one, two, or four Non-Uniform Memory Access (NUMA) Nodes per Socket (NPS1, NPS2, and NPS4). ARCHER2 compute nodes are configured as NPS4, which is the highest memory bandwidth configuration geared toward HPC applications.

With eight DDR4-3200 memory channels per socket, each able to perform 3,200 million 8-byte transfers per second, the maximum total memory bandwidth is 204.8 GB/s per socket.

Each memory channel can be connected with up to two Double Data Rate (DDR) fourth-generation Dual In-line Memory Modules (DIMMs). On ARCHER2 standard memory nodes, each channel is connected to a single 16 GB DDR4 registered DIMM (RDIMM) with error correcting code (ECC) support leading to 128 GB per socket and 256 GB per node. For the high memory nodes, each channel is connected to a single 32 GB DDR4 registered DIMM (RDIMM) with error correcting code (ECC) support leading to 256 GB per socket and 512 GB per node.
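The bandwidth and capacity figures in this section follow from simple products. The shell arithmetic below reproduces them as a worked example; the variable names are illustrative only.

```shell
#!/bin/bash
# Worked arithmetic for the memory subsystem figures quoted above.
channels_per_socket=8
mt_per_s=3200           # DDR4-3200: million transfers per second per channel
bytes_per_transfer=8    # each transfer moves 8 bytes

# Peak bandwidth per socket in MB/s: 204800 MB/s = 204.8 GB/s
bw_mb_s=$(( channels_per_socket * mt_per_s * bytes_per_transfer ))

# One DIMM per channel: 16 GB (standard) or 32 GB (high memory) nodes
node_gb_standard=$(( 2 * channels_per_socket * 16 ))   # 256 GB
node_gb_highmem=$(( 2 * channels_per_socket * 32 ))    # 512 GB

# Spread over 128 cores per node
gb_per_core_standard=$(( node_gb_standard / 128 ))     # 2 GB
gb_per_core_highmem=$(( node_gb_highmem / 128 ))       # 4 GB

echo $bw_mb_s $node_gb_standard $node_gb_highmem $gb_per_core_standard $gb_per_core_highmem
```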

"},{"location":"user-guide/hardware/#interconnect-details","title":"Interconnect details","text":"

ARCHER2 has an HPE Slingshot interconnect with 200 Gb/s signalling per node. It uses a dragonfly topology:

"},{"location":"user-guide/hardware/#storage-details","title":"Storage details","text":"

Information on the ARCHER2 parallel Lustre file systems and how to get best performance is available in the IO section.

"},{"location":"user-guide/io/","title":"I/O performance and tuning","text":"

This section describes common I/O patterns, best practice for I/O and how to get good performance on the ARCHER2 storage.

Information on the file systems, directory layouts, quotas, archiving and transferring data can be found in the Data management and transfer section.

The advice here is targeted at use of the parallel file systems available on the compute nodes on ARCHER2 (i.e. not the home and RDFaaS file systems).

"},{"location":"user-guide/io/#common-io-patterns","title":"Common I/O patterns","text":"

There are a number of I/O patterns that are frequently used in parallel applications:

"},{"location":"user-guide/io/#single-file-single-writer-serial-io","title":"Single file, single writer (Serial I/O)","text":"

A common approach is to funnel all the I/O through one controller process (e.g. rank 0 in an MPI program). Although this has the advantage of producing a single file, the fact that only one client is doing all the I/O means that it gains little benefit from the parallel file system. In practice this severely limits the I/O rates, e.g. when writing large files the speed is not likely to significantly exceed 1 GB/s.

"},{"location":"user-guide/io/#file-per-process-fpp","title":"File-per-process (FPP)","text":"

One of the first parallel strategies people use for I/O is for each parallel process to write to its own file. This is a simple scheme to implement and understand and can achieve high bandwidth as, with many I/O clients active at once, it benefits from the parallel Lustre filesystem. However, it has the distinct disadvantage that the data is spread across many different files and may therefore be very difficult to use for further analysis without a data reconstruction stage to recombine potentially thousands of small files.

In addition, having thousands of files open at once can overload the filesystem and lead to poor performance.

Tip

The ARCHER2 solid state file system can give very high performance when using this model of I/O.

The ADIOS 2 I/O library uses an approach similar to file-per-process and so can achieve very good performance on modern parallel file systems.

"},{"location":"user-guide/io/#file-per-node-fpn","title":"File-per-node (FPN)","text":"

A simple way to reduce the sheer number of files is to write a file per node rather than a file per process; as ARCHER2 has 128 CPU-cores per node, this can reduce the number of files by more than a factor of 100 and should not significantly affect the I/O rates. However, it still produces multiple files which can be hard to work with in practice.

"},{"location":"user-guide/io/#single-file-multiple-writers-without-collective-operations","title":"Single file, multiple writers without collective operations","text":"

All aspects of data management are simpler if your parallel program produces a single file in the same format as a serial code, e.g. analysis or program restart are much more straightforward.

There are a number of ways to achieve this. For example, many processes can open the same file but access different parts by skipping some initial offset, although this is problematic when writing as locking may be needed to ensure consistency. Parallel I/O libraries such as MPI-IO, HDF5 and NetCDF allow for this form of access and will implement locking automatically.

The problem is that, with many clients all individually accessing the same file, there can be a lot of contention for file system resources, leading to poor I/O rates. When writing, file locking can effectively serialise the access and there is no benefit from the parallel filesystem.

"},{"location":"user-guide/io/#single-shared-file-with-collective-writes-ssf","title":"Single Shared File with collective writes (SSF)","text":"

The problem with having many clients performing I/O at the same time is that the I/O library may have to restrict access to one client at a time by locking. However if I/O is done collectively, where the library knows that all clients are doing I/O at the same time, then reads and writes can be explicitly coordinated to avoid clashes and no locking is required.

It is only through collective I/O that the full bandwidth of the file system can be realised while accessing a single file. Whatever I/O library you are using, it is essential to use collective forms of the read and write calls to achieve good performance.

"},{"location":"user-guide/io/#achieving-efficient-io","title":"Achieving efficient I/O","text":"

This section provides information on getting the best performance out of the parallel /work file systems on ARCHER2 when writing data, particularly using parallel I/O patterns.

"},{"location":"user-guide/io/#lustre-technology","title":"Lustre technology","text":"

The ARCHER2 /work file systems use Lustre as a parallel file system technology. It has many disk units (called Object Storage Targets or OSTs), all under the control of a single Meta Data Server (MDS) so that it appears to the user as a single file system. The Lustre file system provides POSIX semantics (changes on one node are immediately visible on other nodes) and can support very high data rates for appropriate I/O patterns.

In order to achieve good performance on the ARCHER2 Lustre file systems, you need to make sure your I/O is configured correctly for the type of I/O you want to do. In the following sections we describe how to do this.

"},{"location":"user-guide/io/#summary-achieving-best-io-performance","title":"Summary: achieving best I/O performance","text":"

The configuration you should use depends on the type of I/O you are performing. Here, we summarise the settings for two of the I/O patterns described above: File-Per-Process (FPP, including using ADIOS2) and Single Shared File with collective writes (SSF).

Following sections describe the settings in more detail.

"},{"location":"user-guide/io/#file-per-process-fpp_1","title":"File-Per-Process (FPP)","text":""},{"location":"user-guide/io/#single-shared-file-with-collective-writes-ssf_1","title":"Single Shared File with collective writes (SSF)","text":""},{"location":"user-guide/io/#summary-typical-io-performance-on-archer2","title":"Summary: typical I/O performance on ARCHER2","text":""},{"location":"user-guide/io/#file-per-process-fpp_2","title":"File-Per-Process (FPP)","text":"

We regularly run tests of FPP write performance on the ARCHER2 /work Lustre file systems using the benchio software in the following configuration:

Typical write performance:

"},{"location":"user-guide/io/#single-shared-file-with-collective-writes-ssf_2","title":"Single Shared File with collective writes (SSF)","text":"

We regularly run tests of SSF write performance on the ARCHER2 /work Lustre file systems using the benchio software in the following configuration:

Typical write performance:

"},{"location":"user-guide/io/#striping","title":"Striping","text":"

One of the main factors leading to the high performance of Lustre file systems is the ability to store data on multiple OSTs. For many small files, this is achieved by storing different files on different OSTs; large files must be striped across multiple OSTs to benefit from the parallel nature of Lustre.

When a file is striped it is split into chunks and stored across multiple OSTs in a round-robin fashion. Striping can improve I/O performance because it increases the available bandwidth: multiple processes can read and write the same file simultaneously by accessing different OSTs. However, striping can also increase overhead. Choosing the right striping configuration is key to obtaining high performance.
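Round-robin placement can be modelled with simple integer arithmetic. The function below is a hypothetical illustration, not a Lustre command: it reports which of a file's stripes (numbered from 0) holds a given byte offset, and ignores which OST the first stripe is placed on (that choice is made by Lustre, or by the stripe offset setting).

```shell
#!/bin/bash
# Hypothetical helper (not a Lustre command): which of a file's stripes,
# numbered from 0, holds a given byte offset under round-robin striping.
# Arguments: byte offset, stripe size in bytes, stripe count.
stripe_for_offset() {
    local offset=$1 size=$2 count=$3
    # Index of the stripe-sized chunk containing the offset, wrapped
    # around the number of stripes.
    echo $(( (offset / size) % count ))
}

mib=$(( 1024 * 1024 ))
# 1 MiB stripes over 4 OSTs: the first MiB is on stripe 0, the fourth MiB
# on stripe 3, and the fifth MiB wraps back to stripe 0.
stripe_for_offset 0 $mib 4                # 0
stripe_for_offset $(( 3 * mib )) $mib 4   # 3
stripe_for_offset $(( 4 * mib )) $mib 4   # 0
```

This is why a single large file needs a stripe count greater than 1 before several processes writing different parts of it can reach different OSTs.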

Users have control of a number of striping settings on Lustre file systems. Although these parameters can be set on a per-file basis they are usually set on the directory where your output files will be written so that all output files inherit the same settings.

"},{"location":"user-guide/io/#default-configuration","title":"Default configuration","text":"

The /work file systems on ARCHER2 have the same default stripe settings:

These settings have been chosen to provide a good compromise for the wide variety of I/O patterns that are seen on the system but are unlikely to be optimal for any one particular scenario. The Lustre command to query the stripe settings for a directory (or file) is lfs getstripe. For example, to query the stripe settings of an already created directory resdir:

auser@ln03:~> lfs getstripe resdir/\nresdir\nstripe_count:   1 stripe_size:    1048576 stripe_offset:  -1\n
"},{"location":"user-guide/io/#setting-custom-striping-configurations","title":"Setting custom striping configurations","text":"

Users can set stripe settings for a directory (or file) using the lfs setstripe command. The options for lfs setstripe are:

For example, to set a stripe size of 4 MiB for the existing directory resdir, along with the maximum stripe count (-c -1, which stripes across all available OSTs), you would use:

auser@ln03:~> lfs setstripe -S 4m -c -1 resdir/\n
"},{"location":"user-guide/io/#environment-variables","title":"Environment variables","text":"

The following environment variables typically only have an impact when you are using the Single Shared File pattern with collective operations. As mentioned above, it is very important to use collective calls when doing parallel I/O to a single shared file.

However, with the default settings, parallel I/O on multiple nodes can currently give poor performance. We recommend always setting these environment variables in your Slurm batch script when you are using the SSF I/O pattern:

export FI_OFI_RXM_SAR_LIMIT=64K\nexport MPICH_MPIIO_HINTS=\"*:cray_cb_write_lock_mode=2,*:cray_cb_nodes_multiplier=4\"\n
"},{"location":"user-guide/io/#mpi-transport-protocol","title":"MPI transport protocol","text":"

Setting the environment variables described above can improve the performance of MPI collectives when handling large amounts of data, which in turn can improve collective file I/O. An alternative is to use the non-default UCX implementation of the MPI library instead of the default OFI version.

To switch library version see the Application Development Environment section of the User Guide.

Note

This will affect all your MPI calls, not just those related to I/O, so you should check the overall performance of your program before and after the switch. It is possible that other functions may run slower even if the I/O performance improves.

"},{"location":"user-guide/io/#io-profiling","title":"I/O profiling","text":"

If you are concerned about your I/O performance, you should quantify your transfer rates in terms of GB/s of data read or written to disk. Small files can achieve very high I/O rates due to data caching in Lustre. However, for large files you should be able to achieve a maximum of around 1 GB/s for an unstriped file, or up to 10 GB/s for a fully striped file (across all 12 OSTs).
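A quick way to get a figure in GB/s is to time a large sequential write from a single process. The sketch below uses dd for this; the function name write_rate and its test file name are hypothetical. It measures one client writing one file, so it reflects the serial case rather than parallel I/O, and on a shared file system the result will vary from run to run.

```shell
#!/bin/bash
# Hypothetical helper: write SIZE_MIB MiB of zeros into DIR with dd and print
# the achieved write rate in GB/s (10^9 bytes per second).
write_rate() {
    local dir=$1 size_mib=$2
    local file=$dir/write_rate_test.$$
    local start end
    start=$(date +%s.%N)
    # conv=fsync makes dd flush the data to storage before it exits, so the
    # timing is not just a measure of copying into the client page cache.
    dd if=/dev/zero of=$file bs=1M count=$size_mib conv=fsync 2> /dev/null
    end=$(date +%s.%N)
    rm -f $file
    # Bytes written divided by elapsed seconds, converted to GB/s
    awk -v mib=$size_mib -v t0=$start -v t1=$end 'BEGIN { print mib * 1048576 / (t1 - t0) / 1e9 }'
}
```

For example, write_rate resdir 4096 writes 4 GiB into resdir; running it in directories with different lfs setstripe settings gives a rough single-client comparison.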

Warning

You share /work with all other users so I/O rates can be very variable, especially if the machine is heavily loaded.

If your I/O rates are poor then you can get useful summary information about how the parallel libraries are performing by setting this variable in your Slurm script:

export MPICH_MPIIO_STATS=1\n

Amongst other things, this will give you information on how many independent and collective I/O operations were issued. If you see a large number of independent operations compared to collectives, this indicates that you have inefficient I/O patterns and you should check that you are calling your parallel I/O library correctly.

Although this information comes from the MPI library, it is still useful for users of higher-level libraries such as HDF5 as they all call MPI-IO at the lowest level.

"},{"location":"user-guide/io/#tips-and-advice-for-io","title":"Tips and advice for I/O","text":""},{"location":"user-guide/io/#set-an-optimum-blocksize-when-untaring-data","title":"Set an optimum blocksize when untar'ing data","text":"

When you are expanding a large tar archive file to the Lustre file systems, you should specify the -b 2048 option to ensure that tar writes out data in blocks of 1 MiB (tar counts its blocking factor in 512-byte records, and 2048 × 512 bytes = 1 MiB). This will improve the performance of your tar command and reduce the impact of writing the data to Lustre on other users.
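
If you drive extraction from a Python workflow script, a minimal sketch of building the same command is shown below. The helper names blocking_factor, untar_command and untar are hypothetical; the only tar options used are -x, -b, -f and -C. The sketch simply makes the 512-byte record arithmetic explicit, so that a 1 MiB block size becomes -b 2048.

```python
import subprocess

TAR_RECORD_BYTES = 512  # tar's -b blocking factor is counted in 512-byte records


def blocking_factor(block_bytes):
    """Return the tar -b value that gives blocks of block_bytes bytes."""
    if block_bytes <= 0 or block_bytes % TAR_RECORD_BYTES != 0:
        raise ValueError("block size must be a positive multiple of 512 bytes")
    return block_bytes // TAR_RECORD_BYTES


def untar_command(archive, dest, block_bytes=1024 * 1024):
    """Build a tar extraction command; 1 MiB blocks by default, i.e. -b 2048."""
    return ["tar", "-x", "-b", str(blocking_factor(block_bytes)), "-f", archive, "-C", dest]


def untar(archive, dest):
    """Run the extraction, raising CalledProcessError if tar fails."""
    subprocess.run(untar_command(archive, dest), check=True)
```

The command built for the default block size is equivalent to running tar -x -b 2048 -f archive.tar -C dest at the shell.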

"},{"location":"user-guide/machine-learning/","title":"Machine Learning","text":"

Two Machine Learning (ML) frameworks are supported on ARCHER2: PyTorch and TensorFlow.

For each framework, we'll show how to run a particular MLCommons HPC benchmark. We start with PyTorch.

"},{"location":"user-guide/machine-learning/#pytorch","title":"PyTorch","text":"

On ARCHER2, PyTorch is supported for use on both the CPU and GPU nodes.

We'll demonstrate the use of PyTorch with DeepCam, a deep learning climate segmentation benchmark. It involves training a neural network to recognise large-scale weather phenomena (e.g., tropical cyclones, atmospheric rivers) in the output generated by ensembles of weather simulations; see the link below for more details.

Exascale Deep Learning for Climate Analytics

There are two DeepCam training datasets available on ARCHER2: a 62 GB mini dataset (/work/z19/shared/mlperf-hpc/deepcam/mini) and a much larger 8.9 TB dataset (/work/z19/shared/mlperf-hpc/deepcam/full).

"},{"location":"user-guide/machine-learning/#deepcam-on-gpu","title":"DeepCam on GPU","text":"

A binary release of PyTorch 1.13.1 suitable for ROCm 5.2.3 has been installed following the instructions linked below.

https://github.com/hpc-uk/build-instructions/blob/main/pyenvs/pytorch/build_pytorch_1.13.1_archer2_gpu.md

This install can be accessed by loading the pytorch/1.13.1-gpu module.

As DeepCam is an MLPerf benchmark, you may wish to base a local Python environment on pytorch/1.13.1-gpu so that you can install additional Python packages that support MLPerf logging, as well as extra features pertinent to DeepCam (e.g., dynamic learning rates).

The following instructions show how to create such an environment.

#!/bin/bash

module -q load pytorch/1.13.1-gpu

PYTHON_TAG=python`echo ${CRAY_PYTHON_LEVEL} | cut -d. -f1-2`

PRFX=${HOME/home/work}/pyenvs
PYVENV_ROOT=${PRFX}/mlperf-pt-gpu
PYVENV_SITEPKGS=${PYVENV_ROOT}/lib/${PYTHON_TAG}/site-packages

mkdir -p ${PYVENV_ROOT}
cd ${PYVENV_ROOT}

python -m venv --system-site-packages ${PYVENV_ROOT}

extend-venv-activate ${PYVENV_ROOT}

source ${PYVENV_ROOT}/bin/activate

mkdir -p ${PYVENV_ROOT}/repos
cd ${PYVENV_ROOT}/repos

git clone -b hpc-1.0-branch https://github.com/mlcommons/logging mlperf-logging
python -m pip install -e mlperf-logging

rm ${PYVENV_SITEPKGS}/mlperf-logging.egg-link
mv ./mlperf-logging/mlperf_logging ${PYVENV_SITEPKGS}/
mv ./mlperf-logging/mlperf_logging.egg-info ${PYVENV_SITEPKGS}/

python -m pip install git+https://github.com/ildoonet/pytorch-gradual-warmup-lr.git

deactivate
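
Once the script has finished, you may want to check that the extra packages are visible from inside the environment. The sketch below does this with importlib.util.find_spec, which locates a module without importing it. The module names are our assumption about what the two packages install (mlperf_logging for mlperf-logging and warmup_scheduler for pytorch-gradual-warmup-lr); run it with the environment activated.

```python
import importlib.util


def check_modules(names):
    """Return a dict mapping each top-level module name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Assumed module names: torch from the pytorch module, plus the two packages
# installed by the environment script.
REQUIRED = ["torch", "mlperf_logging", "warmup_scheduler"]

if __name__ == "__main__":
    for name, ok in check_modules(REQUIRED).items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```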

In order to run a DeepCam training job, you must first clone the MLCommons HPC GitHub repository.

mkdir -p ${HOME/home/work}/tests
cd ${HOME/home/work}/tests

git clone https://github.com/mlcommons/hpc.git mlperf-hpc

cd ./mlperf-hpc/deepcam/src/deepCam

You are now ready to run the following DeepCam submission script via the sbatch command.

#!/bin/bash

#SBATCH --job-name=deepcam
#SBATCH --account=[budget code]
#SBATCH --partition=gpu
#SBATCH --qos=gpu-exc
#SBATCH --nodes=2
#SBATCH --gpus=8
#SBATCH --time=01:00:00
#SBATCH --exclusive

JOB_OUTPUT_PATH=./results/${SLURM_JOB_ID}
mkdir -p ${JOB_OUTPUT_PATH}/logs

source ${HOME/home/work}/pyenvs/mlperf-pt-gpu/bin/activate

export OMP_NUM_THREADS=1
export HOME=${HOME/home/work}

srun --ntasks=8 --tasks-per-node=4 \
     --cpu-bind=verbose,map_cpu:0,8,16,24 --hint=nomultithread \
     python train.py \
         --run_tag test \
         --data_dir_prefix /work/z19/shared/mlperf-hpc/deepcam/mini \
         --output_dir ${JOB_OUTPUT_PATH} \
         --wireup_method nccl-slurm \
         --max_epochs 64 \
         --local_batch_size 1

mv slurm-${SLURM_JOB_ID}.out ${JOB_OUTPUT_PATH}/slurm.out

The job submission script activates the Python environment that was set up earlier, but that particular command (source ${HOME/home/work}/pyenvs/mlperf-pt-gpu/bin/activate) could be replaced by module -q load pytorch/1.13.1-gpu if you are not running DeepCam and have no need for additional Python packages such as mlperf-logging and warmup-scheduler.

In the script above, we specify four tasks per node, one for each GPU. These tasks are evenly spaced across the node so as to maximise the communications bandwidth between the host and the GPU devices. Note that PyTorch does not use Cray MPICH for inter-task communications; these are handled instead by the ROCm Collective Communications Library (RCCL), hence the --wireup_method nccl-slurm option (nccl-slurm works as an alias for rccl-slurm in this context).
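
The map_cpu:0,8,16,24 binding and the one-task-per-GPU layout can be derived mechanically. The sketch below assumes 32 usable cores per GPU node (the value implied by the core list in the script) and shows how a launcher could turn the SLURM_PROCID, SLURM_NTASKS and SLURM_LOCALID variables that srun sets into a global rank, world size and per-node GPU index. The helper names are hypothetical and this is not DeepCam's own comm.init code.

```python
def evenly_spaced_cores(cores_per_node, tasks_per_node):
    """Return the first core of each equal-sized block of cores, one block per task."""
    if tasks_per_node <= 0 or cores_per_node % tasks_per_node != 0:
        raise ValueError("tasks_per_node must be positive and divide cores_per_node")
    stride = cores_per_node // tasks_per_node
    return list(range(0, cores_per_node, stride))


def map_cpu_option(cores):
    """Format a core list as the map_cpu value used with srun --cpu-bind."""
    return "map_cpu:" + ",".join(str(c) for c in cores)


def slurm_task_info(env):
    """Read the global rank, task count and node-local rank that srun sets for each task.

    local_rank (0-3 with four tasks per node) is what a launcher would use as the GPU index.
    """
    return {
        "rank": int(env["SLURM_PROCID"]),
        "world_size": int(env["SLURM_NTASKS"]),
        "local_rank": int(env["SLURM_LOCALID"]),
    }


# Assumed: 32 cores and 4 GPUs per GPU node, as implied by map_cpu:0,8,16,24.
print(map_cpu_option(evenly_spaced_cores(32, 4)))  # map_cpu:0,8,16,24
```

In a PyTorch script, values like these would typically be passed to torch.distributed.init_process_group(backend="nccl", rank=..., world_size=...), with torch.cuda.set_device(local_rank) selecting the GPU; on ROCm builds of PyTorch the "nccl" backend is provided by RCCL. A rendezvous address (for example MASTER_ADDR and MASTER_PORT) is also needed and is not shown here.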

The above job should converge, reaching an Intersection over Union (IoU) of 0.82, after 35 epochs or so. Runtime should be around 20 to 30 minutes.
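
Intersection over Union compares predicted and true labels class by class: the number of pixels that both assign to a class, divided by the number of pixels that either assigns to it, averaged over classes. The pure-Python sketch below works on flat lists of class ids; the function mean_iou is our own, and DeepCam's metric code may differ in detail (for example in how it treats absent classes or averages across ranks).

```python
def mean_iou(pred, label, num_classes):
    """Mean Intersection over Union across classes for flat lists of class ids.

    Classes that appear in neither pred nor label are skipped.
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, label) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, label) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    if not ious:
        raise ValueError("no class appears in pred or label")
    return sum(ious) / len(ious)
```

For pred = [0, 0, 1, 1, 2, 2] and label = [0, 1, 1, 1, 2, 0], the per-class IoUs are 1/3, 2/3 and 1/2, giving a mean IoU of 0.5.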

We can also modify the DeepCam train.py script so that the accuracy and loss are logged using TensorBoard.

The following lines must be added to the DeepCam train.py script.

import os
...

from torch.utils.tensorboard import SummaryWriter

...

def main(pargs):

    #init distributed training
    comm_local_group = comm.init(pargs.wireup_method, pargs.batchnorm_group_size)
    comm_rank = comm.get_rank()
    ...

    #set up logging
    pargs.logging_frequency = max([pargs.logging_frequency, 0])
    log_file = os.path.normpath(os.path.join(pargs.output_dir, "logs", pargs.run_tag + ".log"))
    ...

    writer = SummaryWriter()

    #set seed
    ...

    ...

    #training loop
    while True:
        ...

        #training
        step = train_epoch(pargs, comm_rank, comm_size,
                           ...
                           logger, writer)

        ...

The train_epoch function is defined in ./driver/trainer.py, so that file must be amended as follows.

...

def train_epoch(pargs, comm_rank, comm_size,
                ...,
                logger, writer):

    ...

    writer.add_scalar("Accuracy/train", iou_avg_train, epoch+1)
    writer.add_scalar("Loss/train", loss_avg_train, epoch+1)

    return step
"},{"location":"user-guide/machine-learning/#deepcam-on-cpu","title":"DeepCam on CPU","text":"

PyTorch can also be run on the ARCHER2 CPU nodes. However, since DeepCam uses the torch.distributed module, we cannot use Horovod to handle inter-task communications (via MPI). We must instead build PyTorch from source so that we can link torch.distributed to the correct Cray MPICH libraries.

Instructions for this build can be found at https://github.com/hpc-uk/build-instructions/blob/main/pyenvs/pytorch/build_pytorch_1.13.0a0_from_source_archer2_cpu.md.

This install can be accessed by loading the pytorch/1.13.0a0 module. Please note, PyTorch source version 1.13.0a0 corresponds to PyTorch package version 1.13.1.

Once again, as we are running the DeepCam benchmark, we'll need to set up a local Python environment for installing the MLPerf logging package. This time the local environment is based on the pytorch/1.13.0a0 module.

#!/bin/bash

module -q load pytorch/1.13.0a0

PYTHON_TAG=python`echo ${CRAY_PYTHON_LEVEL} | cut -d. -f1-2`

PRFX=${HOME/home/work}/pyenvs
PYVENV_ROOT=${PRFX}/mlperf-pt
PYVENV_SITEPKGS=${PYVENV_ROOT}/lib/${PYTHON_TAG}/site-packages

mkdir -p ${PYVENV_ROOT}
cd ${PYVENV_ROOT}

python -m venv --system-site-packages ${PYVENV_ROOT}

extend-venv-activate ${PYVENV_ROOT}

source ${PYVENV_ROOT}/bin/activate

mkdir -p ${PYVENV_ROOT}/repos
cd ${PYVENV_ROOT}/repos

git clone -b hpc-1.0-branch https://github.com/mlcommons/logging mlperf-logging
python -m pip install -e mlperf-logging

rm ${PYVENV_SITEPKGS}/mlperf-logging.egg-link
mv ./mlperf-logging/mlperf_logging ${PYVENV_SITEPKGS}/
mv ./mlperf-logging/mlperf_logging.egg-info ${PYVENV_SITEPKGS}/

python -m pip install git+https://github.com/ildoonet/pytorch-gradual-warmup-lr.git

deactivate

DeepCam can now be run on the CPU nodes using a submission script like the one below.

#!/bin/bash

#SBATCH --job-name=deepcam
#SBATCH --account=[budget code]
#SBATCH --partition=standard
#SBATCH --qos=standard
#SBATCH --nodes=32
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=128
#SBATCH --time=10:00:00
#SBATCH --exclusive

JOB_OUTPUT_PATH=./results/${SLURM_JOB_ID}
mkdir -p ${JOB_OUTPUT_PATH}/logs

source ${HOME/home/work}/pyenvs/mlperf-pt/bin/activate

export SRUN_CPUS_PER_TASK=${SLURM_CPUS_PER_TASK}
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

srun --hint=nomultithread \
     python train.py \
         --run_tag test \
         --data_dir_prefix /work/z19/shared/mlperf-hpc/deepcam/mini \
         --output_dir ${JOB_OUTPUT_PATH} \
         --wireup_method mpi \
         --max_inter_threads ${SLURM_CPUS_PER_TASK} \
         --max_epochs 64 \
         --local_batch_size 1

mv slurm-${SLURM_JOB_ID}.out ${JOB_OUTPUT_PATH}/slurm.out

The script above activates the local Python environment so that the mlperf-logging package is available; this is needed by the logger object declared in the DeepCam train.py script. Notice also that the --wireup_method parameter is now set to mpi and that a new parameter, --max_inter_threads, has been added to specify the maximum number of concurrent readers.

DeepCam runs much more slowly on the CPU nodes than on the GPU nodes. Running on 32 CPU nodes, as shown above, takes around 6 hours to complete 35 epochs, assuming you are using the default hyperparameter settings for DeepCam.

"},{"location":"user-guide/machine-learning/#tensorflow","title":"TensorFlow","text":"

On ARCHER2, TensorFlow is supported for use on the CPU nodes only.

We'll demonstrate the use of TensorFlow with the CosmoFlow benchmark. It involves training a neural network to recognise cosmological parameter values from the output generated by 3D dark matter simulations; see the link below for more details.

CosmoFlow: using deep learning to learn the universe at scale

There are two CosmoFlow training datasets available on ARCHER2: a 5.6 GB mini dataset (/work/z19/shared/mlperf-hpc/cosmoflow/mini) and a much larger 1.7 TB dataset (/work/z19/shared/mlperf-hpc/cosmoflow/full).

"},{"location":"user-guide/machine-learning/#cosmoflow-on-cpu","title":"CosmoFlow on CPU","text":"

In order to run a CosmoFlow training job, you must first clone the MLCommons HPC GitHub repository.

mkdir -p ${HOME/home/work}/tests
cd ${HOME/home/work}/tests

git clone https://github.com/mlcommons/hpc.git mlperf-hpc

cd ./mlperf-hpc/cosmoflow

You are now ready to run the following CosmoFlow submission script via the sbatch command.

#!/bin/bash

#SBATCH --job-name=cosmoflow
#SBATCH --account=[budget code]
#SBATCH --partition=standard
#SBATCH --qos=standard
#SBATCH --nodes=32
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=16
#SBATCH --time=01:00:00
#SBATCH --exclusive

module -q load tensorflow/2.13.0

export UCX_MEMTYPE_CACHE=n
export SRUN_CPUS_PER_TASK=${SLURM_CPUS_PER_TASK}
export MPICH_DPM_DIR=${SLURM_SUBMIT_DIR}/dpmdir

export OMP_NUM_THREADS=16
export TF_ENABLE_ONEDNN_OPTS=1

srun --hint=nomultithread --distribution=block:block --cpu-freq=2250000 \
    python train.py \
        --distributed --omp-num-threads ${OMP_NUM_THREADS} \
        --inter-threads 0 --intra-threads 0 \
        --n-epochs 2048 --n-train 1024 --n-valid 1024 \
        --data-dir /work/z19/shared/mlperf-hpc/cosmoflow/mini/cosmoUniverse_2019_05_4parE_tf_v2_mini

The CosmoFlow job runs eight MPI tasks per node (one per NUMA region) with sixteen threads per task, so each node is fully populated. The TF_ENABLE_ONEDNN_OPTS variable refers to Intel's oneAPI Deep Neural Network library (oneDNN). Within the TensorFlow source there are #ifdef guards that are activated when oneDNN is enabled. It turns out that setting TF_ENABLE_ONEDNN_OPTS=1 also improves performance (by a factor of 12) on AMD processors.
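
The "fully populated" statement is simple arithmetic on the standard compute node layout of 128 cores in eight 16-core NUMA regions. The sketch below checks a tasks-per-node and cpus-per-task pair against that layout; the helper check_layout is hypothetical and not part of any ARCHER2 tool.

```python
def check_layout(tasks_per_node, cpus_per_task, cores_per_node=128, numa_regions=8):
    """Check a per-node task/thread layout against a 128-core, 8-NUMA-region node."""
    cores_per_region = cores_per_node // numa_regions  # 16 cores per NUMA region
    used = tasks_per_node * cpus_per_task
    return {
        "cores_used": used,
        "fully_populated": used == cores_per_node,
        "one_task_per_numa_region": (
            tasks_per_node == numa_regions and cpus_per_task == cores_per_region
        ),
    }
```

check_layout(8, 16) reports 128 cores used with one task per NUMA region, matching the CosmoFlow script; the 1 × 128 layout of the DeepCam CPU script also fills the node, but with a single task spanning all NUMA regions.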

The inter/intra thread training parameters allow one to exploit any parallelism implied by the TensorFlow (TF) DNN graph. For example, if a node in the TF graph can be parallelised, the number of threads assigned will be the value of --intra-threads; and, if there are separate nodes in the TF graph that can be run concurrently, the available thread count for such an activity is the value of --inter-threads. Of course, the optimum values for these parameters will depend on the DNN graph. The job script above tells TensorFlow to choose the values by setting both parameters to zero.

You will note that only a few hyperparameters are specified for the CosmoFlow training job (e.g., --n-epochs, --n-train and --n-valid). These settings override the values assigned to the same parameters within the ./configs/cosmo.yaml file; that file also contains settings for many other hyperparameters, which are left unchanged.
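
The override behaviour described above can be sketched as a dictionary merge: options given on the command line replace the corresponding entries loaded from the YAML file, and everything else is kept. The section and key names and the values below are hypothetical and do not reproduce the layout of cosmo.yaml or the logic of CosmoFlow's train.py; in practice the file would be loaded with a YAML parser such as PyYAML's yaml.safe_load.

```python
import copy


def apply_overrides(config, overrides):
    """Return a copy of a nested config dict with command-line values applied.

    overrides maps (section, key) pairs to values; a value of None means the
    option was not given on the command line, so the file's value is kept.
    """
    merged = copy.deepcopy(config)
    for (section, key), value in overrides.items():
        if value is not None:
            merged[section][key] = value
    return merged


# Hypothetical stand-in for a parsed YAML config file.
file_config = {
    "data": {"n_train": 10, "n_valid": 5},
    "train": {"n_epochs": 3, "lr": 0.1},
}
cli = {("train", "n_epochs"): 2048, ("data", "n_train"): 1024, ("data", "n_valid"): None}
print(apply_overrides(file_config, cli))
```

Copying the config first keeps the loaded file values intact, which makes it easy to log both the file settings and the effective settings for a run.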

The CosmoFlow job specified above should take around 140 minutes to complete 2048 epochs, which should be sufficient to achieve a mean absolute error of 0.23.

"},{"location":"user-guide/profile/","title":"Profiling","text":"

There are a number of different ways to access profiling data on ARCHER2. In this section, we discuss the HPE Cray profiling tools, CrayPat-lite and CrayPat. We also show how to get usage data on currently running jobs from the Slurm batch system.

You can also use the Linaro Forge tool to profile applications on ARCHER2.

If you are specifically interested in profiling IO, then you may want to look at the Darshan IO profiling tool.

"},{"location":"user-guide/profile/#craypat-lite","title":"CrayPat-lite","text":"

CrayPat-lite is a simplified and easy-to-use version of the Cray Performance Measurement and Analysis Tool (CrayPat). CrayPat-lite provides basic performance analysis information automatically, with a minimum of user interaction, and yet offers information useful to users wishing to explore a program's behaviour further using the full CrayPat suite.

"},{"location":"user-guide/profile/#how-to-use-craypat-lite","title":"How to use CrayPat-lite","text":"
  1. Ensure the perftools-base module is loaded.

    module list

  2. Load the perftools-lite module.

    module load perftools-lite

  3. Compile your application normally. An informational message from CrayPat-lite will appear indicating that the executable has been instrumented.

    cc -h std=c99  -o myapplication.x myapplication.c
    INFO: creating the CrayPat-instrumented executable 'myapplication.x' (lite-samples) ...OK
  4. Run the generated executable normally by submitting a job.

    #!/bin/bash

    #SBATCH --job-name=CrayPat_test
    #SBATCH --nodes=4
    #SBATCH --ntasks-per-node=128
    #SBATCH --cpus-per-task=1
    #SBATCH --time=00:20:00

    #SBATCH --account=[budget code]
    #SBATCH --partition=standard
    #SBATCH --qos=standard

    export OMP_NUM_THREADS=1
    export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK

    # Launch the instrumented executable built in step 3
    srun --hint=nomultithread --distribution=block:block myapplication.x
  5. Analyse the data.

    After the job finishes executing, CrayPat-lite output should be printed to stdout (i.e. at the end of the job's output file). A new directory will also be created containing .rpt and .ap2 files. The .rpt files are text files that contain the same information printed in the job's output file and the .ap2 files can be used to obtain more detailed information, which can be visualized using the Cray Apprentice2 tool.

"},{"location":"user-guide/profile/#further-help","title":"Further help","text":""},{"location":"user-guide/profile/#craypat","title":"CrayPat","text":"

The Cray Performance Analysis Tool (CrayPat) is a powerful framework for analysing a parallel application's performance on Cray supercomputers. It can provide very detailed information about the timing and performance of individual application procedures.

CrayPat can perform two types of performance analysis: sampling experiments and tracing experiments. A sampling experiment probes the code at a predefined interval and produces a report based on the data collected. A tracing experiment explicitly monitors the code performance within named routines. A tracing experiment typically has a higher overhead than a sampling experiment, but provides much more detailed information. The key to getting useful data out of a sampling experiment is to run your profiling for a representative length of time.
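
To make the difference between the two approaches concrete, the Python sketch below profiles the same toy workload both ways: tracing with sys.setprofile, which sees every function call, and sampling with a SIGPROF interval timer, which only records whichever function happens to be running when the timer fires. This is a Python analogy, not how CrayPat is implemented; the function names are our own, and the sampling half needs a Unix-like system.

```python
import collections
import signal
import sys


def step(i):
    """A tiny function called many times by the workload."""
    return i * i


def workload(n):
    """Toy CPU-bound workload: n calls to step()."""
    total = 0
    for i in range(n):
        total += step(i)
    return total


def trace_calls(func, *args):
    """Tracing: count every Python-level function call made while func runs."""
    counts = collections.Counter()

    def profiler(frame, event, arg):
        if event == "call":
            counts[frame.f_code.co_name] += 1

    sys.setprofile(profiler)
    try:
        func(*args)
    finally:
        sys.setprofile(None)
    return counts


def sample_calls(func, *args, interval=0.001):
    """Sampling: record which function is executing each time a CPU-time timer fires.

    Uses SIGPROF and ITIMER_PROF, so it only works on Unix-like systems and
    must be called from the main thread.
    """
    counts = collections.Counter()

    def handler(signum, frame):
        if frame is not None:
            counts[frame.f_code.co_name] += 1

    old = signal.signal(signal.SIGPROF, handler)
    signal.setitimer(signal.ITIMER_PROF, interval, interval)
    try:
        func(*args)
    finally:
        signal.setitimer(signal.ITIMER_PROF, 0)
        signal.signal(signal.SIGPROF, old if old is not None else signal.SIG_DFL)
    return counts
```

The tracing counts are exact (one call to workload and n calls to step), whereas the sampling counts are statistical and vary between runs, which is why a sampling experiment needs to run for a representative length of time.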

"},{"location":"user-guide/profile/#sampling-analysis","title":"Sampling analysis","text":"
  1. Ensure the perftools-base module is loaded.

    module list

  2. Load the perftools module.

    module load perftools

  3. Compile your code in the standard way, always using the Cray compiler wrappers (ftn, cc and CC). Object files need to be made available to CrayPat to correctly build an instrumented executable for profiling or tracing; this means that the compile and link stages should be separated by using the -c compile flag.

    auser@ln01:/work/t01/t01/auser> cc -h std=c99 -c jacobi.c
    auser@ln01:/work/t01/t01/auser> cc jacobi.o -o jacobi
  4. To instrument the binary, run the pat_build command. This will generate a new binary with +pat appended to the end (e.g. jacobi+pat).

    auser@ln:/work/t01/t01/auser> pat_build jacobi

  5. Run the new executable with +pat appended as you would with the regular executable. Each run will produce its own 'experiment directory' containing the performance data as .xf files inside a subdirectory called xf-files (e.g. running the jacobi+pat instrumented executable might produce jacobi+pat+12265-1573s/xf-files).

  6. Generate report data with pat_report.

The .xf files contain the raw sampling data from the run and need to be post-processed to produce useful results. This is done using the pat_report tool which converts all the raw data into a summarised and readable form. You should provide the name of the experiment directory as the argument to pat_report.

auser@ln:/work/t01/t01/auser> pat_report jacobi+pat+12265-1573s

Table 1:  Profile by Function (limited entries shown)

  Samp% |  Samp |  Imb. |  Imb. | Group
        |       |  Samp | Samp% |  Function
        |       |       |       |   PE=HIDE
 100.0% | 849.5 |    -- |    -- | Total
|--------------------------------------------------
|  56.7% | 481.4 |    -- |    -- | MPI
||-------------------------------------------------
||  48.7% | 414.1 |  50.9 | 11.0% | MPI_Allreduce
||   4.4% |  37.5 | 118.5 | 76.6% | MPI_Waitall
||   3.0% |  25.2 |  44.8 | 64.5% | MPI_Isend
||=================================================
|  29.9% | 253.9 |  55.1 | 18.0% | USER
||-------------------------------------------------
||  29.9% | 253.9 |  55.1 | 18.0% | main
||=================================================
|  13.4% | 114.1 |    -- |    -- | ETC
||-------------------------------------------------
||  13.4% | 113.9 |  26.1 | 18.8% | __cray_memcpy_SNB
|==================================================

Running pat_report also generates files with the extension .ap2 in the experiment directory. These hold the same data as the .xf files but in post-processed form. Another file produced has an .apa extension; this is a text file with a suggested configuration for generating a traced experiment.

The .ap2 files generated are used to view performance data graphically with the Cray Apprentice2 tool.

The pat_report command is able to produce many different profile reports from the profiling data. You can select a predefined report with the -O flag to pat_report. A selection of the most generally useful predefined report types is listed below.

Example output:

auser@ln01:/work/t01/t01/auser> pat_report -O ca+src,load_balance  jacobi+pat+12265-1573s

Table 1:  Profile by Function and Callers, with Line Numbers (limited entries shown)

  Samp% |  Samp |  Imb. |  Imb. | Group
        |       |  Samp | Samp% |  Function
        |       |       |       |   PE=HIDE
 100.0% | 849.5 |    -- |    -- | Total
|--------------------------------------------------
|--------------------------------------
|  56.7% | 481.4 | MPI
||-------------------------------------
||  48.7% | 414.1 | MPI_Allreduce
3|        |       |  main:jacobi.c:line.80
||   4.4% |  37.5 | MPI_Waitall
3|        |       |  main:jacobi.c:line.73
||   3.0% |  25.2 | MPI_Isend
|||------------------------------------
3||   1.6% |  13.2 | main:jacobi.c:line.65
3||   1.4% |  12.0 | main:jacobi.c:line.69
||=====================================
|  29.9% | 253.9 | USER
||-------------------------------------
||  29.9% | 253.9 | main
|||------------------------------------
3||  18.7% | 159.0 | main:jacobi.c:line.76
3||   9.1% |  76.9 | main:jacobi.c:line.84
|||====================================
||=====================================
|  13.4% | 114.1 | ETC
||-------------------------------------
||  13.4% | 113.9 | __cray_memcpy_SNB
3|        |       |  __cray_memcpy_SNB
|======================================
"},{"location":"user-guide/profile/#tracing-analysis","title":"Tracing analysis","text":""},{"location":"user-guide/profile/#automatic-program-analysis-apa","title":"Automatic Program Analysis (APA)","text":"

We can produce a focused tracing experiment based on the results from the sampling experiment using pat_build with the .apa file produced during the sampling.

auser@ln01:/work/t01/t01/auser> pat_build -O jacobi+pat+12265-1573s/build-options.apa

This will produce a third binary, with +apa appended to its name (jacobi+apa). Run this binary on the compute nodes as before, changing the executable name in your job script to jacobi+apa. As with the sampling analysis, a report can be produced using pat_report. For example:

auser@ln01:/work/t01/t01/auser> pat_report jacobi+apa+13955-1573t

Table 1:  Profile by Function Group and Function (limited entries shown)

  Time% |      Time |     Imb. |  Imb. |       Calls | Group
        |           |     Time | Time% |             |  Function
        |           |          |       |             |   PE=HIDE

 100.0% | 12.987762 |       -- |    -- | 1,387,544.9 | Total
|-------------------------------------------------------------------------
|  44.9% |  5.831320 |       -- |    -- |         2.0 | USER
||------------------------------------------------------------------------
||  44.9% |  5.831229 | 0.398671 |  6.4% |         1.0 | main
||========================================================================
|  29.2% |  3.789904 |       -- |    -- |   199,111.0 | MPI_SYNC
||------------------------------------------------------------------------
||  29.2% |  3.789115 | 1.792050 | 47.3% |   199,109.0 | MPI_Allreduce(sync)
||========================================================================
|  25.9% |  3.366537 |       -- |    -- | 1,188,431.9 | MPI
||------------------------------------------------------------------------
||  18.0% |  2.334765 | 0.164646 |  6.6% |   199,109.0 | MPI_Allreduce
||   3.7% |  0.486714 | 0.882654 | 65.0% |   199,108.0 | MPI_Waitall
||   3.3% |  0.428731 | 0.557342 | 57.0% |   395,104.9 | MPI_Isend
|=========================================================================
"},{"location":"user-guide/profile/#manual-program-analysis","title":"Manual Program Analysis","text":"

CrayPat allows you to manually choose your profiling preference. This is particularly useful if the APA mode does not meet your tracing analysis requirements.

The entire program can be traced as a whole using -w:

auser@ln01:/work/t01/t01/auser> pat_build -w jacobi

Using -g, a program can be instrumented to trace all function entry point references belonging to the trace function group (mpi, libsci, lapack, scalapack, heap, etc.):

auser@ln01:/work/t01/t01/auser> pat_build -w -g mpi jacobi
"},{"location":"user-guide/profile/#dynamically-linked-binaries","title":"Dynamically-linked binaries","text":"

CrayPat allows you to profile un-instrumented, dynamically linked binaries with the pat_run utility. pat_run delivers profiling information for codes that cannot easily be rebuilt. To use pat_run:

  1. Load the perftools-base module if it is not already loaded.

    module load perftools-base

  2. Run your application as normal, inserting the pat_run command immediately after your srun options.

    srun [srun-options] pat_run [pat_run-options] program [program-options]

  3. Use pat_report to examine any data collected during the execution of your application.

    auser@ln01:/work/t01/t01/auser> pat_report jacobi+pat+12265-1573s

Some useful pat_run options are as follows.

"},{"location":"user-guide/profile/#further-help_1","title":"Further help","text":""},{"location":"user-guide/profile/#cray-apprentice2","title":"Cray Apprentice2","text":"

Cray Apprentice2 is an optional GUI tool that is used to visualize and manipulate the performance analysis data captured during program execution. Cray Apprentice2 can display a wide variety of reports and graphs, depending on the type of program being analyzed, the way in which the program was instrumented for data capture, and the data that was collected during program execution.

You will need to use CrayPat to first instrument your program and capture performance analysis data, and then pat_report to generate the .ap2 files from the results. You may then use Cray Apprentice2 to visualize and explore those files.

The number and appearance of the reports that can be generated using Cray Apprentice2 is determined by the kind and quantity of data captured during program execution, which in turn is determined by the way in which the program was instrumented and the environment variables in effect at the time of program execution. For example, changing the PAT_RT_SUMMARY environment variable to 0 before executing the instrumented program nearly doubles the number of reports available when analyzing the resulting data in Cray Apprentice2.

export PAT_RT_SUMMARY=0

To use Cray Apprentice2 (app2), load the perftools-base module if it is not already loaded.

module load perftools-base

Next, open the experiment directory generated during the instrumentation phase with Apprentice2.

auser@ln01:/work/t01/t01/auser> app2 jacobi+pat+12265-1573s
"},{"location":"user-guide/profile/#hardware-performance-counters","title":"Hardware Performance Counters","text":"

Hardware performance counters can be used to monitor CPU and power events on ARCHER2 compute nodes. The monitoring and reporting of hardware counter events is integrated with CrayPat; users should use CrayPat, as described earlier in this section, to run profiling experiments that gather data from hardware counter events and to analyse the data.

"},{"location":"user-guide/profile/#counters-and-counter-groups-available","title":"Counters and counter groups available","text":"

You can explore which event counters are available on compute nodes by running the following commands (replace t01 with a valid budget code for your account):

module load perftools
srun --ntasks=1 --partition=standard --qos=short --account=t01 papi_avail

For convenience, the CrayPat tool provides predetermined groups of hardware event counters. You can get more information on the hardware event counters available through CrayPat with the following commands (on a login or compute node):

module load perftools
pat_help counters rome groups

If you want information on which hardware event counters are included in a group you can type the group name at the prompt you get after running the command above. Once you have finished browsing the help, type . to quit back to the command line.

"},{"location":"user-guide/profile/#powerenergy-counters-available","title":"Power/energy counters available","text":"

You can also access counters on power/energy consumption. To list the counters available to monitor power/energy use you can use the command (replace t01 with a valid budget code for your account):

module load perftools
srun --ntasks=1 --partition=standard --qos=short --account=t01 papi_native_avail -i cray_pm
"},{"location":"user-guide/profile/#enabling-hardware-counter-data-collection","title":"Enabling hardware counter data collection","text":"

You enable the collection of hardware event counter data as part of a CrayPat experiment by setting the environment variable PAT_RT_PERFCTR to a comma-separated list of the groups/counters that you wish to measure.

For example, you could set (usually in your job submission script):

export PAT_RT_PERFCTR=1

to use counter group 1 (summary with branch activity).

"},{"location":"user-guide/profile/#analysing-hardware-counter-data","title":"Analysing hardware counter data","text":"

If you enabled collection of hardware event counters when running your profiling experiment, you will automatically get a report on the data when you use the pat_report command to analyse the profile experiment data file.

You will see information similar to the following in the output from CrayPat for different sections of your code (this example is for the case where export PAT_RT_PERFCTR=1, counter group: summary with branch activity, was set in the job submission script):

==============================================================================
  USER / main
------------------------------------------------------------------------------
  Time%                                                   88.3%
  Time                                               446.113787 secs
  Imb. Time                                           33.094417 secs
  Imb. Time%                                               6.9%
  Calls                       0.002 /sec                    1.0 calls
  PAPI_BR_TKN                 0.240G/sec    106,855,535,005.863 branch
  PAPI_TOT_INS                5.679G/sec  2,533,386,435,314.367 instr
  PAPI_BR_INS                 0.509G/sec    227,125,246,394.008 branch
  PAPI_TOT_CYC                            1,243,344,265,012.828 cycles
  Instr per cycle                                          2.04 inst/cycle
  MIPS                 1,453,770.20M/sec
  Average Time per Call                              446.113787 secs
  CrayPat Overhead : Time      0.2%
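
Some of the derived lines in this output can be recomputed from the raw counters: Instr per cycle is PAPI_TOT_INS divided by PAPI_TOT_CYC, and the per-second rates are the counts divided by the Time line. The short Python check below uses the values shown above; the taken-branch fraction is an extra metric we compute for illustration and is not printed by CrayPat in this example.

```python
# Counter values and Time copied from the "USER / main" example output above.
counters = {
    "PAPI_TOT_INS": 2_533_386_435_314.367,
    "PAPI_TOT_CYC": 1_243_344_265_012.828,
    "PAPI_BR_INS": 227_125_246_394.008,
    "PAPI_BR_TKN": 106_855_535_005.863,
}
time_s = 446.113787


def derived_metrics(c, seconds):
    """Recompute derived metrics from raw PAPI counts and elapsed time."""
    return {
        "instr_per_cycle": c["PAPI_TOT_INS"] / c["PAPI_TOT_CYC"],
        "instr_rate_G_per_s": c["PAPI_TOT_INS"] / seconds / 1e9,
        "branch_rate_G_per_s": c["PAPI_BR_INS"] / seconds / 1e9,
        "taken_branch_fraction": c["PAPI_BR_TKN"] / c["PAPI_BR_INS"],
    }


for name, value in derived_metrics(counters, time_s).items():
    print(f"{name:>22}: {value:.3f}")
```

This reproduces the 2.04 instructions per cycle and the 5.679G/sec instruction rate in the report, and shows that roughly 47% of the branches executed in main were taken.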
"},{"location":"user-guide/profile/#using-the-craypat-api-to-gather-hardware-counter-data","title":"Using the CrayPAT API to gather hardware counter data","text":"

The CrayPat API provides a function, PAT_counters, that allows you to obtain the values of specific hardware counters at specific points within your code.

For convenience, we have developed an MPI-based wrapper for this aspect of the CrayPat API, called pat_mpi_lib, which can be found via the link below.

https://github.com/cresta-eu/pat_mpi_lib

The PAT MPI Library makes it possible to monitor a user-defined set of hardware performance counters during the execution of an MPI code running across multiple compute nodes. The library is lightweight, containing just four functions, and is intended to be straightforward to use. Once you've defined the hooks in your code for recording counter values, you can control which counters are read at runtime by setting the PAT_RT_PERFCTR environment variable in the job submission script. As your code executes, the defined set of counters will be read at various points. After each reading, the counter values are summed by rank 0 (via an MPI reduction) before being output to a log file.

Further information along with test harnesses and example scripts can be found by reading the PAT MPI Library readme file.

"},{"location":"user-guide/profile/#more-information-on-hardware-counters","title":"More information on hardware counters","text":"

More information on using hardware counters can be found in the appropriate section of the HPE documentation:

Two further MPI-based wrapper libraries are also available: one for Power Management (PM) counters, which cover properties such as point-in-time power, cumulative energy use and temperature, and one that provides access to PAPI counters. See the links below for further details.

"},{"location":"user-guide/profile/#performance-and-profiling-data-in-slurm","title":"Performance and profiling data in Slurm","text":"

Slurm commands on the login nodes can be used to quickly and simply retrieve information about memory usage for currently running and completed jobs.

There are three commands you can use on ARCHER2 to query job data from Slurm. Two are standard Slurm commands and one is a script that provides information on running jobs:

We provide examples of the use of these three commands below.

For the sacct and sstat commands, the memory properties we print out below are:

Tip

Slurm polls the memory use of a job at intervals, so short-term changes in memory use may not be captured in the Slurm data.

"},{"location":"user-guide/profile/#example-1-sstat-for-running-jobs","title":"Example 1: sstat for running jobs","text":"

To display the current memory use of a running job with the ID 123456:

sstat --format=JobID,AveCPU,AveRSS,MaxRSS,MaxRSSTask,MaxRSSNode,TRESUsageInTot%150 -j 123456\n
"},{"location":"user-guide/profile/#example-2-sacct-for-finished-jobs","title":"Example 2: sacct for finished jobs","text":"

To display the memory use of a completed job with the ID 123456:

sacct --format=JobID,JobName,AveRSS,MaxRSS,MaxRSSTask,MaxRSSNode,TRESUsageInTot%150 -j 123456\n
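If you want to post-process these values in a script, sacct can print machine-readable output with its --parsable2 (pipe-separated, no trailing delimiter) and --noheader options. The sketch below is a minimal example; the function names are our own, and it assumes Slurm reports memory with base-1024 K, M, G or T suffixes and that a bare number is in bytes.

```python
import subprocess

# Multipliers for Slurm memory suffixes, in MiB (assumes base-1024 units).
_SUFFIX_TO_MIB = {'K': 1 / 1024, 'M': 1.0, 'G': 1024.0, 'T': 1024.0 ** 2}


def slurm_mem_to_mib(value):
    # Convert a Slurm memory string such as '2048K' or '1.5G' to MiB.
    # An empty field (e.g. a step with no data) is treated as zero.
    value = value.strip()
    if not value:
        return 0.0
    suffix = value[-1].upper()
    if suffix in _SUFFIX_TO_MIB:
        return float(value[:-1]) * _SUFFIX_TO_MIB[suffix]
    # No suffix: assume the value is in bytes.
    return float(value) / 1024 ** 2


def max_rss_per_step(job_id):
    # Hypothetical helper: run sacct for one job and return a dict mapping
    # each job step ID to its MaxRSS in MiB. Requires sacct on the PATH.
    out = subprocess.run(
        ['sacct', '--parsable2', '--noheader',
         '--format=JobID,MaxRSS', '-j', str(job_id)],
        capture_output=True, text=True, check=True,
    ).stdout
    result = {}
    for line in out.splitlines():
        step, max_rss = line.split('|')
        result[step] = slurm_mem_to_mib(max_rss)
    return result
```

On a login node, max_rss_per_step(123456) would return one entry per job step of job 123456.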

You can also use sacct to display when each job belonging to a particular user was submitted, started running and ended:

sacct --format=JobID,Submit,Start,End -u auser\n
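From these three fields you can work out how long each job waited in the queue and how long it ran. The sketch below assumes sacct's default ISO 8601 time format (e.g. 2024-10-08T08:20:57) and that times which have not happened yet are printed as Unknown; the function name job_timings is hypothetical.

```python
from datetime import datetime


def job_timings(submit, start, end):
    # Return (queue wait, run time) as timedelta objects, or None for any
    # interval whose end point sacct reports as 'Unknown'.
    def parse(ts):
        return None if ts == 'Unknown' else datetime.fromisoformat(ts)

    t_submit, t_start, t_end = parse(submit), parse(start), parse(end)
    wait = t_start - t_submit if t_start is not None else None
    run = t_end - t_start if (t_start is not None and t_end is not None) else None
    return wait, run
```

For a job submitted at 08:00, started at 08:20 and finished at 09:20, this gives a 20-minute wait and a 1-hour run time.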
"},{"location":"user-guide/profile/#example-3-archer2jobload-for-running-jobs","title":"Example 3: archer2jobload for running jobs","text":"

Using the archer2jobload command on its own with no options will show the current CPU and memory use across compute nodes for all running jobs.

More usefully, you can provide a job ID to archer2jobload and it will show a summary of the CPU and memory use for a specific job. For example, to get the usage data for job 123456, you would use:

auser@ln01:~> archer2jobload 123456\n# JOB: 123456\nCPU_LOAD            MEMORY              ALLOCMEM            FREE_MEM            TMP_DISK            NODELIST            \n127.35-127.86       256000              239872              169686-208172       0                   nid[001481,001638-00\n

This shows that the minimum CPU load on a compute node is 127.35 (close to the limit of 128 cores) and the maximum load is 127.86 (indicating that all the nodes are being used evenly). The minimum free memory is 169686 MB and the maximum free memory is 208172 MB.

If you add the -l option, you will see a breakdown per node:

auser@ln01:~> archer2jobload -l 123456\n# JOB: 123456\nNODELIST            CPU_LOAD            MEMORY              ALLOCMEM            FREE_MEM            TMP_DISK            \nnid001481           127.86              256000              239872              169686              0                   \nnid001638           127.60              256000              239872              171060              0                   \nnid001639           127.64              256000              239872              171253              0                   \nnid001677           127.85              256000              239872              173820              0                   \nnid001678           127.75              256000              239872              173170              0                   \nnid001891           127.63              256000              239872              173316              0                   \nnid001921           127.65              256000              239872              207562              0                   \nnid001922           127.35              256000              239872              208172              0 \n
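The summary line printed without -l is simply the minimum and maximum of each column in the per-node table. The sketch below recomputes it from the -l output. It assumes the column order shown above (NODELIST, CPU_LOAD, MEMORY, ALLOCMEM, FREE_MEM, TMP_DISK) and that node names start with nid; the function name is hypothetical.

```python
def summarise_jobload(text):
    # Parse 'archer2jobload -l' output and return the (min, max) CPU load
    # and (min, max) free memory in MB across the job's nodes.
    loads, free = [], []
    for line in text.splitlines():
        fields = line.split()
        if not fields or not fields[0].startswith('nid'):
            continue  # skip the '# JOB:' line and the column header
        loads.append(float(fields[1]))
        free.append(int(fields[4]))
    return {
        'cpu_load': (min(loads), max(loads)),
        'free_mem_mb': (min(free), max(free)),
    }
```

Applied to the output above, it reproduces the 127.35-127.86 CPU load and 169686-208172 MB free-memory ranges.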
"},{"location":"user-guide/profile/#further-help-with-slurm","title":"Further help with Slurm","text":"

The definitions of any variables discussed here and more usage information can be found in the man pages of sstat and sacct.

"},{"location":"user-guide/profile/#amd-prof","title":"AMD \u03bcProf","text":"

The AMD \u03bcProf tool provides capabilities for low-level profiling on AMD processors, see:

"},{"location":"user-guide/profile/#linaro-forge","title":"Linaro Forge","text":"

The Linaro Forge tool also provides profiling capabilities. See:

"},{"location":"user-guide/profile/#darshan-io-profiling","title":"Darshan IO profiling","text":"

The Darshan lightweight IO profiling tool provides a quick way to profile the IO part of your software:

"},{"location":"user-guide/python/","title":"Using Python","text":"

Python is supported on ARCHER2 both for running intensive parallel jobs and also as an analysis tool. This section describes how to use Python in either of these scenarios.

The Python installations on ARCHER2 contain some of the most commonly used packages. If you wish to install additional Python packages, we recommend that you use the pip command, see the section entitled Installing your own Python packages (with pip).

Important

Python 2 is not supported on ARCHER2 as it reached end of life at the start of 2020.

Note

When you log onto ARCHER2, no Python module is loaded by default. You will generally need to load the cray-python module to access the functionality described below.

"},{"location":"user-guide/python/#hpe-cray-python-distribution","title":"HPE Cray Python distribution","text":"

The recommended way to use Python on ARCHER2 is to use the HPE Cray Python distribution.

The HPE Cray distribution provides Python 3 along with some of the most common packages used for scientific computation and data analysis. These include:

The HPE Cray Python distribution can be loaded (either on the front-end or in a submission script) using:

module load cray-python\n

Tip

The HPE Cray Python distribution is built using GCC compilers. If you wish to compile your own Python, C/C++ or Fortran code for use with HPE Cray Python, you should compile it using PrgEnv-gnu to ensure compatibility.

"},{"location":"user-guide/python/#installing-your-own-python-packages-with-pip","title":"Installing your own Python packages (with pip)","text":"

Sometimes, you may need to set up a local custom Python environment that extends a centrally-installed cray-python module. By extend, we mean being able to install packages locally that are not provided by cray-python, while still using the packages that cray-python does provide. This matters because some Python packages, such as mpi4py, must be built specifically for the ARCHER2 system and so are best provided centrally.

You can do this by creating a lightweight virtual environment where the local packages can be installed. This environment is created on top of an existing Python installation, known as the environment's base Python.

First, load the PrgEnv-gnu environment.

auser@ln01:~> module load PrgEnv-gnu\n

This first step is necessary because subsequent pip installs may involve source code compilation and it is better that this be done using the GCC compilers to maintain consistency with how some base Python packages have been built.

Second, select the base Python by loading the cray-python module that you wish to extend.

auser@ln01:~> module load cray-python\n

Next, create the virtual environment within a designated folder.

python -m venv --system-site-packages /work/t01/t01/auser/myvenv\n

In our example, the environment is created within a myvenv folder located on /work, which means the environment will be accessible from the compute nodes. The --system-site-packages option gives the environment access to the packages already installed in the currently loaded cray-python module. See https://docs.python.org/3/library/venv.html for more details.
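The same kind of environment can also be created from Python with the standard-library venv module, which may help if you script your setup. This is a minimal sketch: make_extending_venv is a hypothetical name, and the environment extends whichever interpreter runs the code, so run it with cray-python loaded.

```python
import os
import venv
from pathlib import Path


def make_extending_venv(path, with_pip=True):
    # Roughly equivalent to: python -m venv --system-site-packages <path>
    # system_site_packages=True lets the new environment also see the packages
    # of the interpreter running this code (e.g. the loaded cray-python).
    # The CLI installs pip and uses symlinks on Linux by default, so do the same.
    venv.create(
        path,
        system_site_packages=True,
        symlinks=(os.name != 'nt'),
        with_pip=with_pip,
    )
    # Path of the script you would pass to 'source' to activate the environment.
    return Path(path) / 'bin' / 'activate'
```

For example, make_extending_venv('/work/t01/t01/auser/myvenv') returns the path of the activate script used in the next step. The setting is recorded as include-system-site-packages = true in the environment's pyvenv.cfg file.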

You're now ready to activate your environment.

source /work/t01/t01/auser/myvenv/bin/activate\n

Tip

The myvenv path uses a fictitious project code, t01, and username, auser. Please remember to replace those values with your actual project code and username. Alternatively, you could enter ${HOME/home/work} in place of /work/t01/t01/auser. This fragment expands ${HOME} and replaces the first occurrence of home with work.
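If you build paths in Python rather than in the shell, the same substitution can be made with str.replace limited to a single replacement, since the bash form ${var/pattern/string} also replaces only the first match. A minimal sketch, assuming home directories of the form /home/<project>/<project>/<username>; the function name is hypothetical.

```python
import os


def work_dir_from_home(home=None):
    # Mirror the shell expansion ${HOME/home/work}: replace the first
    # occurrence of 'home' in $HOME with 'work'.
    if home is None:
        home = os.environ['HOME']
    return home.replace('home', 'work', 1)
```

With HOME set to /home/t01/t01/auser, this returns /work/t01/t01/auser.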

Installing packages to your local environment can now be done as follows.

(myvenv) auser@ln01:~> python -m pip install <package name>\n

Running pip directly as in pip install <package name> will also work, but we show the python -m approach as this is consistent with the way the virtual environment was created. Further, if the package installation will require code compilation, you should amend the command to ensure use of the ARCHER2 compiler wrappers.

(myvenv) auser@ln01:~> CC=cc CXX=CC FC=ftn python -m pip install <package name>\n

And when you have finished installing packages, you can deactivate the environment by running the deactivate command.

(myvenv) auser@ln01:~> deactivate\nauser@ln01:~>\n

The packages you have installed will only be available once the local environment has been activated. So, when running code that requires these packages, you must first activate the environment, by adding the activation command to the submission script, as shown below.

#!/bin/bash --login\n\n#SBATCH --job-name=myvenv\n#SBATCH --nodes=2\n#SBATCH --ntasks-per-node=64\n#SBATCH --cpus-per-task=2\n#SBATCH --time=00:10:00\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\nsource /work/t01/t01/auser/myvenv/bin/activate\n\nexport SRUN_CPUS_PER_TASK=${SLURM_CPUS_PER_TASK}\n\nsrun --distribution=block:block --hint=nomultithread python myvenv-script.py\n

Tip

If you find that a module you've installed to a virtual environment on /work isn't found when running a job, it may be that it was previously installed to the default location of $HOME/.local which is not mounted on the compute nodes. This can be an issue as pip will reuse any modules found at this default location rather than reinstall them into a virtual environment. Thus, even if the virtual environment is on /work, a module you've asked for may actually be located on /home.

You can check a module's install location and its dependencies with pip show, for example pip show matplotlib. You may then run pip uninstall matplotlib while no virtual environment is active to uninstall it from $HOME/.local, and then re-run pip install matplotlib while your virtual environment on /work is active to reinstall it there. You will need to do this for any modules installed on /home that your code uses, either directly or indirectly. Remember that you can list all your installed modules with pip list.
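You can also check from Python itself which file a module would be imported from, and whether that file lives under the per-user site-packages directory in $HOME/.local. The sketch below uses importlib.util.find_spec and site.getusersitepackages; the helper names are hypothetical. Run it with your virtual environment active, in the same way you would run your job.

```python
import importlib.util
import site
from pathlib import Path


def install_location(module_name):
    # File the module would be imported from, or None if it cannot be found.
    spec = importlib.util.find_spec(module_name)
    return spec.origin if spec else None


def is_in_user_site(path):
    # True if 'path' lies under the per-user site-packages directory
    # (normally $HOME/.local/lib/pythonX.Y/site-packages), which is not
    # mounted on the ARCHER2 compute nodes.
    user_site = Path(site.getusersitepackages()).resolve()
    try:
        Path(path).resolve().relative_to(user_site)
        return True
    except ValueError:
        return False
```

For example, is_in_user_site(install_location('matplotlib')) returning True would indicate the situation described in the tip above.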

"},{"location":"user-guide/python/#extending-ml-modules-with-your-own-packages-via-pip","title":"Extending ML modules with your own packages via pip","text":"

The environment being extended does not have to come from one of the centrally-installed cray-python modules. You can also create a local virtual environment based on one of the Machine Learning (ML) modules, e.g., tensorflow or pytorch. One extra command is required; it is issued immediately after the python -m venv ... command.

extend-venv-activate /work/t01/t01/auser/myvenv\n

The extend-venv-activate command merely adds some extra commands to the virtual environment's activate script, ensuring that Python packages will be gathered from the local virtual environment, the ML module and the cray-python base module. This means you avoid having to install ML packages within your local area.

Note

The extend-venv-activate command becomes available (i.e., its location is placed on the path) only when the ML module is loaded. The ML modules are themselves based on cray-python. For example, tensorflow/2.12.0 is based on the cray-python/3.9.13.1 module.

"},{"location":"user-guide/python/#conda-on-archer2","title":"Conda on ARCHER2","text":"

Conda-based Python distributions (e.g. Anaconda, Mamba, Miniconda) are an extremely popular way of installing and accessing software on many systems, including ARCHER2. Although conda-based distributions can be used on ARCHER2, care is needed in how they are installed and configured so that the installation does not adversely affect your use of ARCHER2. In particular, you should be careful of:

We cover each of these points in more detail below.

"},{"location":"user-guide/python/#conda-install-location","title":"Conda install location","text":"

If you only need to use the files and executables from your conda installation on the login and data analysis nodes (via the serial QoS) then the best place to install conda is in your home directory structure - this will usually be the default install location provided by the installation script.

If you need to access the files and executables from conda on the compute nodes, then you will need to install to a different location, as the home file systems are not available on the compute nodes. The work file systems are not well suited to hosting Python software natively due to the way in which file access works, particularly during Python startup. There are two main options for using conda from ARCHER2 compute nodes:

  1. Use a conda container image
  2. Install conda on the solid state storage
"},{"location":"user-guide/python/#use-a-conda-container-image","title":"Use a conda container image","text":"

You can pull official conda-based container images from Docker Hub if you only need the standard set of Python modules that come with the distribution. For example, to get the latest Anaconda distribution as a Singularity container image on the ARCHER2 work file system, you would use (on an ARCHER2 login node, from the directory on the work file system where you want to store the container image):

singularity build anaconda3.sif docker://continuumio/anaconda3\n

Once you have the container image, you can run scripts in it with a command like:

singularity exec -B $PWD anaconda3.sif python my_script.py\n

As the container image is a single large file, you end up doing a single large read from the work file system rather than lots of small reads of individual Python files. This improves the performance of Python and reduces the detrimental impact on file system performance for all other users.

We provide a pre-built Singularity container image with the Anaconda distribution on ARCHER2. Users can access it at $EPCC_SINGULARITY_DIR/anaconda3.sif. To run a Python script with the centrally-installed image, you can use:

singularity exec -B $PWD $EPCC_SINGULARITY_DIR/anaconda3.sif python my_script.py\n

If you want additional packages that are not available in the standard container images, you will need to build your own container images. If you need help to do this, please contact the ARCHER2 Service Desk.

"},{"location":"user-guide/python/#conda-addtions-to-shell-configuration-files","title":"Conda addtions to shell configuration files","text":"

During the install process most conda-based distributions will ask a question like:

Do you wish the installer to initialize Miniconda3 by running conda init?

If you are installing to the ARCHER2 work directories or the solid state storage, you should answer \"no\" to this question.

Adding the initialisation to shell startup scripts (typically .bashrc) means that every time you log in to ARCHER2, the conda environment will try to initialise by reading lots of files within the conda installation. This approach was designed for the case where a user has installed conda on their personal device and so is the only user of the file system. On shared file systems such as those on ARCHER2, this places a large load on the file system and will lead to slow login times and slow response from your command line on ARCHER2. It will also degrade read/write performance on the work file systems for you and other users, so it should be avoided at all costs.

If you have previously installed a conda distribution and answered \"yes\" to the question about adding the initialisation to shell configuration files, you should edit your ~/.bashrc file to remove the conda initialisation entries. This means deleting the lines that look something like:

# >>> conda initialize >>>\n# !! Contents within this block are managed by 'conda init' !!\n__conda_setup=\"$('/work/t01/t01/auser/miniconda3/bin/conda' 'shell.bash' 'hook' 2> /dev/null)\"\nif [ $? -eq 0 ]; then\n    eval \"$__conda_setup\"\nelse\n    if [ -f \"/work/t01/t01/auser/miniconda3/etc/profile.d/conda.sh\" ]; then\n        . \"/work/t01/t01/auser/miniconda3/etc/profile.d/conda.sh\"\n    else\n        export PATH=\"/work/t01/t01/auser/miniconda3/bin:$PATH\"\n    fi\nfi\nunset __conda_setup\n# <<< conda initialize <<<\n
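If you prefer not to edit ~/.bashrc by hand, the block can be removed programmatically by dropping everything between the two conda initialize marker lines shown above. This is a minimal sketch: the function names are hypothetical, it assumes the markers appear exactly as above, and it keeps a .bak copy of the original file.

```python
import shutil
from pathlib import Path

START = '# >>> conda initialize >>>'
END = '# <<< conda initialize <<<'


def strip_conda_init(text):
    # Return 'text' with every block between the conda initialize markers
    # (markers included) removed; all other lines are kept unchanged.
    kept, inside = [], False
    for line in text.splitlines(keepends=True):
        marker = line.strip()
        if marker == START:
            inside = True
        elif marker == END:
            inside = False
        elif not inside:
            kept.append(line)
    return ''.join(kept)


def clean_bashrc(path=None):
    # Rewrite ~/.bashrc (or 'path') in place, keeping a backup copy first.
    path = Path(path) if path else Path.home() / '.bashrc'
    shutil.copy2(path, str(path) + '.bak')
    path.write_text(strip_conda_init(path.read_text()))
```

Check the result (and the .bak copy) before logging out, in case the block in your file differs from the one shown.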
"},{"location":"user-guide/python/#running-python","title":"Running Python","text":""},{"location":"user-guide/python/#example-serial-python-submission-script","title":"Example serial Python submission script","text":"
#!/bin/bash --login\n\n#SBATCH --job-name=python_test\n#SBATCH --ntasks=1\n#SBATCH --time=00:10:00\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=serial\n#SBATCH --qos=serial\n\n# Load the Python module, ...\nmodule load cray-python\n\n# ..., or, if using local virtual environment\nsource <<path to virtual environment>>/bin/activate\n\n# Run your Python program\npython python_test.py\n
"},{"location":"user-guide/python/#example-mpi4py-job-submission-script","title":"Example mpi4py job submission script","text":"

Programs that have been parallelised with mpi4py can be run on the ARCHER2 compute nodes. Unlike the serial Python submission script however, we must launch the Python interpreter using srun. Failing to do so will result in Python running a single MPI rank only.

#!/bin/bash --login\n# Slurm job options (job-name, compute nodes, job time)\n#SBATCH --job-name=mpi4py_test\n#SBATCH --nodes=2\n#SBATCH --ntasks-per-node=128\n#SBATCH --cpus-per-task=1\n#SBATCH --time=0:10:0\n\n# Replace [budget code] below with your budget code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Load the Python module, ...\nmodule load cray-python\n\n# ..., or, if using local virtual environment\nsource <<path to virtual environment>>/bin/activate\n\n# Pass cpus-per-task setting to srun\nexport SRUN_CPUS_PER_TASK=${SLURM_CPUS_PER_TASK}\n\n# Run your Python program\n#   Note that srun MUST be used to wrap the call to python,\n#   otherwise your code will run serially\nsrun --distribution=block:block --hint=nomultithread python mpi4py_test.py\n

Tip

If you have installed your own packages you will need to activate your local Python environment within your job submission script as shown at the end of Installing your own Python packages (with pip).

By default, mpi4py will use the Cray MPICH OFI library. If you wish to use UCX instead, you must, within the submission script, load PrgEnv-gnu before loading the UCX modules, as shown below.

module load PrgEnv-gnu\nmodule load craype-network-ucx\nmodule load cray-mpich-ucx\nmodule load cray-python\n
"},{"location":"user-guide/python/#running-python-at-scale","title":"Running Python at scale","text":"

The file system metadata server may become overloaded when running a parallel Python script over many fully populated nodes (i.e., 128 MPI ranks per node). Performance degrades due to the IO operations that accompany a high volume of Python import statements. Typically, each import first requires the module or library to be located by searching a number of file paths before the module is loaded into memory. Such a workload scales as N_p × N_lib × N_path, where N_p is the number of parallel processes, N_lib is the number of libraries imported and N_path is the number of file paths searched. As a result, much time can be lost during the initial phase of a large Python job, and the resulting IO contention also degrades performance for other users of the system.
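To get a feel for the numbers, the estimate N_p × N_lib × N_path can be evaluated directly. The job size, module count and path count below are hypothetical, chosen only for illustration, and len(sys.path) is used as a rough stand-in for the number of paths searched.

```python
import sys


def import_lookups(n_procs, n_libs, n_paths):
    # Rough upper bound on file-system lookups during start-up:
    # every process searches every path for every imported library.
    return n_procs * n_libs * n_paths


# Hypothetical job: 256 fully populated nodes (128 ranks each) importing
# 100 modules, with a search path as long as this interpreter's sys.path.
print(import_lookups(256 * 128, 100, len(sys.path)))
```

With a 10-entry search path, the example job would generate up to 32,768,000 lookups, all before any useful work starts.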

Spindle is a tool for improving the library-loading performance of dynamically linked HPC applications. It provides a mechanism for\u00a0scalable loading of shared libraries, executables and Python\u00a0files from a shared file system at scale without turning the file system into a bottleneck. This is achieved by caching libraries or their locations within node memory. Spindle takes a\u00a0pure user-space\u00a0approach: users do not need to configure new file systems, load particular OS kernels or build special system components. The tool operates on existing binaries \u2014\u00a0no application modification or special build flags\u00a0are required.

The script below shows how to run Spindle with your Python code.

#!/bin/bash --login\n\n#SBATCH --nodes=256\n#SBATCH --ntasks-per-node=128\n...\n\nmodule load cray-python\nmodule load spindle/0.13\n\nexport SRUN_CPUS_PER_TASK=${SLURM_CPUS_PER_TASK}\n\nspindle --slurm --python-prefix=/opt/cray/pe/python/${CRAY_PYTHON_LEVEL} \\\n    srun --overlap --distribution=block:block --hint=nomultithread \\\n        python mpi4py_script.py\n

The --python-prefix argument can be set to a colon-separated list of paths if necessary. In the example above, the CRAY_PYTHON_LEVEL environment variable is set as a consequence of loading cray-python.

Note

The srun --overlap option is required for Spindle as the version of Slurm on ARCHER2 is newer than 20.11.

"},{"location":"user-guide/python/#using-jupyterlab-on-archer2","title":"Using JupyterLab on ARCHER2","text":"

It is possible to view and run Jupyter notebooks from both login nodes and compute nodes on ARCHER2.

Note

You can test such notebooks on the login nodes, but please do not attempt to run any computationally intensive work. Jobs may get killed once they hit a CPU limit on login nodes.

Please follow these steps.

  1. Install JupyterLab in your work directory.

    module load cray-python\nexport PYTHONUSERBASE=/work/t01/t01/auser/.local\nexport PATH=$PYTHONUSERBASE/bin:$PATH\n# source <<path to virtual environment>>/bin/activate  # If using a virtual environment, uncomment this line and remove the --user flag from the pip command below\n\npip install --user jupyterlab\n

  2. If you want to test JupyterLab on the login node please go straight to step 3. To run your Jupyter notebook on a compute node, you first need to run an interactive session.

    srun --nodes=1 --exclusive --time=00:20:00 --account=<your_budget> \\\n     --partition=standard --qos=short --reservation=shortqos \\\n     --pty /bin/bash\n
    Your prompt will change to something like below.
    auser@nid001015:/tmp>\n
    In this case, the node id is nid001015. Now execute the following on the compute node.
    cd /work/t01/t01/auser # Update the path to your work directory\nexport PYTHONUSERBASE=$(pwd)/.local\nexport PATH=$PYTHONUSERBASE/bin:$PATH\nexport HOME=$(pwd)\nmodule load cray-python\n# source <<path to virtual environment>>/bin/activate  # If using a virtual environment, uncomment this line\n

  3. Run the JupyterLab server.

    export JUPYTER_RUNTIME_DIR=$(pwd)\njupyter lab --ip=0.0.0.0 --no-browser\n
    Once it's started, you will see a URL printed in the terminal window of the form http://127.0.0.1:<port_number>/lab?token=<string>; we'll need this URL for step 6.

  4. Please skip this step if you are connecting from a machine running Windows. Open a new terminal window on your laptop and run the following command.

    ssh <username>@login.archer2.ac.uk -L<port_number>:<node_id>:<port_number>\n
    where <username> is your username, and <node_id> is the id of the node you're currently on (for a login node, this will be ln01, or similar; on a compute node, it will be a mix of numbers and letters). In our example, <node_id> is nid001015. Note: please use the same port number as that shown in the URL of step 3. This number may vary; likely values are 8888 or 8889.

  5. Please skip this step if you are connecting from Linux or macOS. If you are connecting from Windows, you should use MobaXterm to configure an SSH tunnel as follows.

  6. Now, if you open a browser window locally, you should be able to navigate to the URL from step 3, and this should display the JupyterLab server. If JupyterLab is running on a compute node, the notebook will be available for the length of the interactive session you have requested.

Warning

Please do not use the other http address given by the JupyterLab output, the one formatted http://<node_id>:<port_number>/lab?token=<string>. Your local browser will not recognise the <node_id> part of the address.
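The tunnel command in step 4 can be derived mechanically from the 127.0.0.1 URL printed in step 3, which avoids mistyping the port. A minimal sketch using urllib.parse; tunnel_command is a hypothetical name, and the login host is the one used in step 4.

```python
from urllib.parse import urlparse


def tunnel_command(jupyter_url, username, node_id, host='login.archer2.ac.uk'):
    # Build the 'ssh -L' command from step 4, taking the port from the
    # http://127.0.0.1:<port_number>/lab?token=<string> URL printed in step 3.
    port = urlparse(jupyter_url).port
    if port is None:
        raise ValueError('no port found in ' + jupyter_url)
    return f'ssh {username}@{host} -L{port}:{node_id}:{port}'
```

For the example in the steps above, tunnel_command('http://127.0.0.1:8888/lab?token=abc', 'auser', 'nid001015') returns ssh auser@login.archer2.ac.uk -L8888:nid001015:8888.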

"},{"location":"user-guide/python/#using-dask-job-queue-on-archer2","title":"Using Dask Job-Queue on ARCHER2","text":"

The Dask-jobqueue project makes it easy to deploy Dask on ARCHER2. You can find more information in the Dask Job-Queue documentation.

Please follow these steps:

  1. Install Dask-Jobqueue
module load cray-python\nexport PYTHONUSERBASE=/work/t01/t01/auser/.local\nexport PATH=$PYTHONUSERBASE/bin:$PATH\n\npip install --user dask-jobqueue --upgrade\n
  2. Using Dask

Dask-jobqueue creates a Dask Scheduler in the Python process where the cluster object is instantiated. A script for running dask jobs on ARCHER2 might look something like this:

from dask_jobqueue import SLURMCluster\ncluster = SLURMCluster(cores=128, \n                       processes=16,\n                       memory='256GB',\n                       queue='standard',\n                       header_skip=['--mem'],\n                       job_extra=['--qos=\"standard\"'],\n                       python='srun python',\n                       project='z19',\n                       walltime=\"01:00:00\",\n                       shebang=\"#!/bin/bash --login\",\n                       local_directory='$PWD',\n                       interface='hsn0',\n                       env_extra=['module load cray-python',\n                                  'export PYTHONUSERBASE=/work/t01/t01/auser/.local/',\n                                  'export PATH=$PYTHONUSERBASE/bin:$PATH',\n                                  'export PYTHONPATH=$PYTHONUSERBASE/lib/python3.8/site-packages:$PYTHONPATH'])\n\n\n\ncluster.scale(jobs=2)    # Deploy two single-node jobs\n\nfrom dask.distributed import Client\nclient = Client(cluster)  # Connect this local process to remote workers\n\n# wait for jobs to arrive, depending on the queue, this may take some time\nimport dask.array as da\nx = \u2026              # Dask commands now use these distributed resources\n

This script can be run on the login nodes and it submits the Dask jobs to the job queue. Users should ensure that the computationally intensive work is done with the Dask commands which run on the compute nodes.

The cluster object parameters specify the characteristics for running on a single compute node. The header_skip option is required because jobs run on exclusive nodes, where you should not specify memory requirements; Dask, however, requires you to supply the memory option, so header_skip=['--mem'] removes the resulting --mem line from the generated job script.

Jobs are deployed with the cluster.scale command, where the jobs option sets the number of single-node jobs requested. Job scripts are generated (from the cluster object) and these are submitted to the queue to begin running once the resources are available. You can check the status of the jobs by running squeue -u $USER in a separate terminal.

If you wish to see the generated job script you can use:

print(cluster.job_script())\n
"},{"location":"user-guide/scheduler/","title":"Running jobs on ARCHER2","text":"

As with most HPC services, ARCHER2 uses a scheduler to manage access to resources and ensure that the thousands of different users of the system are able to share the system and all get access to the resources they require. ARCHER2 uses the Slurm software to schedule jobs.

Writing a submission script is typically the most convenient way to submit your job to the scheduler. Example submission scripts (with explanations) for the most common job types are provided below.

Interactive jobs are also available and can be particularly useful for developing and debugging applications. More details are available below.

Hint

If you have any questions on how to run jobs on ARCHER2 do not hesitate to contact the ARCHER2 Service Desk.

You typically interact with Slurm by issuing Slurm commands from the login nodes (to submit, check and cancel jobs), and by specifying Slurm directives that describe the resources required for your jobs in job submission scripts.

"},{"location":"user-guide/scheduler/#resources","title":"Resources","text":""},{"location":"user-guide/scheduler/#cus","title":"CUs","text":"

Time used on ARCHER2 is measured in CUs. 1 CU = 1 node hour on a standard 128-core node.

The CU calculator will help you to calculate the CU cost for your jobs.

"},{"location":"user-guide/scheduler/#checking-available-budget","title":"Checking available budget","text":"

You can check your available budget in SAFE by selecting Login accounts from the menu and then selecting the login account you want to query.

Under Login account details you will see each of the budget codes you have access to listed e.g. e123 resources and then under Resource Pool to the right of this, a note of the remaining budget in CUs.

When logged in to the machine you can also use the command

sacctmgr show assoc where user=$LOGNAME format=account,user,maxtresmins\n

This will list all the budget codes that you have access to e.g.

   Account       User   MaxTRESMins\n---------- ---------- -------------\n      e123      userx         cpu=0\n e123-test      userx\n

This shows that userx is a member of budgets e123 and e123-test. However, the cpu=0 indicates that the e123 budget is empty or disabled. This user can submit jobs using the e123-test budget.
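If you have many budgets, you can split the sacctmgr output into usable and blocked budget codes with a short script. The sketch below assumes the fixed-width layout shown above (a header line, a line of dashes, then one row per budget with an optional MaxTRESMins column) and treats cpu=0 as meaning empty or disabled. The function name is hypothetical, and it cannot tell you how many CUs remain; that still requires SAFE.

```python
def usable_budgets(sacctmgr_output):
    # Parse the table printed by
    #   sacctmgr show assoc where user=$LOGNAME format=account,user,maxtresmins
    # and return (usable, blocked) lists of budget codes.
    usable, blocked = [], []
    for line in sacctmgr_output.splitlines()[2:]:  # skip header and dashes
        fields = line.split()
        if not fields:
            continue
        account = fields[0]
        limits = fields[2] if len(fields) > 2 else ''
        (blocked if 'cpu=0' in limits.split(',') else usable).append(account)
    return usable, blocked
```

For the example output above, this returns (['e123-test'], ['e123']).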

To see the number of CUs remaining you must check in SAFE.

"},{"location":"user-guide/scheduler/#charging","title":"Charging","text":"

Jobs run on ARCHER2 are charged for the time they use i.e. from the time the job begins to run until the time the job ends (not the full wall time requested).

Jobs are charged for the full number of nodes which are requested, even if they are not all used.

Charging takes place at the time the job ends, and the job is charged in full to the budget which is live at the end time.
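The charging rules above reduce to a simple product of nodes requested and hours actually used. A minimal sketch, assuming the standard-node rate of 1 CU per node hour stated above; cu_charge is a hypothetical name, and the CU calculator remains the authoritative tool.

```python
def cu_charge(nodes_requested, elapsed_seconds):
    # CUs charged for one job at 1 CU per node hour. Uses the elapsed
    # (not requested) wall time and the full number of nodes requested,
    # whether or not every node was used.
    return nodes_requested * elapsed_seconds / 3600
```

For example, a 2-node job that requested 12 hours but finished after 30 minutes is charged cu_charge(2, 1800), i.e. 1.0 CU.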

"},{"location":"user-guide/scheduler/#basic-slurm-commands","title":"Basic Slurm commands","text":"

There are four key commands used to interact with Slurm on the command line:

We cover each of these commands in more detail below.

"},{"location":"user-guide/scheduler/#sinfo-information-on-resources","title":"sinfo: information on resources","text":"

sinfo is used to query information about available resources and partitions. Without any options, sinfo lists the status of all resources and partitions, e.g.

auser@ln01:~> sinfo\n\nPARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST\nstandard     up 1-00:00:00    105  down* nid[001006,...,002014]\nstandard     up 1-00:00:00     12  drain nid[001016,...,001969]\nstandard     up 1-00:00:00      5   resv nid[001000,001002-001004,001114]\nstandard     up 1-00:00:00    683  alloc nid[001001,...,001970-001991]\nstandard     up 1-00:00:00    214   idle nid[001022-001023,...,002015-002023]\nstandard     up 1-00:00:00      2   down nid[001021,001050]\n

Here we see the number of nodes in different states. For example, 683 nodes are allocated (running jobs), and 214 are idle (available to run jobs).

Note

Long lists of node IDs have been abbreviated with ....
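To total the nodes in each state across all the lines of sinfo output, you can sum the NODES column per STATE. The sketch below assumes the default column layout shown above (PARTITION, AVAIL, TIMELIMIT, NODES, STATE, NODELIST); the function name is hypothetical. States with a trailing * (nodes that are not responding) are kept separate from the plain state.

```python
from collections import Counter


def nodes_by_state(sinfo_output):
    # Sum the NODES column of default 'sinfo' output per STATE.
    counts = Counter()
    for line in sinfo_output.splitlines():
        fields = line.split()
        if len(fields) < 6 or fields[0] == 'PARTITION':
            continue  # skip the prompt, blank lines and the header
        counts[fields[4]] += int(fields[3])
    return counts
```

For the example output above, this gives 683 allocated and 214 idle nodes out of 1021 listed.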

"},{"location":"user-guide/scheduler/#sbatch-submitting-jobs","title":"sbatch: submitting jobs","text":"

sbatch is used to submit a job script to the job submission system. The script will typically contain one or more srun commands to launch parallel tasks.

When you submit the job, the scheduler provides the job ID, which is used to identify this job in other Slurm commands and when looking at resource usage in SAFE.

auser@ln01:~> sbatch test-job.slurm\nSubmitted batch job 12345\n
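Scripts that submit jobs often need to keep the job ID for later squeue, scancel or dependency commands. A minimal sketch, using a hypothetical helper that parses the default message shown above; sbatch's --parsable option is the built-in alternative and prints only the job ID (followed by ;cluster on multi-cluster systems).

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the job ID from sbatch's default message,
# e.g. "Submitted batch job 12345" -> 12345.
jobid_from_sbatch() {
    awk '/^Submitted batch job/ { print $4 }'
}

# Possible uses:
#   jobid=$(sbatch test-job.slurm | jobid_from_sbatch)
#   jobid=$(sbatch --parsable test-job.slurm)
```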
"},{"location":"user-guide/scheduler/#squeue-monitoring-jobs","title":"squeue: monitoring jobs","text":"

squeue without any options or arguments shows the current status of all jobs known to the scheduler. For example:

auser@ln01:~> squeue\n

will list all jobs on ARCHER2.

The output of this is often overwhelmingly large. You can restrict the output to just your jobs by adding the -u $USER option:

auser@ln01:~> squeue -u $USER\n
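For a quick overview rather than the full listing, you can count your jobs in each state. This sketch assumes states are supplied one per line, as produced by squeue's -h (no header) and -o %T (extended job state) options; count_job_states is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count jobs in each state.
# Expects one state per line on stdin, e.g. from:
#   squeue -u $USER -h -o %T
count_job_states() {
    LC_ALL=C sort | uniq -c | awk '{ print $2, $1 }'
}
```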
"},{"location":"user-guide/scheduler/#scancel-deleting-jobs","title":"scancel: deleting jobs","text":"

scancel is used to delete jobs from the scheduler. If the job is waiting to run, it is simply cancelled; if it is running, it is stopped immediately.

If you only want to cancel a specific job you need to provide the job ID of the job you wish to cancel/stop. For example:

auser@ln01:~> scancel 12345\n

will cancel (if waiting) or stop (if running) the job with ID 12345.

scancel can take other options. For example, if you want to cancel all your pending (queued) jobs but leave the running jobs running, you could use:

auser@ln01:~> scancel --state=PENDING --user=$USER\n
"},{"location":"user-guide/scheduler/#resource-limits","title":"Resource Limits","text":"

The ARCHER2 resource limits for any given job are covered by three separate attributes: the primary resource (compute nodes), the partition and the Quality of Service (QoS).

"},{"location":"user-guide/scheduler/#primary-resource","title":"Primary resource","text":"

The primary resource you can request for your job is the compute node.

Information

The --exclusive option is enforced on ARCHER2 which means you will always have access to all of the memory on the compute node regardless of how many processes are actually running on the node.

Note

You will not generally have access to the full amount of memory on the node, as some is retained for running the operating system and other system processes.

"},{"location":"user-guide/scheduler/#partitions","title":"Partitions","text":"

On ARCHER2, compute nodes are grouped into partitions. You will have to specify a partition using the --partition option in your Slurm submission script. The following table has a list of active partitions on ARCHER2.

Full system

| Partition | Description | Max nodes available |
| --- | --- | --- |
| standard | CPU nodes with AMD EPYC 7742 64-core processor × 2, 256/512 GB memory | 5860 |
| highmem | CPU nodes with AMD EPYC 7742 64-core processor × 2, 512 GB memory | 584 |
| serial | CPU nodes with AMD EPYC 7742 64-core processor × 2, 512 GB memory | 2 |
| gpu | GPU nodes with AMD EPYC 32-core processor, 512 GB memory, 4×AMD Instinct MI210 GPU | 4 |

Note

The standard partition includes both the standard memory and high memory nodes but standard memory nodes are preferentially chosen for jobs where possible. To guarantee access to high memory nodes you should specify the highmem partition.

"},{"location":"user-guide/scheduler/#quality-of-service-qos","title":"Quality of Service (QoS)","text":"

On ARCHER2, job limits are defined by the requested Quality of Service (QoS), as specified by the --qos Slurm directive. The following table lists the active QoS on ARCHER2.

Full system

| QoS | Max Nodes Per Job | Max Walltime | Jobs Queued | Jobs Running | Partition(s) | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| standard | 1024 | 24 hrs | 64 | 16 | standard | Maximum of 1024 nodes in use by any one user at any time |
| highmem | 256 | 24 hrs | 16 | 16 | highmem | Maximum of 512 nodes in use by any one user at any time |
| taskfarm | 16 | 24 hrs | 128 | 32 | standard | Maximum of 256 nodes in use by any one user at any time |
| short | 32 | 20 mins | 16 | 4 | standard | |
| long | 64 | 96 hrs | 16 | 16 | standard | Minimum walltime of 24 hrs, maximum 512 nodes in use by any one user at any time, maximum of 2048 nodes in use by QoS |
| largescale | 5860 | 12 hrs | 8 | 1 | standard | Minimum job size of 1025 nodes |
| lowpriority | 2048 | 24 hrs | 16 | 16 | standard | Jobs not charged but requires at least 1 CU in budget to use. |
| serial | 32 cores and/or 128 GB memory | 24 hrs | 32 | 4 | serial | Jobs not charged but requires at least 1 CU in budget to use. Maximum of 32 cores and/or 128 GB in use by any one user at any time. |
| reservation | Size of reservation | Length of reservation | No limit | No limit | standard | |
| capabilityday | At least 4096 nodes | 3 hrs | 8 | 2 | standard | Minimum job size of 512 nodes. Jobs only run during Capability Days |
| gpu-shd | 1 | 12 hrs | 2 | 1 | gpu | GPU nodes potentially shared with other users |
| gpu-exc | 2 | 12 hrs | 2 | 1 | gpu | GPU node exclusive node access |
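Before submitting, it can be useful to check a request against the per-job limits in the table. The sketch below is a hypothetical helper that hard-codes a subset of the values copied from the table above (they may change over time); it checks only nodes per job and walltime, not the per-user or queue limits.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check nodes and walltime against per-job QoS limits
# copied from the table above (subset only; values may change).
# Usage: check_qos <qos> <nodes> <walltime in minutes>
check_qos() {
    local qos=$1 nodes=$2 mins=$3
    local min_nodes=1 max_nodes max_mins min_mins=0
    case $qos in
        standard)   max_nodes=1024; max_mins=1440 ;;
        highmem)    max_nodes=256;  max_mins=1440 ;;
        taskfarm)   max_nodes=16;   max_mins=1440 ;;
        short)      max_nodes=32;   max_mins=20 ;;
        long)       max_nodes=64;   max_mins=5760; min_mins=1440 ;;
        largescale) min_nodes=1025; max_nodes=5860; max_mins=720 ;;
        *) echo "unknown QoS: $qos"; return 2 ;;
    esac
    if (( nodes < min_nodes || nodes > max_nodes )); then
        echo "nodes $nodes outside ${min_nodes}-${max_nodes} for $qos"; return 1
    fi
    if (( mins < min_mins || mins > max_mins )); then
        echo "walltime ${mins} min outside ${min_mins}-${max_mins} min for $qos"; return 1
    fi
    echo ok
}

check_qos short 4 20    # ok
```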

You can find out the QoS that you can use by running the following command:

Full system
auser@ln01:~> sacctmgr show assoc user=$USER cluster=archer2 format=cluster,account,user,qos%50\n

Hint

If you have needs which do not fit within the current QoS, please contact the Service Desk and we can discuss how to accommodate your requirements.

"},{"location":"user-guide/scheduler/#e-mail-notifications","title":"E-mail notifications","text":"

E-mail notifications from the scheduler are not currently available on ARCHER2.

"},{"location":"user-guide/scheduler/#priority","title":"Priority","text":"

Job priority on ARCHER2 depends on a number of different factors: the QoS, the job age, the fairshare and the job size.

Each of these factors is normalised to a value between 0 and 1 and multiplied by a weight, and the resulting values are summed to produce a priority for the job. The current job priority formula on ARCHER2 is:

Priority = [10000 * P(QoS)] + [500 * P(Age)] + [300 * P(Fairshare)] + [100 * P(size)]\n

The priority factors are:

You can view the priorities for current queued jobs on the system with the sprio command:

auser@ln04:~> sprio -l\n          JOBID PARTITION   PRIORITY       SITE        AGE  FAIRSHARE    JOBSIZE        QOS\n         828764 standard        1049          0         45          0          4       1000\n         828765 standard        1049          0         45          0          4       1000\n         828770 standard        1049          0         45          0          4       1000\n         828771 standard        1012          0          8          0          4       1000\n         828773 standard        1012          0          8          0          4       1000\n         828791 standard        1012          0          8          0          4       1000\n         828797 standard        1118          0        115          0          4       1000\n         828800 standard        1154          0        150          0          4       1000\n         828801 standard        1154          0        150          0          4       1000\n         828805 standard        1118          0        115          0          4       1000\n         828806 standard        1154          0        150          0          4       1000\n
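The sprio columns are the weighted contributions from the formula above, so each normalised factor is the column value divided by its weight. As a worked check, the first row (QOS 1000, AGE 45, FAIRSHARE 0, JOBSIZE 4) corresponds to normalised factors 0.1, 0.09, 0 and 0.04. The function below is a sketch that only reproduces the published formula; job_priority is a hypothetical name, not a Slurm command.

```shell
#!/usr/bin/env bash
# Sketch of the ARCHER2 priority formula. Arguments are the normalised
# factors (0-1): P(QoS) P(Age) P(Fairshare) P(size).
job_priority() {
    awk -v q="$1" -v a="$2" -v f="$3" -v s="$4" \
        'BEGIN { printf "%.0f\n", 10000*q + 500*a + 300*f + 100*s }'
}

job_priority 0.1 0.09 0 0.04    # 1049, the first sprio row above
job_priority 0.1 0.3 0 0.04     # 1154, the rows with AGE 150
```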
"},{"location":"user-guide/scheduler/#troubleshooting","title":"Troubleshooting","text":""},{"location":"user-guide/scheduler/#slurm-error-messages","title":"Slurm error messages","text":"

An incorrect submission will cause Slurm to return an error. Some common problems are listed below, with a suggestion about the likely cause:

"},{"location":"user-guide/scheduler/#slurm-job-state-codes","title":"Slurm job state codes","text":"

The squeue command allows users to view information for jobs managed by Slurm. Jobs typically go through the following states: PENDING, RUNNING, COMPLETING, and COMPLETED. The first table provides a description of some job state codes. The second table provides a description of the reasons that cause a job to be in a state.

| Status | Code | Description |
| --- | --- | --- |
| PENDING | PD | Job is awaiting resource allocation. |
| RUNNING | R | Job currently has an allocation. |
| SUSPENDED | S | Job has an allocation, but execution has been suspended and CPUs have been released for other jobs. |
| COMPLETING | CG | Job is in the process of completing. Some processes on some nodes may still be active. |
| COMPLETED | CD | Job has terminated all processes on all nodes with an exit code of zero. |
| TIMEOUT | TO | Job terminated upon reaching its time limit. |
| STOPPED | ST | Job has an allocation, but execution has been stopped with SIGSTOP signal. CPUs have been retained by this job. |
| OUT_OF_MEMORY | OOM | Job experienced out of memory error. |
| FAILED | F | Job terminated with non-zero exit code or other failure condition. |
| NODE_FAIL | NF | Job terminated due to failure of one or more allocated nodes. |
| CANCELLED | CA | Job was explicitly cancelled by the user or system administrator. The job may or may not have been initiated. |

For a full list of job state codes, see Job State Codes.

"},{"location":"user-guide/scheduler/#slurm-queued-reasons","title":"Slurm queued reasons","text":"

| Reason | Description |
| --- | --- |
| Priority | One or more higher priority jobs exist for this partition or advanced reservation. |
| Resources | The job is waiting for resources to become available. |
| BadConstraints | The job's constraints can not be satisfied. |
| BeginTime | The job's earliest start time has not yet been reached. |
| Dependency | This job is waiting for a dependent job to complete. |
| Licenses | The job is waiting for a license. |
| WaitingForScheduling | No reason has been set for this job yet. Waiting for the scheduler to determine the appropriate reason. |
| Prolog | Its PrologSlurmctld program is still running. |
| JobHeldAdmin | The job is held by a system administrator. |
| JobHeldUser | The job is held by the user. |
| JobLaunchFailure | The job could not be launched. This may be due to a file system problem, invalid program name, etc. |
| NonZeroExitCode | The job terminated with a non-zero exit code. |
| InvalidAccount | The job's account is invalid. |
| InvalidQOS | The job's QOS is invalid. |
| QOSUsageThreshold | Required QOS threshold has been breached. |
| QOSJobLimit | The job's QOS has reached its maximum job count. |
| QOSResourceLimit | The job's QOS has reached some resource limit. |
| QOSTimeLimit | The job's QOS has reached its time limit. |
| NodeDown | A node required by the job is down. |
| TimeLimit | The job exhausted its time limit. |
| ReqNodeNotAvail | Some node specifically required by the job is not currently available. The node may currently be in use, reserved for another job, in an advanced reservation, DOWN, DRAINED, or not responding. Nodes which are DOWN, DRAINED, or not responding will be identified as part of the job's \"reason\" field as \"UnavailableNodes\". Such nodes will typically require the intervention of a system administrator to make available. |

For a full list of job reasons, see Job Reasons.

"},{"location":"user-guide/scheduler/#output-from-slurm-jobs","title":"Output from Slurm jobs","text":"

Slurm places standard output (STDOUT) and standard error (STDERR) for each job in the file slurm-<JobID>.out. This file appears in the job's working directory once your job starts running.

Hint

Output may be buffered. To enable live output, e.g. for monitoring job status, add --unbuffered to the srun command in your Slurm script.

"},{"location":"user-guide/scheduler/#specifying-resources-in-job-scripts","title":"Specifying resources in job scripts","text":"

You specify the resources you require for your job using directives at the top of your job submission script, on lines that start with #SBATCH.

Hint

Most options provided using #SBATCH directives can also be specified as command line options to srun.

If you do not specify any options, then the default for each option will be applied. As a minimum, all job submissions must specify the budget that they wish to charge the job to with the option --account=[budget code].

Important

You must specify an account code for your job otherwise it will fail to submit with the error: sbatch: error: Batch job submission failed: Invalid account or account/partition combination specified. (This error can also mean that you have specified a budget that has run out of resources.)

Other common options that are used are:

To prevent the behaviour of batch scripts being dependent on the user environment at the point of submission, use the option --export=none, so that variables from your login session are not exported to the job.

Using --export=none means that the behaviour of batch submissions should be repeatable. We strongly recommend its use.

Note

When submitting your job, the scheduler will check that the requested resources are available, e.g. that your account is a member of the requested budget and that the requested QoS exists. If things change before the job starts (for example, your account is removed from the requested budget or the requested QoS is deleted), the job will not be able to start. In such cases, the job will be removed from the pending queue by our systems team, as it will no longer be eligible to run.

"},{"location":"user-guide/scheduler/#additional-options-for-parallel-jobs","title":"Additional options for parallel jobs","text":"

Note

For parallel jobs, ARCHER2 operates in a node exclusive way. This means that you are assigned resources in the units of full compute nodes for your jobs (i.e. 128 cores) and that no other user can share those compute nodes with you. Hence, the minimum amount of resource you can request for a parallel job is 1 node (or 128 cores).

In addition, parallel jobs will also need to specify how many nodes, parallel processes and threads they require.

For parallel jobs that use threading (e.g. OpenMP) or when you want to use fewer than 128 cores per node (e.g. to access more memory or memory bandwidth per core), you will also need to change the --cpus-per-task option.

For jobs using threading:

- --cpus-per-task=<threads per task>: the number of threads per parallel process (e.g. number of OpenMP threads per MPI task for hybrid MPI/OpenMP jobs). Important: you must also set the OMP_NUM_THREADS environment variable if using OpenMP in your job.

For jobs using fewer than 128 cores per node:

- --cpus-per-task=<stride between placement of processes>: the stride between the parallel processes. For example, to double the memory and memory bandwidth per process on an ARCHER2 compute node, you would place 64 processes per node and leave an empty core between each process by setting --cpus-per-task=2 and --ntasks-per-node=64.

Important

You must also add export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK to your job submission script to pass the --cpus-per-task setting from the job script to the srun command. (Alternatively, you could use the --cpus-per-task option in the srun command itself.) If you do not do this then the placement of processes/threads will be incorrect and you will likely see poor performance of your application.
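The stride arithmetic for under-populated nodes is simply cores per node divided by processes per node. A minimal sketch, assuming 128 cores per node as on the standard compute nodes and evenly spaced processes; cpus_per_task_for is a hypothetical helper, and its result still has to be written into your #SBATCH --cpus-per-task line by hand.

```shell
#!/usr/bin/env bash
# Hypothetical helper: --cpus-per-task stride for evenly spaced processes.
# Assumes 128 cores per node (ARCHER2 standard compute nodes).
cpus_per_task_for() {
    local tasks_per_node=$1 cores=128
    if (( tasks_per_node < 1 || cores % tasks_per_node != 0 )); then
        echo "error: $tasks_per_node does not divide $cores" >&2
        return 1
    fi
    echo $(( cores / tasks_per_node ))
}

cpus_per_task_for 64    # 2: matches --ntasks-per-node=64 --cpus-per-task=2
```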

"},{"location":"user-guide/scheduler/#options-for-jobs-on-the-data-analysis-nodes","title":"Options for jobs on the data analysis nodes","text":"

The data analysis nodes are shared between all users and can be used to run jobs that require small numbers of cores and/or access to an external network to transfer data. These jobs are often serial jobs that only require a single core.

To run jobs on the data analysis nodes you require the following options:

More information on using the data analysis nodes (including example job submission scripts) can be found in the Data Analysis section of the User and Best Practice Guide.

"},{"location":"user-guide/scheduler/#srun-launching-parallel-jobs","title":"srun: Launching parallel jobs","text":"

If you are running parallel jobs, your job submission script should contain one or more srun commands to launch the parallel executable across the compute nodes. In most cases you will want to add the options --distribution=block:block and --hint=nomultithread to your srun command to ensure you get the correct pinning of processes to cores on a compute node.

Warning

If you do not add the --distribution=block:block and --hint=nomultithread options to your srun command the default process placement may lead to a drop in performance for your jobs on ARCHER2.

A brief explanation of these options:

- --hint=nomultithread: do not use hyperthreads/SMP.
- --distribution=block:block: the first block means use a block distribution of processes across nodes (i.e. fill nodes before moving onto the next one) and the second block means use a block distribution of processes across \"sockets\" within a node (i.e. fill a \"socket\" before moving on to the next one).

Important

The Slurm definition of a \"socket\" does not correspond to a physical CPU socket. On ARCHER2 it corresponds to a 4-core CCX (Core CompleX).

"},{"location":"user-guide/scheduler/#slurm-definition-of-a-socket","title":"Slurm definition of a \"socket\"","text":"

On ARCHER2, Slurm is configured with the following setting:

SlurmdParameters=l3cache_as_socket\n

The effect of this setting is to define a Slurm socket as a unit that has a shared L3 cache. On ARCHER2, this means that each Slurm \"socket\" corresponds to a 4-core CCX (Core CompleX). For a more detailed discussion on the hardware and the memory/cache layout see the Hardware section.

The effect of this setting can be illustrated by using the xthi program to report placement when we select a cyclic distribution of processes across sockets from srun (--distribution=block:cyclic). As you can see from the output from xthi included below, the cyclic per-socket distribution results in sequential MPI processes being placed on every 4th core (i.e. cyclic placement across CCX).

Node summary for    1 nodes:\nNode    0, hostname nid000006, mpi 128, omp   1, executable xthi_mpi\nMPI summary: 128 ranks\nNode    0, rank    0, thread   0, (affinity =    0)\nNode    0, rank    1, thread   0, (affinity =    4)\nNode    0, rank    2, thread   0, (affinity =    8)\nNode    0, rank    3, thread   0, (affinity =   12)\nNode    0, rank    4, thread   0, (affinity =   16)\nNode    0, rank    5, thread   0, (affinity =   20)\nNode    0, rank    6, thread   0, (affinity =   24)\nNode    0, rank    7, thread   0, (affinity =   28)\nNode    0, rank    8, thread   0, (affinity =   32)\nNode    0, rank    9, thread   0, (affinity =   36)\nNode    0, rank   10, thread   0, (affinity =   40)\nNode    0, rank   11, thread   0, (affinity =   44)\nNode    0, rank   12, thread   0, (affinity =   48)\nNode    0, rank   13, thread   0, (affinity =   52)\nNode    0, rank   14, thread   0, (affinity =   56)\nNode    0, rank   15, thread   0, (affinity =   60)\nNode    0, rank   16, thread   0, (affinity =   64)\nNode    0, rank   17, thread   0, (affinity =   68)\nNode    0, rank   18, thread   0, (affinity =   72)\nNode    0, rank   19, thread   0, (affinity =   76)\nNode    0, rank   20, thread   0, (affinity =   80)\nNode    0, rank   21, thread   0, (affinity =   84)\nNode    0, rank   22, thread   0, (affinity =   88)\nNode    0, rank   23, thread   0, (affinity =   92)\nNode    0, rank   24, thread   0, (affinity =   96)\nNode    0, rank   25, thread   0, (affinity =  100)\nNode    0, rank   26, thread   0, (affinity =  104)\nNode    0, rank   27, thread   0, (affinity =  108)\nNode    0, rank   28, thread   0, (affinity =  112)\nNode    0, rank   29, thread   0, (affinity =  116)\nNode    0, rank   30, thread   0, (affinity =  120)\nNode    0, rank   31, thread   0, (affinity =  124)\nNode    0, rank   32, thread   0, (affinity =    1)\nNode    0, rank   33, thread   0, (affinity =    5)\nNode    0, rank   34, thread   0, (affinity =    9)\nNode    0, rank   35, thread   0, (affinity =   13)\nNode    0, rank   36, thread   0, (affinity =   17)\nNode    0, rank   37, thread   0, (affinity =   21)\nNode    0, rank   38, thread   0, (affinity =   25)\n\n...output trimmed...\n
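The xthi output follows a simple rule: with 32 Slurm "sockets" of 4 cores each, rank r lands on core (r mod 32) × 4 + floor(r / 32). The sketch below reproduces that pattern for one fully populated node. It assumes the CCXs are numbered contiguously (cores 0 to 3, 4 to 7, and so on), as the output above suggests; cyclic_core is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: predicted core for each rank with --distribution=block:cyclic on
# one fully populated node, assuming 32 Slurm "sockets" (4-core CCXs)
# numbered contiguously, which matches the xthi output above.
cyclic_core() {
    local rank=$1 ccx_count=32 cores_per_ccx=4
    echo $(( (rank % ccx_count) * cores_per_ccx + rank / ccx_count ))
}

for r in 0 1 31 32 38; do
    echo "rank $r -> core $(cyclic_core "$r")"
done
```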
"},{"location":"user-guide/scheduler/#bolt-job-submission-script-creation-tool","title":"bolt: Job submission script creation tool","text":"

The bolt job submission script creation tool has been written by EPCC to simplify the process of writing job submission scripts for modern multicore architectures. Based on the options you supply, bolt will generate a job submission script that uses ARCHER2 in a reasonable way.

MPI, OpenMP and hybrid MPI/OpenMP jobs are supported.

Warning

The tool will allow you to generate scripts for jobs that use the long QoS but you will need to manually modify the resulting script to change the QoS to long.

If there are problems or errors in your job parameter specifications then bolt will print warnings or errors. However, bolt cannot detect all problems.

"},{"location":"user-guide/scheduler/#basic-usage","title":"Basic Usage","text":"

The basic syntax for using bolt is:

bolt -n [parallel tasks] -N [parallel tasks per node] -d [number of threads per task] \\\n     -t [wallclock time (h:m:s)] -o [script name] -j [job name] -A [project code]  [arguments...]\n

Example 1: to generate a job script to run an executable called my_prog.x for 24 hours using 8192 parallel (MPI) processes and 128 (MPI) processes per compute node you would use something like:

bolt -n 8192 -N 128 -t 24:0:0 -o my_job.bolt -j my_job -A z01-budget my_prog.x arg1 arg2\n

(Remember to replace z01-budget with your actual budget code.)

Example 2: to generate a job script to run an executable called my_prog.x for 3 hours using 2048 parallel (MPI) processes and 64 (MPI) processes per compute node (i.e. using half of the cores on a compute node), you would use:

bolt -n 2048 -N 64 -t 3:0:0 -o my_job.bolt -j my_job -A z01-budget my_prog.x arg1 arg2\n

These examples generate the job script my_job.bolt with the correct options to run my_prog.x with command line arguments arg1 and arg2. The project code against which the job will be charged is specified with the -A option. As usual, the job script is submitted as follows:

sbatch my_job.bolt\n

Hint

If you do not specify the script name with the '-o' option then your script will be a file called a.bolt.

Hint

If you do not specify the number of parallel tasks then bolt will try to generate a serial job submission script (and throw an error on the ARCHER2 4 cabinet system as serial jobs are not supported).

Hint

If you do not specify a project code, bolt will use your default project code (set by your login account).

Hint

If you do not specify a job name, bolt will use either bolt_ser_job (for serial jobs) or bolt_par_job (for parallel jobs).

"},{"location":"user-guide/scheduler/#further-help","title":"Further help","text":"

You can access further help on using bolt on ARCHER2 with the -h option:

bolt -h\n

Other useful options include:

"},{"location":"user-guide/scheduler/#checkscript-job-submission-script-validation-tool","title":"checkScript job submission script validation tool","text":"

The checkScript tool has been written to allow users to validate their job submission scripts before submitting their jobs. The tool will read your job submission script and try to identify errors, problems or inconsistencies.

An example of the sort of output the tool can give would be:

auser@ln01:/work/t01/t01/auser> checkScript submit.slurm\n\n===========================================================================\ncheckScript\n---------------------------------------------------------------------------\nCopyright 2011-2020  EPCC, The University of Edinburgh\nThis program comes with ABSOLUTELY NO WARRANTY.\nThis is free software, and you are welcome to redistribute it\nunder certain conditions.\n===========================================================================\n\nScript details\n---------------\n       User: auser\nScript file: submit.slurm\n  Directory: /work/t01/t01/auser (ok)\n   Job name: test (ok)\n  Partition: standard (ok)\n        QoS: standard (ok)\nCombination:          (ok)\n\nRequested resources\n-------------------\n         nodes =              3                     (ok)\ntasks per node =             16\n cpus per task =              8\ncores per node =            128                     (ok)\nOpenMP defined =           True                     (ok)\n      walltime =          1:0:0                     (ok)\n\nCU Usage Estimate (if full job time used)\n------------------------------------------\n                      CU =          3.000\n\n\n\ncheckScript finished: 0 warning(s) and 0 error(s).\n
"},{"location":"user-guide/scheduler/#checking-scripts-and-estimating-start-time-with-test-only","title":"Checking scripts and estimating start time with --test-only","text":"

sbatch --test-only validates the batch script and returns an estimate of when the job would be scheduled to run given the current scheduler state. The job is not actually submitted. Please note that this is only an estimate: the actual start time may differ because the scheduler state can change between the test and the real submission, and again after the job has been submitted.

auser@ln01:~> sbatch --test-only submit.slurm\nsbatch: Job 1039497 to start at 2022-02-01T23:20:51 using 256 processors on nodes nid002836\nin partition standard\n
"},{"location":"user-guide/scheduler/#estimated-start-time-for-queued-jobs","title":"Estimated start time for queued jobs","text":"

You can use the squeue command to show the current estimated start time for a job. Please note that this is only an estimate: the actual start time may differ because the scheduler state can change after the estimate was made. To return the estimated start time for a job, specify the job ID with the --jobs=<jobid> option and select the start time field with the --Format=StartTime option.

For example, to show the estimated start time for job 123456, you would use:

squeue --jobs=123456 --Format=StartTime\n

The output from this command would look like:

START_TIME          \n2024-09-25T13:07:00\n
"},{"location":"user-guide/scheduler/#example-job-submission-scripts","title":"Example job submission scripts","text":"

A subset of example job submission scripts are included in full below. Examples are provided for both the full system and the 4-cabinet system.

"},{"location":"user-guide/scheduler/#example-job-submission-script-for-mpi-parallel-job","title":"Example: job submission script for MPI parallel job","text":"

A simple MPI job submission script to submit a job using 4 compute nodes and 128 MPI ranks per node for 20 minutes would look like:

#!/bin/bash\n\n# Slurm job options (job-name, compute nodes, job time)\n#SBATCH --job-name=Example_MPI_Job\n#SBATCH --time=0:20:0\n#SBATCH --nodes=4\n#SBATCH --ntasks-per-node=128\n#SBATCH --cpus-per-task=1\n\n# Replace [budget code] below with your budget code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Set the number of threads to 1\n#   This prevents any threaded system libraries from automatically\n#   using threading.\nexport OMP_NUM_THREADS=1\n\n# Propagate the cpus-per-task setting from script to srun commands\n#    By default, Slurm does not propagate this setting from the sbatch\n#    options to srun commands in the job script. If this is not done,\n#    process/thread pinning may be incorrect leading to poor performance\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\n# Launch the parallel job\n#   Using 512 MPI processes and 128 MPI processes per node\n#   srun picks up the distribution from the sbatch options\n\nsrun --distribution=block:block --hint=nomultithread ./my_mpi_executable.x\n

This will run your executable \"my_mpi_executable.x\" in parallel on 512 MPI processes using 4 nodes (128 cores per node, i.e. not using hyper-threading). Slurm will allocate 4 nodes to your job and srun will place 128 MPI processes on each node (one per physical core).

See above for a more detailed discussion of the different sbatch options.

"},{"location":"user-guide/scheduler/#example-job-submission-script-for-mpiopenmp-mixed-mode-parallel-job","title":"Example: job submission script for MPI+OpenMP (mixed mode) parallel job","text":"

Mixed mode codes that use both MPI (or another distributed memory parallel model) and OpenMP should take care to ensure that the shared memory portion of the process/thread placement does not span more than one NUMA region. Nodes on ARCHER2 are made up of two sockets, each containing 4 NUMA regions of 16 cores, i.e. there are 8 NUMA regions in total. Therefore the number of OpenMP threads per MPI process should ideally not be greater than 16, and should also be a factor of 16. Sensible choices for the number of threads are therefore 1 (single-threaded), 2, 4, 8 and 16. More information about using OpenMP and MPI+OpenMP can be found in the Tuning chapter.

To ensure correct placement of MPI processes the number of cpus-per-task needs to match the number of OpenMP threads, and the number of tasks-per-node should be set to ensure the entire node is filled with MPI tasks.

In the example below, we are using 4 nodes for 20 minutes. There are 32 MPI processes in total (8 MPI processes per node) and 16 OpenMP threads per MPI process. This results in all 128 physical cores per node being used.

Hint

Note the use of the export OMP_PLACES=cores environment option to generate the correct thread pinning.

#!/bin/bash\n\n# Slurm job options (job-name, compute nodes, job time)\n#SBATCH --job-name=Example_MPI_Job\n#SBATCH --time=0:20:0\n#SBATCH --nodes=4\n#SBATCH --ntasks-per-node=8\n#SBATCH --cpus-per-task=16\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Propagate the cpus-per-task setting from script to srun commands\n#    By default, Slurm does not propagate this setting from the sbatch\n#    options to srun commands in the job script. If this is not done,\n#    process/thread pinning may be incorrect leading to poor performance\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\n# Set the number of threads to 16 and specify placement\n#   There are 16 OpenMP threads per MPI process\n#   We want one thread per physical core\nexport OMP_NUM_THREADS=16\nexport OMP_PLACES=cores\n\n# Launch the parallel job\n#   Using 32 MPI processes\n#   8 MPI processes per node\n#   16 OpenMP threads per MPI process\n#   Additional srun options to pin one thread per physical core\nsrun --hint=nomultithread --distribution=block:block ./my_mixed_executable.x arg1 arg2\n
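The layout in the script above follows from the thread count: --cpus-per-task equals the number of OpenMP threads, and --ntasks-per-node is 128 divided by that number. The sketch below derives both values and rejects thread counts that are not a factor of 16 (one NUMA region), following the rule stated earlier. hybrid_layout is a hypothetical helper, and OMP_NUM_THREADS must still be set to the same thread count in the job script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sbatch layout for an MPI+OpenMP job on a 128-core
# node, requiring the thread count to be a factor of 16 (one NUMA region).
hybrid_layout() {
    local threads=$1 cores=128 numa_cores=16
    if (( threads < 1 || numa_cores % threads != 0 )); then
        echo "error: $threads threads is not a factor of $numa_cores" >&2
        return 1
    fi
    echo "--ntasks-per-node=$(( cores / threads )) --cpus-per-task=$threads"
}

hybrid_layout 16    # --ntasks-per-node=8 --cpus-per-task=16
hybrid_layout 4     # --ntasks-per-node=32 --cpus-per-task=4
```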
"},{"location":"user-guide/scheduler/#job-arrays","title":"Job arrays","text":"

The Slurm job scheduling system offers the job array concept for running collections of almost-identical jobs, for example running the same program several times with different arguments or input data.

Each job in a job array is called a subjob. The subjobs of a job array can be submitted and queried as a unit, making it easier and cleaner to handle the full set, compared to individual jobs.

All subjobs in a job array are started by running the same job script. The job script also contains information on the number of jobs to be started, and Slurm provides a subjob index which can be passed to the individual subjobs or used to select the input data per subjob.

"},{"location":"user-guide/scheduler/#job-script-for-a-job-array","title":"Job script for a job array","text":"

As an example, the following script runs 56 subjobs, with the subjob index as the only argument to the executable. Each subjob requests a single node and uses all 128 cores on the node by placing 1 MPI process per core and specifies 4 hours maximum runtime per subjob:

#!/bin/bash\n# Slurm job options (job-name, compute nodes, job time)\n#SBATCH --job-name=Example_Array_Job\n#SBATCH --time=04:00:00\n#SBATCH --nodes=1\n#SBATCH --ntasks-per-node=128\n#SBATCH --cpus-per-task=1\n#SBATCH --array=0-55\n\n# Replace [budget code] below with your budget code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Propagate the cpus-per-task setting from script to srun commands\n#    By default, Slurm does not propagate this setting from the sbatch\n#    options to srun commands in the job script. If this is not done,\n#    process/thread pinning may be incorrect leading to poor performance\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\n# Set the number of threads to 1\n#   This prevents any threaded system libraries from automatically\n#   using threading.\nexport OMP_NUM_THREADS=1\n\nsrun --distribution=block:block --hint=nomultithread /path/to/exe $SLURM_ARRAY_TASK_ID\n
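A common way to give each subjob different input data is to use $SLURM_ARRAY_TASK_ID to pick one line from a list of inputs. This is a minimal sketch: input_for_task and inputs.txt are hypothetical names, and the list is assumed to have one entry per subjob (56 lines for --array=0-55), with index 0 mapped to line 1.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map an array index to one line of a list file,
# so subjob 0 gets line 1, subjob 1 gets line 2, and so on.
input_for_task() {
    local index=$1 list_file=$2
    sed -n "$(( index + 1 ))p" "$list_file"
}

# Inside the job script (inputs.txt is a hypothetical 56-line file):
#   input=$(input_for_task "$SLURM_ARRAY_TASK_ID" inputs.txt)
#   srun --distribution=block:block --hint=nomultithread /path/to/exe "$input"
```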
"},{"location":"user-guide/scheduler/#submitting-a-job-array","title":"Submitting a job array","text":"

Job arrays are submitted using sbatch in the same way as for standard jobs:

sbatch job_script.pbs\n
"},{"location":"user-guide/scheduler/#expressing-dependencies-between-jobs","title":"Expressing dependencies between jobs","text":"

Slurm allows you to express dependencies between jobs using the --dependency (or -d) option. This delays the start of the dependent job until some condition involving a current or previous job, or set of jobs, has been satisfied. A simple example might be:

$ sbatch --dependency=4394150 myscript.sh\nSubmitted batch job 4394325\n
This states that the execution of the new batch job should not start until job 4394150 has completed/terminated. Here, completion/termination is the only condition. The new job 4394325 should appear in the pending state with reason (Dependency) assuming 4394150 is still running.

There are several types of dependency. If we explicitly include the default type, afterany, in the example above, we would have

$ sbatch --dependency=afterany:4394150 myscript.sh\nSubmitted batch job 4394325\n
This emphasises that the first job may complete with any exit code and still satisfy the dependency. If we wanted a dependent job that only becomes eligible for execution after the dependency completes successfully, we would use afterok:
$ sbatch --dependency=afterok:4394150 myscript.sh\nSubmitted batch job 4394325\n
This means that, should the dependency fail with a non-zero exit code, the dependent job will be left in a state where it will never run. squeue may show the reason for this as (DependencyNeverSatisfied). Such jobs will need to be cancelled, for example with scancel.
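Finding such jobs by hand can be tedious. As a sketch, the bash filter below reads lines containing a job ID and a pending reason, in the form produced by the squeue format string %i %r, and prints the IDs whose reason is DependencyNeverSatisfied. The helper name never_satisfied_ids is hypothetical, and parentheses are stripped in case the reason is shown with them.

```bash
#!/bin/bash
# Print the IDs of jobs whose pending reason is DependencyNeverSatisfied.
# Input: one "JOBID REASON" pair per line, e.g. from squeue -h -o "%i %r".
# never_satisfied_ids is a hypothetical helper name.
never_satisfied_ids() {
  awk '{
    reason = $2
    gsub(/[()]/, "", reason)   # accept the reason with or without parentheses
    if (reason == "DependencyNeverSatisfied") print $1
  }'
}

# Possible use, restricted to your own pending jobs:
#   squeue -h -u "$USER" -t PENDING -o "%i %r" | never_satisfied_ids | xargs -r scancel
```

It is worth running the pipeline without the final xargs step first, to review which jobs would be cancelled; xargs -r (GNU) skips scancel when no IDs are found.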

The general form of the dependency list is <type:job_id[:job_id] [,type:job_id ...]> where a dependency may include one or more jobs, with one or more types. If a list is comma-separated, all the dependencies must be satisfied before the dependent job becomes eligible. The use of ? as the list separator implies that any of the dependencies is sufficient.

Useful types include afterany, afterok and afternotok. In the last case, the dependency is satisfied only if the job being depended on ends with a non-zero exit code (the opposite of afterok). See the current Slurm documentation for a full list of possibilities.
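When dependency lists are generated by a script, a pair of small bash helpers can assemble the type:job_id[:job_id] entries and join them with , (all must be satisfied) or ? (any one is sufficient). This is a sketch: dep_entry and dep_join are hypothetical names, and the job IDs in the usage line are placeholders.

```bash
#!/bin/bash
# Build one dependency entry, e.g. dep_entry afterok 101 102 -> afterok:101:102
dep_entry() {
  local type=$1
  shift
  local IFS=:          # "$*" joins the remaining arguments with ':'
  echo "${type}:$*"
}

# Join entries with a separator: ',' means all must be satisfied,
# '?' means any one of them is sufficient.
dep_join() {
  local sep=$1
  shift
  local out=$1 entry
  shift
  for entry in "$@"; do
    out="${out}${sep}${entry}"
  done
  echo "$out"
}

# Possible use (job IDs are placeholders):
#   sbatch --dependency="$(dep_join , "$(dep_entry afterok 101 102)" "$(dep_entry afterany 103)")" last_job.sh
```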

"},{"location":"user-guide/scheduler/#chains-of-jobs","title":"Chains of jobs","text":""},{"location":"user-guide/scheduler/#fixed-number-of-jobs","title":"Fixed number of jobs","text":"

Job dependencies can be used to construct complex pipelines or chain together long simulations requiring multiple steps.

For example, if we have just two jobs, the following shell script excerpt submits the second job with a dependency on the first, whatever job ID the first is assigned:

jobid=$(sbatch --parsable first_job.sh)\nsbatch --dependency=afterok:${jobid} second_job.sh\n
where we have used the --parsable option to sbatch so that it returns just the new job ID (without the leading Submitted batch job text).

This can be extended to a longer chain as required. E.g.:

jobid1=$(sbatch --parsable first_job.sh)\njobid2=$(sbatch --parsable --dependency=afterok:${jobid1} second_job.sh)\njobid3=$(sbatch --parsable --dependency=afterok:${jobid1} third_job.sh)\nsbatch --dependency=afterok:${jobid2},afterok:${jobid3} last_job.sh\n
Note jobs 2 and 3 are dependent on job 1 (only), but the final job is dependent on both jobs 2 and 3. This allows quite general workflows to be constructed.
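For a longer linear chain, the same pattern can be wrapped in a loop. The bash sketch below shows one way to do it. submit_chain is a hypothetical helper name, and it uses only the sbatch --parsable and --dependency=afterok options already shown in this section.

```bash
#!/bin/bash
# Submit job scripts as a linear chain: each job becomes eligible only after
# the previous one completes successfully (afterok). Prints the last job ID.
# submit_chain is a hypothetical helper, not part of Slurm.
submit_chain() {
  local prev="" jobid script
  for script in "$@"; do
    if [ -z "$prev" ]; then
      jobid=$(sbatch --parsable "$script") || return 1
    else
      jobid=$(sbatch --parsable --dependency=afterok:"${prev}" "$script") || return 1
    fi
    # --parsable prints "jobid" or "jobid;cluster"; keep only the job ID
    prev=${jobid%%;*}
  done
  echo "$prev"
}

# Possible use:
#   last=$(submit_chain step1.sh step2.sh step3.sh)
#   echo "final job in chain: ${last}"
```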

"},{"location":"user-guide/scheduler/#number-of-jobs-not-known-in-advance","title":"Number of jobs not known in advance","text":"

This automation may be taken a step further to a case where a submission script propagates itself. E.g., a script might include, schematically,

#SBATCH ...\n\n# submit new job here ...\nsbatch --dependency=afterok:${SLURM_JOB_ID} thisscript.sh\n\n# perform work here...\nsrun ...\n
where the original submission of the script will submit a new instance of itself dependent on its own successful completion. This is done via the Slurm environment variable SLURM_JOB_ID, which holds the ID of the current job. One could defer the sbatch until the end of the script to avoid the dependency never being satisfied if the work associated with the srun fails. This approach can be useful in situations where, e.g., simulations with checkpoint/restart need to continue until some criterion is met. Some care may be required to ensure the script logic is correct in determining the criterion for stopping: it is best to start with a small/short test example. Incorrect logic and/or errors may lead to a rapid proliferation of submitted jobs.

Termination of such chains needs to be arranged either via appropriate logic in the script, or manual intervention to cancel pending jobs when no longer required.
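One way to make the stopping criterion explicit is to check it just before resubmitting. The bash sketch below assumes two hypothetical conventions. First, the simulation writes a marker file (here simulation.done) once it has finished. Second, an iteration counter ITER is passed to each new job with sbatch --export, which gives a hard upper limit as a safety net against runaway chains.

```bash
#!/bin/bash
# Return success (0) if a self-resubmitting job should submit its successor.
# Stop if the hypothetical "done" marker file exists or the iteration limit
# has been reached.
should_resubmit() {
  local iter=$1 max_iter=$2 done_file=$3
  if [ -e "$done_file" ]; then
    return 1
  fi
  [ "$iter" -lt "$max_iter" ]
}

# Sketch of the end of a self-resubmitting job script (ITER, the limit of 20
# and simulation.done are assumptions):
#   ITER=${ITER:-1}
#   srun ... || exit 1    # do not resubmit if the work step failed
#   if should_resubmit "$ITER" 20 simulation.done; then
#     sbatch --dependency=afterok:${SLURM_JOB_ID} \
#            --export=ALL,ITER=$((ITER + 1)) thisscript.sh
#   fi
```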

"},{"location":"user-guide/scheduler/#using-multiple-srun-commands-in-a-single-job-script","title":"Using multiple srun commands in a single job script","text":"

You can use multiple srun commands within a Slurm job submission script to use the requested resources more flexibly. For example, you could run a collection of smaller jobs within the requested resources, or you could even subdivide nodes if your individual calculations do not scale up to use all 128 cores on a node.

In this guide we will cover two scenarios:

  1. Subdividing the job into multiple full-node or multi-node subjobs, e.g. requesting 100 nodes and running 100 subjobs of 1 node each, or 50 subjobs of 2 nodes each.
  2. Subdividing the job into multiple subjobs that each use a fraction of a node, e.g. requesting 2 nodes and running 256 subjobs of 1 core each, or 16 subjobs of 16 cores each.
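The arithmetic behind these scenarios is total cores divided by cores per subjob. A small bash helper, assuming 128 cores per node as stated in this guide (subjobs_that_fit is a hypothetical name), makes the numbers in the list above easy to check. Integer division discards any remainder, so a subjob size that does not divide the allocation evenly leaves some cores idle.

```bash
#!/bin/bash
# How many equal-sized subjobs fit in an allocation of whole nodes,
# assuming 128 cores per ARCHER2 compute node.
CORES_PER_NODE=128

subjobs_that_fit() {
  local nodes=$1 cores_per_subjob=$2
  echo $(( nodes * CORES_PER_NODE / cores_per_subjob ))
}

echo "2 nodes, 1-core subjobs:    $(subjobs_that_fit 2 1)"      # 256
echo "2 nodes, 16-core subjobs:   $(subjobs_that_fit 2 16)"     # 16
echo "100 nodes, 2-node subjobs:  $(subjobs_that_fit 100 256)"  # 50
```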
"},{"location":"user-guide/scheduler/#running-multiple-full-node-subjobs-within-a-larger-job","title":"Running multiple, full-node subjobs within a larger job","text":"

When subdividing a larger job into smaller subjobs you typically need to override the --nodes option to srun and add the --ntasks option to ensure that each subjob runs on the correct number of nodes and that subjobs are placed correctly onto separate nodes.

For example, we will show how to request 100 nodes and then run 100 separate 1-node jobs, each of which uses 128 MPI processes and runs on a different compute node. We start by showing the job script that would achieve this and then explain how this works and the options used. In our case, we will run 100 copies of the xthi program that prints the process placement on the node it is running on.

#!/bin/bash\n\n# Slurm job options (job-name, compute nodes, job time)\n#SBATCH --job-name=multi_xthi\n#SBATCH --time=0:20:0\n#SBATCH --nodes=100\n#SBATCH --ntasks-per-node=128\n#SBATCH --cpus-per-task=1\n\n# Replace [budget code] below with your budget code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Load the xthi module\nmodule load xthi\n\n# Propagate the cpus-per-task setting from script to srun commands\n#    By default, Slurm does not propagate this setting from the sbatch\n#    options to srun commands in the job script. If this is not done,\n#    process/thread pinning may be incorrect leading to poor performance\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\n# Set the number of threads to 1\n#   This prevents any threaded system libraries from automatically\n#   using threading.\nexport OMP_NUM_THREADS=1\n\n# Loop over 100 subjobs starting each of them on a separate node\nfor i in $(seq 1 100)\ndo\n# Launch this subjob on 1 node, note nodes and ntasks options and & to place subjob in the background\n    srun --nodes=1 --ntasks=128 --distribution=block:block --hint=nomultithread xthi > placement${i}.txt &\ndone\n# Wait for all background subjobs to finish\nwait\n

Key points from the example job script:

"},{"location":"user-guide/scheduler/#running-multiple-subjobs-that-each-use-a-fraction-of-a-node","title":"Running multiple subjobs that each use a fraction of a node","text":"

As the ARCHER2 nodes contain a large number of cores (128 per node), it may sometimes be useful to run multiple executables on a single node. For example, you may want to run 128 copies of a serial executable or Python script; or, you may want to run multiple copies of parallel executables that use fewer than 128 cores each. This use model is possible using multiple srun commands in a job script on ARCHER2.

Note

You can never share a compute node with another user. Although you can use srun to place multiple copies of an executable or script on a compute node, you still have exclusive use of that node. The minimum amount of resources you can reserve for your use on ARCHER2 is a single node.

When using srun to place multiple executables or scripts on a compute node you must be aware of a few things:

Below, we provide four examples of running multiple subjobs in a node: one that runs 128 serial processes across a single node; one that runs 8 subjobs each of which use 8 MPI processes with 2 OpenMP threads per MPI process; one that runs four inhomogeneous jobs, each of which requires a different number of MPI processes and OpenMP threads per process; and one that runs 256 serial processes across two nodes.

"},{"location":"user-guide/scheduler/#example-1-128-serial-tasks-running-on-a-single-node","title":"Example 1: 128 serial tasks running on a single node","text":"

For our first example, we will run 128 single-core copies of the xthi program (which prints process/thread placement) on a single ARCHER2 compute node with each copy of xthi pinned to a different core. The job submission script for this example would look like:

#!/bin/bash\n# Slurm job options (job-name, compute nodes, job time)\n#SBATCH --job-name=MultiSerialOnCompute\n#SBATCH --time=0:10:0\n#SBATCH --nodes=1\n#SBATCH --ntasks-per-node=128\n#SBATCH --cpus-per-task=1\n#SBATCH --hint=nomultithread\n#SBATCH --distribution=block:block\n\n# Replace [budget code] below with your budget code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Make xthi available\nmodule load xthi\n\n# Set the number of threads to 1\n#   This prevents any threaded system libraries from automatically\n#   using threading.\nexport OMP_NUM_THREADS=1\n\n# Propagate the cpus-per-task setting from script to srun commands\n#    By default, Slurm does not propagate this setting from the sbatch\n#    options to srun commands in the job script. If this is not done,\n#    process/thread pinning may be incorrect leading to poor performance\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\n# Loop over 128 subjobs pinning each to a different core\nfor i in $(seq 1 128)\ndo\n# Launch subjob overriding job settings as required and in the background\n# Make sure to change the amount specified by the `--mem=` flag to the amount\n# of memory required. The amount of memory is given in MiB by default but other\n# units can be specified. If you do not know how much memory to specify, we\n# recommend that you specify `--mem=1500M` (1,500 MiB).\nsrun --nodes=1 --ntasks=1 --ntasks-per-node=1 \\\n      --exact --mem=1500M xthi > placement${i}.txt &\ndone\n\n# Wait for all subjobs to finish\nwait\n
"},{"location":"user-guide/scheduler/#example-2-8-subjobs-on-1-node-each-with-8-mpi-processes-and-2-openmp-threads-per-process","title":"Example 2: 8 subjobs on 1 node each with 8 MPI processes and 2 OpenMP threads per process","text":"

For our second example, we will run 8 subjobs, each running the xthi program (which prints process/thread placement) across 1 node. Each subjob will use 8 MPI processes and 2 OpenMP threads per process. The job submission script for this example would look like:

#!/bin/bash\n# Slurm job options (job-name, compute nodes, job time)\n#SBATCH --job-name=MultiParallelOnCompute\n#SBATCH --time=0:10:0\n#SBATCH --nodes=1\n#SBATCH --ntasks-per-node=64\n#SBATCH --cpus-per-task=2\n#SBATCH --hint=nomultithread\n#SBATCH --distribution=block:block\n\n# Replace [budget code] below with your budget code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Make xthi available\nmodule load xthi\n\n# Set the number of threads to 2 as required by all subjobs\nexport OMP_NUM_THREADS=2\n\n# Loop over 8 subjobs\nfor i in $(seq 1 8)\ndo\n    # Launch subjob overriding job settings as required and in the background\n    # Make sure to change the amount specified by the `--mem=` flag to the amount\n    # of memory required. The amount of memory is given in MiB by default but other\n    # units can be specified. If you do not know how much memory to specify, we\n    # recommend that you specify `--mem=12500M` (12,500 MiB).\n    srun --nodes=1 --ntasks=8 --ntasks-per-node=8 --cpus-per-task=2 \\\n    --exact --mem=12500M xthi > placement${i}.txt &\ndone\n\n# Wait for all subjobs to finish\nwait\n
"},{"location":"user-guide/scheduler/#example-3-running-inhomogeneous-subjobs-on-one-node","title":"Example 3: Running inhomogeneous subjobs on one node","text":"

For our third example, we will run 4 subjobs, each running the xthi program (which prints process/thread placement) across 1 node. Our subjobs will each run with a different number of MPI processes and OpenMP threads. We will run: one job with 64 MPI processes and 1 OpenMP thread per process; one job with 16 MPI processes and 2 threads per process; one job with 4 MPI processes and 4 OpenMP threads per process; and, one job with 1 MPI process and 16 OpenMP threads per process.

To be able to change the number of MPI processes and OpenMP threads per process, we need to avoid using the #SBATCH --ntasks-per-node and #SBATCH --cpus-per-task options: if you set these, Slurm will not let you alter the OMP_NUM_THREADS variable and you will not be able to change the number of OpenMP threads per process between subjobs.

Before each srun command, you will need to define the number of OpenMP threads per process you want by changing the OMP_NUM_THREADS variable. Furthermore, for each srun command, you will need to set the --ntasks flag to equal the number of MPI processes you want to use. You will also need to set the --cpus-per-task flag to equal the number of OpenMP threads per process you want to use.

#!/bin/bash\n# Slurm job options (job-name, compute nodes, job time)\n#SBATCH --job-name=MultiParallelOnCompute\n#SBATCH --time=0:10:0\n#SBATCH --nodes=1\n#SBATCH --hint=nomultithread\n#SBATCH --distribution=block:block\n\n# Replace [budget code] below with your budget code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Make xthi available\nmodule load xthi\n\n# Set the number of threads to the value required by the first job\nexport OMP_NUM_THREADS=1\nsrun --ntasks=64 --cpus-per-task=${OMP_NUM_THREADS} \\\n      --exact --mem=12500M xthi > placement${OMP_NUM_THREADS}.txt &\n\n# Set the number of threads to the value required by the second job\nexport OMP_NUM_THREADS=2\nsrun --ntasks=16 --cpus-per-task=${OMP_NUM_THREADS} \\\n      --exact --mem=12500M xthi > placement${OMP_NUM_THREADS}.txt &\n\n# Set the number of threads to the value required by the third job\nexport OMP_NUM_THREADS=4\nsrun --ntasks=4 --cpus-per-task=${OMP_NUM_THREADS} \\\n      --exact --mem=12500M xthi > placement${OMP_NUM_THREADS}.txt &\n\n# Set the number of threads to the value required by the fourth job\nexport OMP_NUM_THREADS=16\nsrun --ntasks=1 --cpus-per-task=${OMP_NUM_THREADS} \\\n      --exact --mem=12500M xthi > placement${OMP_NUM_THREADS}.txt &\n\n# Wait for all subjobs to finish\nwait\n
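Because all four subjobs share one node, their combined cores (MPI processes × OpenMP threads per process) must not exceed the 128 cores available. Here 64×1 + 16×2 + 4×4 + 1×16 = 128 exactly. The bash sketch below automates that check for any mix. total_cores and its NTASKSxCPUS argument format are hypothetical conventions.

```bash
#!/bin/bash
# Sum the cores requested by a set of subjobs given as NTASKSxCPUS pairs,
# e.g. "64x1" means --ntasks=64 --cpus-per-task=1.
total_cores() {
  local total=0 spec ntasks cpus
  for spec in "$@"; do
    ntasks=${spec%x*}
    cpus=${spec#*x}
    total=$(( total + ntasks * cpus ))
  done
  echo "$total"
}

# Check that the mix from Example 3 fits on one 128-core node
used=$(total_cores 64x1 16x2 4x4 1x16)
if [ "$used" -le 128 ]; then
  echo "fits: $used of 128 cores"
else
  echo "too many cores: $used > 128" >&2
fi
```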
"},{"location":"user-guide/scheduler/#example-4-256-serial-tasks-running-across-two-nodes","title":"Example 4: 256 serial tasks running across two nodes","text":"

For our fourth example, we will run 256 single-core copies of the xthi program (which prints process/thread placement) across two ARCHER2 compute nodes with each copy of xthi pinned to a different core. We will illustrate a mechanism for getting the node IDs to pass to srun as this is required to ensure that the individual subjobs are assigned to the correct node. This mechanism uses the scontrol command to turn the nodelist from sbatch into a format we can use as input to srun. The job submission script for this example would look like:

#!/bin/bash\n# Slurm job options (job-name, compute nodes, job time)\n#SBATCH --job-name=MultiSerialOnComputes\n#SBATCH --time=0:10:0\n#SBATCH --nodes=2\n#SBATCH --ntasks-per-node=128\n#SBATCH --cpus-per-task=1\n\n# Replace [budget code] below with your budget code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Make xthi available\nmodule load xthi\n\n# Set the number of threads to 1\n#   This prevents any threaded system libraries from automatically\n#   using threading.\nexport OMP_NUM_THREADS=1\n\n# Propagate the cpus-per-task setting from script to srun commands\n#    By default, Slurm does not propagate this setting from the sbatch\n#    options to srun commands in the job script. If this is not done,\n#    process/thread pinning may be incorrect leading to poor performance\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\n# Get a list of the nodes assigned to this job in a format we can use.\n#   scontrol converts the condensed node IDs in the sbatch environment\n#   variable into a list of full node IDs that we can use with srun to\n#   ensure the subjobs are placed on the correct node. e.g. this converts\n#   \"nid[001234,002345]\" to \"nid001234 nid002345\"\nnodelist=$(scontrol show hostnames $SLURM_JOB_NODELIST)\n\n# Loop over the nodes assigned to the job\nfor nodeid in $nodelist\ndo\n    # Loop over 128 subjobs on each node pinning each to a different core\n    for i in $(seq 1 128)\n    do\n        # Launch subjob overriding job settings as required and in the background\n        # Make sure to change the amount specified by the `--mem=` flag to the amount\n        # of memory required. The amount of memory is given in MiB by default but other\n        # units can be specified. 
If you do not know how much memory to specify, we\n        # recommend that you specify `--mem=1500M` (1,500 MiB).\n        srun --nodelist=${nodeid} --nodes=1 --ntasks=1 --ntasks-per-node=1 \\\n        --exact --mem=1500M xthi > placement_${nodeid}_${i}.txt &\n    done\ndone\n\n# Wait for all subjobs to finish\nwait\n
"},{"location":"user-guide/scheduler/#process-placement","title":"Process placement","text":"

There are many occasions where you may want to control (usually, MPI) process placement and change it from the default, for example:

There are a number of different methods for defining process placement; below we cover two of them: using Slurm options and using the MPICH_RANK_REORDER_METHOD environment variable. Most users will likely use the Slurm options approach.

"},{"location":"user-guide/scheduler/#standard-process-placement","title":"Standard process placement","text":"

The standard approach recommended on ARCHER2 is to place processes sequentially on nodes until the maximum number of tasks is reached. You can use the xthi program to verify this for MPI process placement:

auser@ln04:/work/t01/t01/auser> salloc --nodes=2 --ntasks-per-node=128 \\\n     --cpus-per-task=1 --time=0:10:0 --partition=standard --qos=short \\\n     --account=[your account]\n\nsalloc: Pending job allocation 1170365\nsalloc: job 1170365 queued and waiting for resources\nsalloc: job 1170365 has been allocated resources\nsalloc: Granted job allocation 1170365\nsalloc: Waiting for resource configuration\nsalloc: Nodes nid[002526-002527] are ready for job\n\nauser@ln04:/work/t01/t01/auser> module load xthi\nauser@ln04:/work/t01/t01/auser> export OMP_NUM_THREADS=1\nauser@ln04:/work/t01/t01/auser> export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\nauser@ln04:/work/t01/t01/auser> srun --distribution=block:block --hint=nomultithread xthi\n\nNode summary for    2 nodes:\nNode    0, hostname nid002526, mpi 128, omp   1, executable xthi\nNode    1, hostname nid002527, mpi 128, omp   1, executable xthi\nMPI summary: 256 ranks\nNode    0, rank    0, thread   0, (affinity =    0)\nNode    0, rank    1, thread   0, (affinity =    1)\nNode    0, rank    2, thread   0, (affinity =    2)\nNode    0, rank    3, thread   0, (affinity =    3)\n\n...output trimmed...\n\nNode    0, rank  124, thread   0, (affinity =  124)\nNode    0, rank  125, thread   0, (affinity =  125)\nNode    0, rank  126, thread   0, (affinity =  126)\nNode    0, rank  127, thread   0, (affinity =  127)\nNode    1, rank  128, thread   0, (affinity =    0)\nNode    1, rank  129, thread   0, (affinity =    1)\nNode    1, rank  130, thread   0, (affinity =    2)\nNode    1, rank  131, thread   0, (affinity =    3)\n\n...output trimmed...\n

Note

For MPI programs on ARCHER2, each rank corresponds to a process.

Important

To get good performance out of MPI collective operations, MPI processes should be placed sequentially on cores as in the standard placement described above.

"},{"location":"user-guide/scheduler/#setting-process-placement-using-slurm-options","title":"Setting process placement using Slurm options","text":""},{"location":"user-guide/scheduler/#for-underpopulation-of-nodes-with-processes","title":"For underpopulation of nodes with processes","text":"

When you are using fewer processes than cores on compute nodes (i.e. < 128 processes per node) the basic Slurm options (usually supplied in your script as options to sbatch) for process placement are:

In addition, the following options are added to your srun commands in your job submission script:

For example, to place 32 processes per node and have 1 process per 4-core block (corresponding to a CCX, Core CompleX, that shares an L3 cache), you would set:

Here is the output from xthi:

auser@ln04:/work/t01/t01/auser> salloc --nodes=2 --ntasks-per-node=32 \\\n     --cpus-per-task=4 --time=0:10:0 --partition=standard --qos=short \\\n     --account=[your account]\n\nsalloc: Pending job allocation 1170383\nsalloc: job 1170383 queued and waiting for resources\nsalloc: job 1170383 has been allocated resources\nsalloc: Granted job allocation 1170383\nsalloc: Waiting for resource configuration\nsalloc: Nodes nid[002526-002527] are ready for job\n\nauser@ln04:/work/t01/t01/auser> module load xthi\nauser@ln04:/work/t01/t01/auser> export OMP_NUM_THREADS=1\nauser@ln04:/work/t01/t01/auser> export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\nauser@ln04:/work/t01/t01/auser> srun --distribution=block:block --hint=nomultithread xthi\n\nNode summary for    2 nodes:\nNode    0, hostname nid002526, mpi  32, omp   1, executable xthi\nNode    1, hostname nid002527, mpi  32, omp   1, executable xthi\nMPI summary: 64 ranks\nNode    0, rank    0, thread   0, (affinity =  0-3)\nNode    0, rank    1, thread   0, (affinity =  4-7)\nNode    0, rank    2, thread   0, (affinity = 8-11)\nNode    0, rank    3, thread   0, (affinity = 12-15)\nNode    0, rank    4, thread   0, (affinity = 16-19)\nNode    0, rank    5, thread   0, (affinity = 20-23)\nNode    0, rank    6, thread   0, (affinity = 24-27)\nNode    0, rank    7, thread   0, (affinity = 28-31)\nNode    0, rank    8, thread   0, (affinity = 32-35)\nNode    0, rank    9, thread   0, (affinity = 36-39)\nNode    0, rank   10, thread   0, (affinity = 40-43)\nNode    0, rank   11, thread   0, (affinity = 44-47)\nNode    0, rank   12, thread   0, (affinity = 48-51)\nNode    0, rank   13, thread   0, (affinity = 52-55)\nNode    0, rank   14, thread   0, (affinity = 56-59)\nNode    0, rank   15, thread   0, (affinity = 60-63)\nNode    0, rank   16, thread   0, (affinity = 64-67)\nNode    0, rank   17, thread   0, (affinity = 68-71)\nNode    0, rank   18, thread   0, (affinity = 72-75)\nNode    0, rank   19, thread   0, (affinity = 
76-79)\nNode    0, rank   20, thread   0, (affinity = 80-83)\nNode    0, rank   21, thread   0, (affinity = 84-87)\nNode    0, rank   22, thread   0, (affinity = 88-91)\nNode    0, rank   23, thread   0, (affinity = 92-95)\nNode    0, rank   24, thread   0, (affinity = 96-99)\nNode    0, rank   25, thread   0, (affinity = 100-103)\nNode    0, rank   26, thread   0, (affinity = 104-107)\nNode    0, rank   27, thread   0, (affinity = 108-111)\nNode    0, rank   28, thread   0, (affinity = 112-115)\nNode    0, rank   29, thread   0, (affinity = 116-119)\nNode    0, rank   30, thread   0, (affinity = 120-123)\nNode    0, rank   31, thread   0, (affinity = 124-127)\nNode    1, rank   32, thread   0, (affinity =  0-3)\nNode    1, rank   33, thread   0, (affinity =  4-7)\nNode    1, rank   34, thread   0, (affinity = 8-11)\nNode    1, rank   35, thread   0, (affinity = 12-15)\nNode    1, rank   36, thread   0, (affinity = 16-19)\nNode    1, rank   37, thread   0, (affinity = 20-23)\nNode    1, rank   38, thread   0, (affinity = 24-27)\nNode    1, rank   39, thread   0, (affinity = 28-31)\nNode    1, rank   40, thread   0, (affinity = 32-35)\nNode    1, rank   41, thread   0, (affinity = 36-39)\nNode    1, rank   42, thread   0, (affinity = 40-43)\nNode    1, rank   43, thread   0, (affinity = 44-47)\nNode    1, rank   44, thread   0, (affinity = 48-51)\nNode    1, rank   45, thread   0, (affinity = 52-55)\nNode    1, rank   46, thread   0, (affinity = 56-59)\nNode    1, rank   47, thread   0, (affinity = 60-63)\nNode    1, rank   48, thread   0, (affinity = 64-67)\nNode    1, rank   49, thread   0, (affinity = 68-71)\nNode    1, rank   50, thread   0, (affinity = 72-75)\nNode    1, rank   51, thread   0, (affinity = 76-79)\nNode    1, rank   52, thread   0, (affinity = 80-83)\nNode    1, rank   53, thread   0, (affinity = 84-87)\nNode    1, rank   54, thread   0, (affinity = 88-91)\nNode    1, rank   55, thread   0, (affinity = 92-95)\nNode    1, rank   56, thread   0, 
(affinity = 96-99)\nNode    1, rank   57, thread   0, (affinity = 100-103)\nNode    1, rank   58, thread   0, (affinity = 104-107)\nNode    1, rank   59, thread   0, (affinity = 108-111)\nNode    1, rank   60, thread   0, (affinity = 112-115)\nNode    1, rank   61, thread   0, (affinity = 116-119)\nNode    1, rank   62, thread   0, (affinity = 120-123)\nNode    1, rank   63, thread   0, (affinity = 124-127)\n

Tip

You usually only want to use physical cores on ARCHER2, so (ntasks-per-node) \u00d7 (cpus-per-task) should generally be equal to 128.
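Following this tip, cpus-per-task can be derived from the number of tasks per node. A minimal bash sketch (the helper name cpus_per_task_for is hypothetical) that also rejects values that do not divide 128 evenly:

```bash
#!/bin/bash
# Compute --cpus-per-task so that ntasks-per-node x cpus-per-task = 128,
# the number of physical cores on an ARCHER2 compute node.
cpus_per_task_for() {
  local ntasks_per_node=$1
  if (( ntasks_per_node <= 0 )); then
    echo "error: ntasks-per-node must be positive" >&2
    return 1
  fi
  if (( 128 % ntasks_per_node != 0 )); then
    echo "error: ${ntasks_per_node} does not divide 128 evenly" >&2
    return 1
  fi
  echo $(( 128 / ntasks_per_node ))
}

# Possible use, matching the 32-processes-per-node example above:
#   salloc --nodes=2 --ntasks-per-node=32 --cpus-per-task=$(cpus_per_task_for 32) ...
```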

"},{"location":"user-guide/scheduler/#full-node-population-with-non-sequential-process-placement","title":"Full node population with non-sequential process placement","text":"

If you want to change the order processes are placed on nodes and cores using Slurm options then you should use the --distribution option to srun to change this.

For example, to place processes sequentially on nodes but round-robin on the 16-core NUMA regions in a single node, you would use the --distribution=block:cyclic option to srun. This type of process placement can be beneficial when a code is memory bound.

auser@ln04:/work/t01/t01/auser> salloc --nodes=2 --ntasks-per-node=128 \\\n     --cpus-per-task=1 --time=0:10:0 --partition=standard --qos=short \\\n     --account=[your account]\n\nsalloc: Pending job allocation 1170594\nsalloc: job 1170594 queued and waiting for resources\nsalloc: job 1170594 has been allocated resources\nsalloc: Granted job allocation 1170594\nsalloc: Waiting for resource configuration\nsalloc: Nodes nid[002616,002621] are ready for job\n\nauser@ln04:/work/t01/t01/auser> module load xthi\nauser@ln04:/work/t01/t01/auser> export OMP_NUM_THREADS=1\nauser@ln04:/work/t01/t01/auser> srun --distribution=block:cyclic --hint=nomultithread xthi\n\nNode summary for    2 nodes:\nNode    0, hostname nid002616, mpi 128, omp   1, executable xthi\nNode    1, hostname nid002621, mpi 128, omp   1, executable xthi\nMPI summary: 256 ranks\nNode    0, rank    0, thread   0, (affinity =    0)\nNode    0, rank    1, thread   0, (affinity =   16)\nNode    0, rank    2, thread   0, (affinity =   32)\nNode    0, rank    3, thread   0, (affinity =   48)\nNode    0, rank    4, thread   0, (affinity =   64)\nNode    0, rank    5, thread   0, (affinity =   80)\nNode    0, rank    6, thread   0, (affinity =   96)\nNode    0, rank    7, thread   0, (affinity =  112)\nNode    0, rank    8, thread   0, (affinity =    1)\nNode    0, rank    9, thread   0, (affinity =   17)\nNode    0, rank   10, thread   0, (affinity =   33)\nNode    0, rank   11, thread   0, (affinity =   49)\nNode    0, rank   12, thread   0, (affinity =   65)\nNode    0, rank   13, thread   0, (affinity =   81)\nNode    0, rank   14, thread   0, (affinity =   97)\nNode    0, rank   15, thread   0, (affinity =  113)\n\n...output trimmed...\n\nNode    0, rank  120, thread   0, (affinity =   15)\nNode    0, rank  121, thread   0, (affinity =   31)\nNode    0, rank  122, thread   0, (affinity =   47)\nNode    0, rank  123, thread   0, (affinity =   63)\nNode    0, rank  124, thread   0, (affinity =   79)\nNode    0, rank  125, thread   0, (affinity =   95)\nNode    0, rank  126, thread   0, (affinity =  111)\nNode    0, rank  127, thread   0, (affinity =  127)\nNode    1, rank  128, thread   0, (affinity =    0)\nNode    1, rank  129, thread   0, (affinity =   16)\nNode    1, rank  130, thread   0, (affinity =   32)\nNode    1, rank  131, thread   0, (affinity =   48)\nNode    1, rank  132, thread   0, (affinity =   64)\nNode    1, rank  133, thread   0, (affinity =   80)\nNode    1, rank  134, thread   0, (affinity =   96)\nNode    1, rank  135, thread   0, (affinity =  112)\n\n...output trimmed...\n
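The pattern in this output can be summarised by a formula. With 8 NUMA regions of 16 cores, local rank l lands on core (l mod 8) × 16 + ⌊l / 8⌋. The bash sketch below reproduces the affinities shown above. It is a model of the observed output for fully populated 128-core nodes, not a description of Slurm's internal algorithm, and the function names are hypothetical.

```bash
#!/bin/bash
# Model of the --distribution=block:cyclic placement shown above.
# Assumes fully populated nodes with 128 cores in 8 NUMA regions of 16 cores.
CORES_PER_NODE=128
NUMA_REGIONS=8
CORES_PER_REGION=$(( CORES_PER_NODE / NUMA_REGIONS ))

# block across nodes: consecutive ranks fill one node before the next
block_cyclic_node() {
  echo $(( $1 / CORES_PER_NODE ))
}

# cyclic within a node: successive ranks rotate over the NUMA regions
block_cyclic_core() {
  local rank=$1
  local local_rank=$(( rank % CORES_PER_NODE ))
  echo $(( (local_rank % NUMA_REGIONS) * CORES_PER_REGION + local_rank / NUMA_REGIONS ))
}

for rank in 0 1 8 15 127 128; do
  echo "rank $rank -> node $(block_cyclic_node "$rank"), core $(block_cyclic_core "$rank")"
done
```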

If you wish to place processes round robin on both nodes and 16-core regions (cores that share access to a single DRAM memory controller) within a node, you would use --distribution=cyclic:cyclic:

auser@ln04:/work/t01/t01/auser> salloc --nodes=2 --ntasks-per-node=128 \\\n     --cpus-per-task=1 --time=0:10:0 --partition=standard --qos=short \\\n     --account=[your account]\n\nsalloc: Pending job allocation 1170594\nsalloc: job 1170594 queued and waiting for resources\nsalloc: job 1170594 has been allocated resources\nsalloc: Granted job allocation 1170594\nsalloc: Waiting for resource configuration\nsalloc: Nodes nid[002616,002621] are ready for job\n\nauser@ln04:/work/t01/t01/auser> module load xthi\nauser@ln04:/work/t01/t01/auser> export OMP_NUM_THREADS=1\nauser@ln04:/work/t01/t01/auser> export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\nauser@ln04:/work/t01/t01/auser> srun --distribution=cyclic:cyclic --hint=nomultithread xthi\n\nNode summary for    2 nodes:\nNode    0, hostname nid002616, mpi 128, omp   1, executable xthi\nNode    1, hostname nid002621, mpi 128, omp   1, executable xthi\nMPI summary: 256 ranks\nNode    0, rank    0, thread   0, (affinity =    0)\nNode    0, rank    2, thread   0, (affinity =   16)\nNode    0, rank    4, thread   0, (affinity =   32)\nNode    0, rank    6, thread   0, (affinity =   48)\nNode    0, rank    8, thread   0, (affinity =   64)\nNode    0, rank   10, thread   0, (affinity =   80)\nNode    0, rank   12, thread   0, (affinity =   96)\nNode    0, rank   14, thread   0, (affinity =  112)\nNode    0, rank   16, thread   0, (affinity =    1)\nNode    0, rank   18, thread   0, (affinity =   17)\nNode    0, rank   20, thread   0, (affinity =   33)\nNode    0, rank   22, thread   0, (affinity =   49)\nNode    0, rank   24, thread   0, (affinity =   65)\nNode    0, rank   26, thread   0, (affinity =   81)\nNode    0, rank   28, thread   0, (affinity =   97)\nNode    0, rank   30, thread   0, (affinity =  113)\n\n...output trimmed...\n\nNode    1, rank    1, thread   0, (affinity =    0)\nNode    1, rank    3, thread   0, (affinity =   16)\nNode    1, rank    5, thread   0, (affinity =   32)\nNode    1, rank    7, thread   0, 
(affinity =   48)\nNode    1, rank    9, thread   0, (affinity =   64)\nNode    1, rank   11, thread   0, (affinity =   80)\nNode    1, rank   13, thread   0, (affinity =   96)\nNode    1, rank   15, thread   0, (affinity =  112)\nNode    1, rank   17, thread   0, (affinity =    1)\nNode    1, rank   19, thread   0, (affinity =   17)\nNode    1, rank   21, thread   0, (affinity =   33)\nNode    1, rank   23, thread   0, (affinity =   49)\nNode    1, rank   25, thread   0, (affinity =   65)\nNode    1, rank   27, thread   0, (affinity =   81)\nNode    1, rank   29, thread   0, (affinity =   97)\nNode    1, rank   31, thread   0, (affinity =  113)\n\n...output trimmed...\n
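For cyclic:cyclic, the ranks are dealt out to nodes first and then round robin over NUMA regions within each node. The sketch below models the output above for 2 fully populated nodes. As before, this is a model of the observed placement rather than Slurm's algorithm, it assumes the ranks fill every node evenly, and the function names are hypothetical.

```bash
#!/bin/bash
# Model of the --distribution=cyclic:cyclic placement shown above.
# Assumes 128 cores per node in 8 NUMA regions of 16 cores, all nodes full.
CORES_PER_NODE=128
NUMA_REGIONS=8
CORES_PER_REGION=$(( CORES_PER_NODE / NUMA_REGIONS ))

# Ranks are dealt out to nodes round robin ...
cyclic_cyclic_node() {
  local rank=$1 nnodes=$2
  echo $(( rank % nnodes ))
}

# ... and, within each node, round robin over the NUMA regions
cyclic_cyclic_core() {
  local rank=$1 nnodes=$2
  local slot=$(( rank / nnodes ))   # position of this rank among those on its node
  echo $(( (slot % NUMA_REGIONS) * CORES_PER_REGION + slot / NUMA_REGIONS ))
}

for rank in 0 1 2 16 31; do
  echo "rank $rank -> node $(cyclic_cyclic_node "$rank" 2), core $(cyclic_cyclic_core "$rank" 2)"
done
```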

Remember, MPI collective performance is generally much worse if processes are not placed sequentially on a node (so adjacent MPI ranks are as close to each other as possible). This is the reason that the default recommended placement on ARCHER2 is sequential rather than round-robin.
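To make the difference between the two placements concrete, the sketch below predicts which core a task is bound to under block:block and cyclic:cyclic placement. It assumes 128 cores per node split into 8 NUMA regions of 16 cores, which matches the affinities in the xthi output above; the helper names are hypothetical and only for illustration.

```shell
#!/usr/bin/env bash
# Sketch: predict the core a task is bound to on one node for block:block
# vs cyclic:cyclic placement. Assumes 128 cores per node split into
# 8 NUMA regions of 16 cores (as in the xthi output above).
# The argument is the task's index among the tasks placed on that node.

NUMA_REGIONS=8
CORES_PER_NUMA=16

block_core() {
    local local_idx=$1
    # Sequential placement: the i-th task on the node sits on core i.
    echo "${local_idx}"
}

cyclic_core() {
    local local_idx=$1
    # Round-robin over NUMA regions: consecutive tasks land in different regions.
    local region=$(( local_idx % NUMA_REGIONS ))
    local slot=$(( local_idx / NUMA_REGIONS ))
    echo $(( region * CORES_PER_NUMA + slot ))
}

# With cyclic distribution over 2 nodes, global rank r runs on node r % 2
# with node-local index r / 2.
for rank in 0 2 14 16 30; do
    echo "rank ${rank}: node $(( rank % 2 )), core $(cyclic_core $(( rank / 2 )))"
done
```

Under cyclic:cyclic, consecutive ranks end up on different nodes and in different NUMA regions, whereas block_core keeps neighbouring ranks on neighbouring cores, which is why sequential placement usually favours collective performance.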

"},{"location":"user-guide/scheduler/#mpich_rank_reorder_method-for-mpi-process-placement","title":"MPICH_RANK_REORDER_METHOD for MPI process placement","text":"

The MPICH_RANK_REORDER_METHOD environment variable can also be used to specify other types of MPI task placement. For example, setting it to \"0\" results in a round-robin placement on both nodes and NUMA regions in a node (equivalent to the --distribution=cyclic:cyclic option to srun). Note that we do not specify the --distribution option to srun in this case, as the environment variable is controlling placement:

salloc --nodes=2 --ntasks-per-node=128 --cpus-per-task=1 --time=0:10:0 --partition=standard --qos=short --account=t01\n\nsalloc: Granted job allocation 24236\nsalloc: Waiting for resource configuration\nsalloc: Nodes nid[002616,002621] are ready for job\n\nmodule load xthi\nexport OMP_NUM_THREADS=1\nexport MPICH_RANK_REORDER_METHOD=0\nsrun --hint=nomultithread xthi\n\nNode summary for    2 nodes:\nNode    0, hostname nid002616, mpi 128, omp   1, executable xthi\nNode    1, hostname nid002621, mpi 128, omp   1, executable xthi\nMPI summary: 256 ranks\nNode    0, rank    0, thread   0, (affinity =    0)\nNode    0, rank    2, thread   0, (affinity =   16)\nNode    0, rank    4, thread   0, (affinity =   32)\nNode    0, rank    6, thread   0, (affinity =   48)\nNode    0, rank    8, thread   0, (affinity =   64)\nNode    0, rank   10, thread   0, (affinity =   80)\nNode    0, rank   12, thread   0, (affinity =   96)\nNode    0, rank   14, thread   0, (affinity =  112)\nNode    0, rank   16, thread   0, (affinity =    1)\nNode    0, rank   18, thread   0, (affinity =   17)\nNode    0, rank   20, thread   0, (affinity =   33)\nNode    0, rank   22, thread   0, (affinity =   49)\nNode    0, rank   24, thread   0, (affinity =   65)\nNode    0, rank   26, thread   0, (affinity =   81)\nNode    0, rank   28, thread   0, (affinity =   97)\nNode    0, rank   30, thread   0, (affinity =  113)\n\n...output trimmed...\n

There are other modes available with the MPICH_RANK_REORDER_METHOD environment variable, including one which lets the user provide a file called MPICH_RANK_ORDER which contains a list of each task's placement on each node. These options are described in detail in the intro_mpi man page.
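As a rough illustration of the file-based mode, the sketch below writes an MPICH_RANK_ORDER file by hand. It assumes the same layout that grid_order prints in the next section (optional "#" comment lines, then one comma-separated line of ranks per node); the function name is hypothetical, and the round-robin order is only an example of ordering logic you might replace with your own. Consult the intro_mpi man page for the authoritative file format.

```shell
#!/usr/bin/env bash
# Sketch: write an MPICH_RANK_ORDER file by hand, following the layout that
# grid_order prints (optional "#" comment, then one comma-separated line of
# ranks per node). This example reproduces a round-robin placement of ranks
# over nodes; substitute your own ordering logic.

write_round_robin_order() {
    local nodes=$1 tasks_per_node=$2 outfile=$3
    {
        echo "# round-robin: ${nodes} nodes x ${tasks_per_node} tasks"
        local node slot line
        for (( node = 0; node < nodes; node++ )); do
            line=""
            for (( slot = 0; slot < tasks_per_node; slot++ )); do
                # The slot-th task on this node is global rank node + slot*nodes.
                line+="${line:+,}$(( node + slot * nodes ))"
            done
            echo "${line}"
        done
    } > "${outfile}"
}

# Example: 2 nodes with 4 tasks each.
write_round_robin_order 2 4 MPICH_RANK_ORDER
cat MPICH_RANK_ORDER
# Then, before srun in the job script: export MPICH_RANK_REORDER_METHOD=3
```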

"},{"location":"user-guide/scheduler/#grid_order","title":"grid_order","text":"

For MPI applications which perform a large amount of nearest-neighbor communication, e.g., stencil-based applications on structured grids, HPE provide a tool called grid_order in the perftools-base module (loaded by default for all users). grid_order can generate an MPICH_RANK_ORDER file automatically from parameters such as the grid dimensions and the core count. For example, to place 256 MPI processes in row-major order on a Cartesian grid of size $(8, 8, 4)$, using 128 cores per node:

grid_order -R -c 128 -g 8,8,4\n\n# grid_order -R -Z -c 128 -g 8,8,4\n# Region 3: 0,0,1 (0..255)\n0,1,2,3,32,33,34,35,64,65,66,67,96,97,98,99,128,129,130,131,160,161,162,163,192,193,194,195,224,225,226,227,4,5,6,7,36,37,38,39,68,69,70,71,100,101,102,103,132,133,134,135,164,165,166,167,196,197,198,199,228,229,230,231,8,9,10,11,40,41,42,43,72,73,74,75,104,105,106,107,136,137,138,139,168,169,170,171,200,201,202,203,232,233,234,235,12,13,14,15,44,45,46,47,76,77,78,79,108,109,110,111,140,141,142,143,172,173,174,175,204,205,206,207,236,237,238,239\n16,17,18,19,48,49,50,51,80,81,82,83,112,113,114,115,144,145,146,147,176,177,178,179,208,209,210,211,240,241,242,243,20,21,22,23,52,53,54,55,84,85,86,87,116,117,118,119,148,149,150,151,180,181,182,183,212,213,214,215,244,245,246,247,24,25,26,27,56,57,58,59,88,89,90,91,120,121,122,123,152,153,154,155,184,185,186,187,216,217,218,219,248,249,250,251,28,29,30,31,60,61,62,63,92,93,94,95,124,125,126,127,156,157,158,159,188,189,190,191,220,221,222,223,252,253,254,255\n

You can then save this output to a file called MPICH_RANK_ORDER and set MPICH_RANK_REORDER_METHOD=3 before running the job, which tells Cray MPI to read the MPICH_RANK_ORDER file to set the MPI task placement. For more information, see the grid_order man page (man grid_order).
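Before relying on a generated or hand-edited order file, it can be worth checking that every rank appears exactly once. The sketch below does this with standard tools. It assumes the file contains plain comma-separated integers plus "#" comment lines, as in the grid_order output above; rank ranges such as "0-15" are not handled, and check_rank_order is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: check that an MPICH_RANK_ORDER file lists every rank 0..N-1 exactly
# once before pointing MPICH_RANK_REORDER_METHOD=3 at it. Assumes plain
# comma-separated integers and "#" comment lines, as in the grid_order
# output above (rank ranges such as "0-15" are not handled here).

check_rank_order() {
    local file=$1 nranks=$2
    local listed
    # Drop comments, split on commas/spaces, drop empty entries, sort numerically.
    listed=$(grep -v '^#' "${file}" | tr ', ' '\n\n' | grep -v '^$' | sort -n)
    if [[ "${listed}" == "$(seq 0 $(( nranks - 1 )))" ]]; then
        echo "OK: ${file} places ranks 0-$(( nranks - 1 )) exactly once"
        return 0
    fi
    echo "ERROR: ${file} does not list ranks 0-$(( nranks - 1 )) exactly once" >&2
    return 1
}

# Example usage with the grid_order command shown above:
#   grid_order -R -c 128 -g 8,8,4 > MPICH_RANK_ORDER
#   check_rank_order MPICH_RANK_ORDER 256 && export MPICH_RANK_REORDER_METHOD=3
```

A missing or duplicated rank makes the sorted list differ from the output of seq, so the check fails before the job is launched rather than at MPI start-up.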

"},{"location":"user-guide/scheduler/#interactive-jobs","title":"Interactive Jobs","text":""},{"location":"user-guide/scheduler/#using-salloc-to-reserve-resources","title":"Using salloc to reserve resources","text":"

When you are developing or debugging code, you often want to run many short jobs with small code edits between runs. This can be achieved by using the login nodes to run MPI, but you may want to test on the compute nodes (e.g. you may want to test running on multiple nodes across the high performance interconnect). One of the best ways to achieve this on ARCHER2 is to use interactive jobs.

An interactive job allows you to issue srun commands directly from the command line without using a job submission script, and to see the output from your program directly in the terminal.

You use the salloc command to reserve compute nodes for interactive jobs.

To submit a request for an interactive job reserving 8 nodes (1024 physical cores) for 20 minutes on the short QoS you would issue the following command from the command line:

auser@ln01:> salloc --nodes=8 --ntasks-per-node=128 --cpus-per-task=1 \\\n                --time=00:20:00 --partition=standard --qos=short \\\n                --account=[budget code]\n

When you submit this job your terminal will display something like:

salloc: Granted job allocation 24236\nsalloc: Waiting for resource configuration\nsalloc: Nodes nid000002 are ready for job\nauser@ln01:>\n

It may take some time for your interactive job to start. Once it runs, you will enter a standard interactive terminal session (a new shell). Note that this shell is still on the front end (the prompt has not changed). Whilst the interactive session lasts, you will be able to run parallel jobs on the compute nodes by issuing the srun --distribution=block:block --hint=nomultithread command directly at your command prompt, using the same syntax as you would inside a job script. The maximum number of nodes you can use is limited by the resources requested in the salloc command.

Important

If you wish the cpus-per-task option to salloc to propagate to srun commands in the allocation, you will need to use the command export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK before you issue any srun commands.

If you know you will be doing a lot of intensive debugging you may find it useful to request an interactive session lasting the expected length of your working session, say a full day.

Your session will end when you hit the requested walltime. If you wish to finish before this, use the exit command; this will return you to the shell from which you issued the salloc command.

"},{"location":"user-guide/scheduler/#using-srun-directly","title":"Using srun directly","text":"

A second way to run an interactive job is to use srun directly in the following way (here using the short QoS):

auser@ln01:/work/t01/t01/auser> srun --nodes=1 --exclusive --time=00:20:00 \\\n                --partition=standard --qos=short --account=[budget code] \\\n    --pty /bin/bash\nauser@nid001261:/work/t01/t01/auser> hostname\nnid001261\n

The --pty /bin/bash option will cause a new shell to be started on the first node of a new allocation. This is perhaps closer to what many people consider an 'interactive' job than the salloc method.

One can now issue shell commands in the usual way. A further invocation of srun is required to launch a parallel job in the allocation.

Note

When using srun within an interactive srun session, you will need to include both the --overlap and --oversubscribe flags, and specify the number of cores you want to use:

auser@nid001261:/work/t01/t01/auser> srun --overlap --oversubscribe --distribution=block:block \\\n                --hint=nomultithread --ntasks=128 ./my_mpi_executable.x\n

Without --overlap, the second srun will block until the first one has completed. Since your interactive session was itself launched with srun, the second srun will never actually start; instead you will get repeated warnings that \"Requested nodes are busy\".

When finished, type exit to relinquish the allocation and control will be returned to the front end.

By default, the interactive shell will retain the environment of the parent. If you want a clean shell, remember to specify --export=none.

"},{"location":"user-guide/scheduler/#heterogeneous-jobs","title":"Heterogeneous jobs","text":"

Most of the Slurm submissions discussed above involve running a single executable. However, there are situations where two or more distinct executables are coupled and need to be run at the same time, potentially using the same MPI communicator. This is most easily handled via the Slurm heterogeneous job mechanism.

Two common cases are discussed below: first, a client/server model in which client and server each have a different MPI_COMM_WORLD, and second, the case where two or more executables share MPI_COMM_WORLD.

"},{"location":"user-guide/scheduler/#heterogeneous-jobs-for-a-clientserver-model-distinct-mpi_comm_worlds","title":"Heterogeneous jobs for a client/server model: distinct MPI_COMM_WORLDs","text":"

The essential feature of a heterogeneous job here is to create a single batch submission which specifies the resource requirements for the individual components. Schematically, we would use

#!/bin/bash\n\n# Slurm specifications for the first component\n\n#SBATCH --partition=standard\n\n...\n\n#SBATCH hetjob\n\n# Slurm specifications for the second component\n\n#SBATCH --partition=standard\n\n...\n
where each new component beyond the first is introduced by the special token #SBATCH hetjob (note this is not a normal option and is not --hetjob). Each component must specify a partition.

Such a job will appear in the scheduler as, e.g.,

           50098+0  standard qscript-    user  PD       0:00      1 (None)\n           50098+1  standard qscript-    user  PD       0:00      2 (None)\n
and counts as (in this case) two separate jobs from the point of view of QoS limits.

Consider a case where we have two executables which may both be parallel (in that they use MPI), both run at the same time, and communicate with each other using MPI or by some other means. In the following example, we run two different executables, xthi-a and xthi-b, both of which must finish before the job completes.

#!/bin/bash\n\n#SBATCH --time=00:20:00\n#SBATCH --exclusive\n#SBATCH --export=none\n\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n#SBATCH --nodes=1\n#SBATCH --ntasks-per-node=8\n\n#SBATCH hetjob\n\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n#SBATCH --nodes=2\n#SBATCH --ntasks-per-node=4\n\n# Run two executables with separate MPI_COMM_WORLD\n\nsrun --distribution=block:block --hint=nomultithread --het-group=0 ./xthi-a &\nsrun --distribution=block:block --hint=nomultithread --het-group=1 ./xthi-b &\nwait\n
In this case, each executable is launched with a separate call to srun but specifies a different heterogeneous group via the --het-group option. The first group is --het-group=0. Both are run in the background with & and the wait is required to ensure both executables have completed before the job submission exits.

The above is a rather artificial example using two executables which are in fact just symbolic links in the job directory to xthi, used without loading the module. You can test this script yourself by creating symbolic links to the original executable before submitting the job:

auser@ln04:/work/t01/t01/auser/job-dir> module load xthi\nauser@ln04:/work/t01/t01/auser/job-dir> which xthi\n/work/y07/shared/utils/core/xthi/1.2/CRAYCLANG/11.0/bin/xthi\nauser@ln04:/work/t01/t01/auser/job-dir> ln -s /work/y07/shared/utils/core/xthi/1.2/CRAYCLANG/11.0/bin/xthi xthi-a\nauser@ln04:/work/t01/t01/auser/job-dir> ln -s /work/y07/shared/utils/core/xthi/1.2/CRAYCLANG/11.0/bin/xthi xthi-b\n

The example job will produce two reports showing the placement of the MPI tasks from the two instances of xthi running in each of the heterogeneous groups. For example, the output might be

Node summary for    1 nodes:\nNode    0, hostname nid002400, mpi   8, omp   1, executable xthi-a\nMPI summary: 8 ranks\nNode    0, rank    0, thread   0, (affinity =    0)\nNode    0, rank    1, thread   0, (affinity =    1)\nNode    0, rank    2, thread   0, (affinity =    2)\nNode    0, rank    3, thread   0, (affinity =    3)\nNode    0, rank    4, thread   0, (affinity =    4)\nNode    0, rank    5, thread   0, (affinity =    5)\nNode    0, rank    6, thread   0, (affinity =    6)\nNode    0, rank    7, thread   0, (affinity =    7)\nNode summary for    2 nodes:\nNode    0, hostname nid002146, mpi   4, omp   1, executable xthi-b\nNode    1, hostname nid002149, mpi   4, omp   1, executable xthi-b\nMPI summary: 8 ranks\nNode    0, rank    0, thread   0, (affinity =    0)\nNode    0, rank    1, thread   0, (affinity =    1)\nNode    0, rank    2, thread   0, (affinity =    2)\nNode    0, rank    3, thread   0, (affinity =    3)\nNode    1, rank    4, thread   0, (affinity =    0)\nNode    1, rank    5, thread   0, (affinity =    1)\nNode    1, rank    6, thread   0, (affinity =    2)\nNode    1, rank    7, thread   0, (affinity =    3)\n
Here we have the first executable running on one node with a communicator size 8 (ranks 0-7). The second executable runs on two nodes, also with communicator size 8 (ranks 0-7, 4 ranks per node). Further examples of placement for heterogeneous jobs are given below.

Finally, if your workflow requires the different heterogeneous components to communicate via MPI, but without sharing their MPI_COMM_WORLD, you will need to export two additional environment variables before your srun commands, as shown below:

export PMI_UNIVERSE_SIZE=3\nexport MPICH_SINGLE_HOST_ENABLED=0\n
"},{"location":"user-guide/scheduler/#heterogeneous-jobs-for-a-shared-mpi_com_world","title":"Heterogeneous jobs for a shared MPI_COMM_WORLD","text":"

Note

The directive #SBATCH hetjob can no longer be used for jobs requiring a shared MPI_COMM_WORLD.

Note

In this approach, each hetjob component must be on its own set of nodes. You cannot use this approach to place different hetjob components on the same node.

If two or more heterogeneous components need to share a unique MPI_COMM_WORLD, a single srun invocation with the different components separated by a colon : should be used. Arguments to the individual components of the srun control the placement of the tasks and threads for each component. For example, running the same xthi-a and xthi-b executables as above but now in a shared communicator, we might run:

#!/bin/bash\n\n#SBATCH --time=00:20:00\n#SBATCH --export=none\n#SBATCH --account=[...]\n\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# We must specify correctly the total number of nodes required.\n#SBATCH --nodes=3\n\nSHARED_ARGS=\"--distribution=block:block --hint=nomultithread\"\n\nsrun --het-group=0 --nodes=1 --ntasks-per-node=8 ${SHARED_ARGS} ./xthi-a : \\\n --het-group=1 --nodes=2 --ntasks-per-node=4 ${SHARED_ARGS} ./xthi-b\n

The output should confirm we have a single MPI_COMM_WORLD with a total of three nodes, xthi-a running on one and xthi-b on two, with ranks 0-15 extending across both executables.

Node summary for    3 nodes:\nNode    0, hostname nid002668, mpi   8, omp   1, executable xthi-a\nNode    1, hostname nid002669, mpi   4, omp   1, executable xthi-b\nNode    2, hostname nid002670, mpi   4, omp   1, executable xthi-b\nMPI summary: 16 ranks\nNode    0, rank    0, thread   0, (affinity =    0)\nNode    0, rank    1, thread   0, (affinity =    1)\nNode    0, rank    2, thread   0, (affinity =    2)\nNode    0, rank    3, thread   0, (affinity =    3)\nNode    0, rank    4, thread   0, (affinity =    4)\nNode    0, rank    5, thread   0, (affinity =    5)\nNode    0, rank    6, thread   0, (affinity =    6)\nNode    0, rank    7, thread   0, (affinity =    7)\nNode    1, rank    8, thread   0, (affinity =    0)\nNode    1, rank    9, thread   0, (affinity =    1)\nNode    1, rank   10, thread   0, (affinity =    2)\nNode    1, rank   11, thread   0, (affinity =    3)\nNode    2, rank   12, thread   0, (affinity =    0)\nNode    2, rank   13, thread   0, (affinity =    1)\nNode    2, rank   14, thread   0, (affinity =    2)\nNode    2, rank   15, thread   0, (affinity =    3)\n
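When an application needs to know which global ranks belong to which component, the numbering can be worked out in advance. The sketch below assumes, as the xthi output above shows, that ranks are numbered consecutively component by component in the order the components appear on the srun command line; het_rank_ranges and its "<nodes>x<ntasks-per-node>" argument format are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: work out which global ranks each component receives when several
# components share one MPI_COMM_WORLD via "srun ... : ...". Assumes ranks are
# numbered consecutively component by component, in command-line order,
# as the xthi output above shows.

het_rank_ranges() {
    # Each argument is "<nodes>x<ntasks-per-node>" for one component.
    local first=0 group=0 spec nodes tpn ntasks
    for spec in "$@"; do
        nodes=${spec%x*}
        tpn=${spec#*x}
        ntasks=$(( nodes * tpn ))
        echo "het-group=${group}: ranks ${first}-$(( first + ntasks - 1 ))"
        first=$(( first + ntasks ))
        group=$(( group + 1 ))
    done
}

# The srun example above: 1 node x 8 tasks, then 2 nodes x 4 tasks.
het_rank_ranges 1x8 2x4
```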
"},{"location":"user-guide/scheduler/#heterogeneous-placement-for-mixed-mpiopenmp-work","title":"Heterogeneous placement for mixed MPI/OpenMP work","text":"

Some care may be required for placement of tasks/threads in heterogeneous jobs in which the number of threads needs to be specified differently for different components.

In the following we have two components, again using xthi-a and xthi-b as our two separate executables. The first component runs 8 MPI tasks each with 16 OpenMP threads on one node. The second component runs 8 MPI tasks with one task per NUMA region on a second node; each task has one thread. An appropriate Slurm submission might be:

#!/bin/bash\n\n#SBATCH --time=00:20:00\n#SBATCH --export=none\n#SBATCH --account=[...]\n\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n#SBATCH --nodes=2\n\nSHARED_ARGS=\"--distribution=block:block --hint=nomultithread \\\n              --nodes=1 --ntasks-per-node=8 --cpus-per-task=16\"\n\n# Do not set OMP_NUM_THREADS in the calling environment\n\nunset OMP_NUM_THREADS\nexport OMP_PROC_BIND=spread\n\nsrun --het-group=0 ${SHARED_ARGS} --export=all,OMP_NUM_THREADS=16 ./xthi-a : \\\n      --het-group=1 ${SHARED_ARGS} --export=all,OMP_NUM_THREADS=1  ./xthi-b\n

The important point here is that OMP_NUM_THREADS must not be set in the environment that calls srun in order that the different specifications for the separate groups via --export on the srun command line take effect. If OMP_NUM_THREADS is set in the calling environment, then that value takes precedence, and each component will see the same value of OMP_NUM_THREADS.

The output might then be:

Node    0, hostname nid001111, mpi   8, omp  16, executable xthi-a\nNode    1, hostname nid001126, mpi   8, omp   1, executable xthi-b\nNode    0, rank    0, thread   0, (affinity =    0)\nNode    0, rank    0, thread   1, (affinity =    1)\nNode    0, rank    0, thread   2, (affinity =    2)\nNode    0, rank    0, thread   3, (affinity =    3)\nNode    0, rank    0, thread   4, (affinity =    4)\nNode    0, rank    0, thread   5, (affinity =    5)\nNode    0, rank    0, thread   6, (affinity =    6)\nNode    0, rank    0, thread   7, (affinity =    7)\nNode    0, rank    0, thread   8, (affinity =    8)\nNode    0, rank    0, thread   9, (affinity =    9)\nNode    0, rank    0, thread  10, (affinity =   10)\nNode    0, rank    0, thread  11, (affinity =   11)\nNode    0, rank    0, thread  12, (affinity =   12)\nNode    0, rank    0, thread  13, (affinity =   13)\nNode    0, rank    0, thread  14, (affinity =   14)\nNode    0, rank    0, thread  15, (affinity =   15)\nNode    0, rank    1, thread   0, (affinity =   16)\nNode    0, rank    1, thread   1, (affinity =   17)\n...\nNode    0, rank    7, thread  14, (affinity =  126)\nNode    0, rank    7, thread  15, (affinity =  127)\nNode    1, rank    8, thread   0, (affinity =    0)\nNode    1, rank    9, thread   0, (affinity =   16)\nNode    1, rank   10, thread   0, (affinity =   32)\nNode    1, rank   11, thread   0, (affinity =   48)\nNode    1, rank   12, thread   0, (affinity =   64)\nNode    1, rank   13, thread   0, (affinity =   80)\nNode    1, rank   14, thread   0, (affinity =   96)\nNode    1, rank   15, thread   0, (affinity =  112)\n

Here we can see the eight MPI tasks from xthi-a each running with sixteen OpenMP threads. Then the 8 MPI tasks with no threading from xthi-b are spaced across the cores on the second node, one per NUMA region.

"},{"location":"user-guide/scheduler/#low-priority-access","title":"Low priority access","text":"

Low priority jobs are not charged against your allocation but will only run when other, higher-priority, jobs cannot be run. Although low priority jobs are not charged, you do need a valid, positive budget to be able to submit and run low priority jobs, i.e. you need at least 1 CU in your budget.

Low priority access is always available and has the following limits:

You submit a low priority job on ARCHER2 by using the lowpriority QoS. For example, you would usually have the following line in your job submission script sbatch options:

#SBATCH --qos=lowpriority\n
"},{"location":"user-guide/scheduler/#reservations","title":"Reservations","text":"

Reservations are available on ARCHER2. These allow users to reserve a number of nodes for a specified length of time starting at a particular time on the system.

Reservations require justification. They will only be approved if the request could not be fulfilled with the normal QoSs, for instance, if you require one or more jobs to run at a particular time, e.g. for a demonstration or course.

Note

Reservation requests must be submitted at least 60 hours in advance of the reservation start time. If requesting a reservation for a Monday at 18:00, please ensure the request is received by 12:00 on the preceding Friday at the latest. The same applies over Service Holidays.

Note

Reservations are only valid for standard compute nodes; high memory compute nodes and PP nodes cannot be included in reservations.

Reservations are charged at 1.5 times the usual CU rate, and our policy is to charge the full rate for the entire reservation at the time of booking, whether or not you use the nodes for the full time. In addition, CUs will not be refunded if you fail to use them because of a job issue, unless the issue is due to a system failure.
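A quick way to estimate the up-front charge is nodes × hours × 1.5. The sketch below assumes a rate of 1 CU per node-hour on standard compute nodes; treat that rate as an assumption and check the current charging policy for your project. reservation_cus is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: estimate the CUs charged up front for a reservation using the
# 1.5x multiplier described above. ASSUMPTION: 1 CU per node-hour on
# standard compute nodes; check the current charging policy.

reservation_cus() {
    local nodes=$1 hours=$2
    # The full reservation is charged, whether or not the nodes are used.
    # awk handles fractional hours and the 1.5 multiplier.
    awk -v n="${nodes}" -v h="${hours}" 'BEGIN { printf "%.1f\n", n * h * 1.5 }'
}

reservation_cus 64 3      # 64 nodes for 3 hours
```

Under that assumption, a 64-node, 3-hour reservation would cost 288 CUs, compared with 192 CUs for the same node-hours in the normal queues.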

To request a reservation you complete a form on SAFE:

  1. Log into SAFE
  2. Under the \"Login accounts\" menu, choose the \"Request reservation\" option

On the first page, you need to provide the following:

On the second page, you will need to specify which username you wish the reservation to be charged against and, once the username has been selected, the budget you want to charge the reservation to. (The selected username will be charged for the reservation but the reservation can be used by all members of the selected budget.)

Your request will be checked by the ARCHER2 User Administration team and, if approved, you will be provided a reservation ID which can be used on the system. To submit jobs to a reservation, you need to add --reservation=<reservation ID> and --qos=reservation options to your job submission script or command.

Important

You must have at least 1 CU in the budget to submit a job on ARCHER2, even to a pre-paid reservation.

Tip

You can submit jobs to a reservation as soon as the reservation has been set up; jobs will remain queued until the reservation starts.

"},{"location":"user-guide/scheduler/#capability-days","title":"Capability Days","text":"

Important

The next Capability Days session will be from Tue 24 Sep 2024 to Thu 26 Sep 2024.

ARCHER2 Capability Days are a mechanism to allow users to run large scale (512 node or more) tests on the system free of charge. The motivations behind Capability Days are:

To enable this, a period will be made available regularly where users can run jobs at large scale free of charge.

Capability Days are made up of different parts:

Tip

Any jobs left in the queues when Capability Days finish will be deleted.

"},{"location":"user-guide/scheduler/#pre-capability-day-session","title":"pre-Capability Day session","text":"

The pre-Capability Day session is typically available directly before the full Capability Day session and allows short test jobs to prepare for Capability Day.

Submit to the pre-capabilityday QoS. Jobs can be submitted ahead of time and will start when the pre-Capability Day session starts.

pre-capabilityday QoS limits:

"},{"location":"user-guide/scheduler/#example-pre-capability-day-session-job-submission-script","title":"Example pre-Capability Day session job submission script","text":"
#!/bin/bash\n#SBATCH --job-name=test_capability_job\n#SBATCH --nodes=256\n#SBATCH --ntasks-per-node=8\n#SBATCH --cpus-per-task=16\n#SBATCH --time=1:0:0\n#SBATCH --partition=standard\n#SBATCH --qos=pre-capabilityday\n#SBATCH --account=t01\n\nexport OMP_NUM_THREADS=16\nexport OMP_PLACES=cores\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\n# Check process/thread placement\nmodule load xthi\nsrun --hint=nomultithread --distribution=block:block xthi > placement-${SLURM_JOBID}.out\n\nsrun --hint=nomultithread --distribution=block:block my_app.x\n
"},{"location":"user-guide/scheduler/#nerc-capability-reservation","title":"NERC Capability reservation","text":"

The NERC Capability reservation is typically available directly before the full Capability Day session and allows short test jobs to prepare for Capability Day.

Submit to the NERCcapability reservation. Jobs can be submitted ahead of time and will start when the NERC Capability reservation starts.

NERCcapability reservation limits:

"},{"location":"user-guide/scheduler/#example-nerc-capability-reservation-job-submission-script","title":"Example NERC Capability reservation job submission script","text":"
#!/bin/bash\n#SBATCH --job-name=NERC_capability_job\n#SBATCH --nodes=256\n#SBATCH --ntasks-per-node=8\n#SBATCH --cpus-per-task=16\n#SBATCH --time=1:0:0\n#SBATCH --partition=standard\n#SBATCH --reservation=NERCcapability\n#SBATCH --qos=reservation\n#SBATCH --account=t01\n\nexport OMP_NUM_THREADS=16\nexport OMP_PLACES=cores\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\n# Check process/thread placement\nmodule load xthi\nsrun --hint=nomultithread --distribution=block:block xthi > placement-${SLURM_JOBID}.out\n\nsrun --hint=nomultithread --distribution=block:block my_app.x\n
"},{"location":"user-guide/scheduler/#capability-day-session","title":"Capability Day session","text":"

The Capability Day session is typically available directly after the pre-Capability Day session.

Submit to the capabilityday QoS. Jobs can be submitted ahead of time and will start when the Capability Day session starts.

capabilityday QoS limits:

"},{"location":"user-guide/scheduler/#example-capability-day-job-submission-script","title":"Example Capability Day job submission script","text":"
#!/bin/bash\n#SBATCH --job-name=capability_job\n#SBATCH --nodes=1024\n#SBATCH --ntasks-per-node=8\n#SBATCH --cpus-per-task=16\n#SBATCH --time=1:0:0\n#SBATCH --partition=standard\n#SBATCH --qos=capabilityday\n#SBATCH --account=t01\n\nexport OMP_NUM_THREADS=16\nexport OMP_PLACES=cores\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\n# Check process/thread placement\nmodule load xthi\nsrun --hint=nomultithread --distribution=block:block xthi > placement-${SLURM_JOBID}.out\n\nsrun --hint=nomultithread --distribution=block:block my_app.x\n
"},{"location":"user-guide/scheduler/#capability-day-tips","title":"Capability Day tips","text":""},{"location":"user-guide/scheduler/#serial-jobs","title":"Serial jobs","text":"

You can run serial jobs on the shared data analysis nodes. More information on using the data analysis nodes (including example job submission scripts) can be found in the Data Analysis section of the User and Best Practice Guide.

"},{"location":"user-guide/scheduler/#gpu-jobs","title":"GPU jobs","text":"

You can run on the ARCHER2 GPU nodes; full guidance can be found on the GPU development platform page.

"},{"location":"user-guide/scheduler/#best-practices-for-job-submission","title":"Best practices for job submission","text":"

This guidance is adapted from the advice provided by NERSC.

"},{"location":"user-guide/scheduler/#time-limits","title":"Time Limits","text":"

Due to backfill scheduling, short and variable-length jobs generally start quickly, resulting in much better job throughput. You can specify a minimum time for your job with the --time-min option to sbatch:

#SBATCH --time-min=<lower_bound>\n#SBATCH --time=<upper_bound>\n

Within your job script, you can get the time remaining in the job with squeue -h -j ${SLURM_JOBID} -o %L, which allows you to deal with potentially varying runtimes when using this option.
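The %L field is a human-readable string, so a job script usually needs to convert it to seconds before comparing it with a threshold. The sketch below assumes the [days-]hours:minutes:seconds form with leading fields omitted when zero (for example "1-02:03:04", "2:03:04" or "45:10"). It does not handle non-numeric values such as UNLIMITED. The helper name and the 30-minute threshold in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: convert the "time left" string that squeue prints for %L
# ([days-]hours:minutes:seconds, leading fields omitted when zero,
# e.g. "1-02:03:04", "2:03:04", "45:10") into seconds.
# Non-numeric values such as "UNLIMITED" are not handled.

time_left_to_seconds() {
    local t=$1 days=0 h=0 m=0 s=0
    if [[ "${t}" == *-* ]]; then
        days=${t%%-*}
        t=${t#*-}
    fi
    local -a parts
    IFS=: read -r -a parts <<< "${t}"
    case ${#parts[@]} in
        3) h=${parts[0]}; m=${parts[1]}; s=${parts[2]} ;;
        2) m=${parts[0]}; s=${parts[1]} ;;
        1) s=${parts[0]} ;;
    esac
    # 10# forces base 10 so fields such as "08" are not read as octal.
    echo $(( 10#${days} * 86400 + 10#${h} * 3600 + 10#${m} * 60 + 10#${s} ))
}

# Inside a job script one might use (hypothetical 30-minute threshold):
#   left=$(time_left_to_seconds "$(squeue -h -j "${SLURM_JOBID}" -o %L)")
#   if (( left > 1800 )); then srun ... ; fi
```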

"},{"location":"user-guide/scheduler/#long-running-jobs","title":"Long Running Jobs","text":"

Simulations which must run for a long period of time achieve the best throughput when composed of many small jobs, using a checkpoint and restart method, chained together (see above for how to chain jobs together). However, this method does incur a startup and shutdown overhead for each job as the state is saved and loaded, so you should experiment to find the best balance between runtime (long runtimes minimise the checkpoint/restart overheads) and throughput (short runtimes maximise throughput).
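One common way to build such a chain is to submit each restart job with a dependency on the previous one. The sketch below uses sbatch --parsable to capture job IDs and --dependency=afterok so that each job starts only after its predecessor has completed successfully. restart_job.slurm is a hypothetical script that restarts from the latest checkpoint, and submit_chain is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: submit a chain of checkpoint/restart jobs where each job only
# starts once the previous one has finished successfully (afterok).
# "restart_job.slurm" is a hypothetical restart script.

submit_chain() {
    local script=$1 njobs=$2
    local jobid i
    # --parsable makes sbatch print just the job ID (";cluster" on multi-cluster systems).
    jobid=$(sbatch --parsable "${script}") || return 1
    echo "${jobid}"
    for (( i = 1; i < njobs; i++ )); do
        jobid=$(sbatch --parsable --dependency=afterok:"${jobid%%;*}" "${script}") || return 1
        echo "${jobid}"
    done
}

# Example (run on a login node): submit_chain restart_job.slurm 5
```

Using afterok rather than afterany stops the chain if a job fails, so a broken checkpoint is not restarted repeatedly.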

"},{"location":"user-guide/scheduler/#interconnect-locality","title":"Interconnect locality","text":"

For jobs which are sensitive to interconnect (MPI) performance and use 128 nodes or fewer, it is possible to request that all nodes are in a single Slingshot dragonfly group. The maximum number of nodes in a group on ARCHER2 is 128.

Slurm has a concept of \"switches\" which on ARCHER2 are configured to map to Slingshot electrical groups, within which all compute nodes have all-to-all electrical connections that minimise latency. Since this places an additional constraint on the scheduler, a maximum time to wait for the requested topology can be specified; after this time, the job will be placed without the constraint.

For example, to specify that all requested nodes should come from one electrical group and to wait for up to 6 hours (360 minutes) for that placement, you would use the following option in your job:

#SBATCH --switches=1@360\n

You can request multiple groups using this option if you are using more nodes than are in a single group, to maximise the number of nodes that share electrical connections in the job. For example, to request 4 groups (maximum of 512 nodes) and have this as an absolute constraint with no timeout, you would use:

#SBATCH --switches=4\n

Danger

When specifying the number of groups take care to request enough groups to satisfy the requested number of nodes. If the number is too low then an unnecessary delay will be added due to the unsatisfiable request.

A useful heuristic to ensure this is the case is to ensure that the total nodes requested is less than or equal to the number of groups multiplied by 128.
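The heuristic above amounts to a ceiling division of the node count by 128. The sketch below builds the corresponding --switches value, with an optional maximum wait in minutes. switches_option is a hypothetical helper, and the group size of 128 is taken from the text above.

```shell
#!/usr/bin/env bash
# Sketch: build a --switches value from the node count using the heuristic
# above (nodes <= groups x 128), i.e. the smallest group count that can hold
# the job. The optional second argument is the maximum wait in minutes.

switches_option() {
    local nodes=$1 max_wait=${2:-}
    local group_size=128
    # Integer ceiling division: smallest groups with groups * 128 >= nodes.
    local groups=$(( (nodes + group_size - 1) / group_size ))
    echo "--switches=${groups}${max_wait:+@${max_wait}}"
}

switches_option 100 360    # prints --switches=1@360
switches_option 300        # prints --switches=3
```

The result can be passed on the command line, e.g. sbatch $(switches_option 300 360) job.slurm, where job.slurm stands for your own job script.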

"},{"location":"user-guide/scheduler/#large-jobs","title":"Large Jobs","text":"

Large jobs may take longer to start up. The sbcast command is recommended for large jobs requesting over 1500 MPI tasks. By default, Slurm reads the executable on the allocated compute nodes from the location where it is installed; this may take a long time when the file system (where the executable resides) is slow or busy. With the sbcast command, the executable can be copied to the /tmp directory on each of the compute nodes. Since /tmp is part of the memory on the compute nodes, this can speed up the job startup time.

sbcast --compress=none /path/to/exe /tmp/exe\nsrun /tmp/exe\n
"},{"location":"user-guide/scheduler/#huge-pages","title":"Huge pages","text":"

Huge pages are virtual memory pages which are bigger than the default page size of 4 KB. Huge pages can improve memory performance for common access patterns on large data sets since they reduce the number of virtual-to-physical address translations compared to using the default 4 KB pages.
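The reduction in translations is easy to quantify. The sketch below counts how many pages are needed to map a 1 GiB buffer with 4 KiB default pages versus the 2 MiB pages provided by craype-hugepages2M; pages_needed is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: count how many pages (and hence address translations the TLB may
# need to cache) are needed to map a buffer with 4 KiB default pages versus
# 2 MiB huge pages.

pages_needed() {
    local bytes=$1 page_bytes=$2
    # Round up: a partly used page still has to be mapped.
    echo $(( (bytes + page_bytes - 1) / page_bytes ))
}

buffer=$(( 1024 * 1024 * 1024 ))   # a 1 GiB array
echo "4 KiB pages: $(pages_needed "${buffer}" 4096)"      # 262144
echo "2 MiB pages: $(pages_needed "${buffer}" 2097152)"   # 512
```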

To use huge pages for an application (with the 2 MB huge pages as an example):

module load craype-hugepages2M\ncc -o mycode.exe mycode.c\n

Load the same huge pages module at runtime as well.

Warning

Because of memory fragmentation when using huge pages, applications may receive Cannot allocate memory warnings or errors when there are not enough huge pages available on a compute node. For example:

libhugetlbfs [nid0000xx:xxxxx]: WARNING: New heap segment map at 0x10000000 failed: Cannot allocate memory

By default, the libhugetlbfs verbosity level, HUGETLB_VERBOSE, is set to 0 on ARCHER2 to suppress debugging messages. You can increase this value to get more information on huge page use.

"},{"location":"user-guide/scheduler/#when-to-use-huge-pages","title":"When to Use Huge Pages","text":""},{"location":"user-guide/scheduler/#when-to-avoid-huge-pages","title":"When to Avoid Huge Pages","text":""},{"location":"user-guide/sw-environment-4cab/","title":"Software environment: 4-cabinet system","text":"

Important

This section covers the software environment on the initial, 4-cabinet ARCHER2 system. For documentation on the software environment on the full ARCHER2 system, please see Software environment: full system.

The software environment on ARCHER2 is primarily controlled through the module command. By loading and switching software modules you control which software and versions are available to you.

Information

A module is a self-contained description of a software package -- it contains the settings required to run a software package and, usually, encodes required dependencies on other software packages.

By default, all users on ARCHER2 start with the default software environment loaded.

Software modules on ARCHER2 are provided by both HPE Cray (usually known as the Cray Development Environment, CDE) and by EPCC, who provide the Service Provision, and Computational Science and Engineering services.

In this section, we provide:

"},{"location":"user-guide/sw-environment-4cab/#using-the-module-command","title":"Using the module command","text":"

We only cover basic usage of the module command here. For full documentation, please see the Linux manual page on modules.

The module command takes a subcommand to indicate what operation you wish to perform. Common subcommands are: module list, module avail, module help, module show, module load, module remove and module swap.

These are described in more detail below.

"},{"location":"user-guide/sw-environment-4cab/#information-on-the-available-modules","title":"Information on the available modules","text":"

The module list command will give the names of the modules and their versions you have presently loaded in your environment:

auser@uan01:~> module list\nCurrently Loaded Modulefiles:\n1) cpe-aocc                          7) cray-dsmml/0.1.2(default)\n2) aocc/2.1.0.3(default)             8) perftools-base/20.09.0(default)\n3) craype/2.7.0(default)             9) xpmem/2.2.35-7.0.1.0_1.3__gd50fabf.shasta(default)\n4) craype-x86-rome                  10) cray-mpich/8.0.15(default)\n5) libfabric/1.11.0.0.233(default)  11) cray-libsci/20.08.1.2(default)\n6) craype-network-ofi\n

Finding out which software modules are available on the system is performed using the module avail command. To list all software modules available, use:

auser@uan01:~> module avail\n------------------------------- /opt/cray/pe/perftools/20.09.0/modulefiles --------------------------------\nperftools       perftools-lite-events  perftools-lite-hbm    perftools-nwpc     \nperftools-lite  perftools-lite-gpu     perftools-lite-loops  perftools-preload  \n\n---------------------------------- /opt/cray/pe/craype/2.7.0/modulefiles ----------------------------------\ncraype-hugepages1G  craype-hugepages8M   craype-hugepages128M  craype-network-ofi          \ncraype-hugepages2G  craype-hugepages16M  craype-hugepages256M  craype-network-slingshot10  \ncraype-hugepages2M  craype-hugepages32M  craype-hugepages512M  craype-x86-rome             \ncraype-hugepages4M  craype-hugepages64M  craype-network-none   \n\n------------------------------------- /usr/local/Modules/modulefiles --------------------------------------\ndot  module-git  module-info  modules  null  use.own  \n\n-------------------------------------- /opt/cray/pe/cpe-prgenv/7.0.0 --------------------------------------\ncpe-aocc  cpe-cray  cpe-gnu  \n\n-------------------------------------------- /opt/modulefiles ---------------------------------------------\naocc/2.1.0.3(default)  cray-R/4.0.2.0(default)  gcc/8.1.0  gcc/9.3.0  gcc/10.1.0(default)  \n\n\n---------------------------------------- /opt/cray/pe/modulefiles -----------------------------------------\natp/3.7.4(default)              cray-mpich-abi/8.0.15             craype-dl-plugin-py3/20.06.1(default)  \ncce/10.0.3(default)             cray-mpich-ucx/8.0.15             craype/2.7.0(default)                  \ncray-ccdb/4.7.1(default)        cray-mpich/8.0.15(default)        craypkg-gen/1.3.10(default)            \ncray-cti/2.7.3(default)         cray-netcdf-hdf5parallel/4.7.4.0  gdb4hpc/4.7.3(default)                 \ncray-dsmml/0.1.2(default)       cray-netcdf/4.7.4.0               iobuf/2.0.10(default)                  \ncray-fftw/3.3.8.7(default)      cray-openshmemx/11.1.1(default)   
papi/6.0.0.2(default)                  \ncray-ga/5.7.0.3                 cray-parallel-netcdf/1.12.1.0     perftools-base/20.09.0(default)        \ncray-hdf5-parallel/1.12.0.0     cray-pmi-lib/6.0.6(default)       valgrind4hpc/2.7.2(default)            \ncray-hdf5/1.12.0.0              cray-pmi/6.0.6(default)           \ncray-libsci/20.08.1.2(default)  cray-python/3.8.5.0(default)    \n

This will list all the names and versions of the modules available on the service. Some of them may not work in your account, for example because of licensing restrictions. You will notice that for many modules we have more than one version, each identified by a version number. One of these versions is the default. As the service develops, the default version will change and old versions of software may be deleted.

You can list all the modules of a particular type by providing an argument to the module avail command. For example, to list all available versions of the HPE Cray FFTW library, use:

auser@uan01:~> module avail cray-fftw\n\n---------------------------------------- /opt/cray/pe/modulefiles -----------------------------------------\ncray-fftw/3.3.8.7(default) \n

If you want more info on any of the modules, you can use the module help command:

auser@uan01:~> module help cray-fftw\n\n-------------------------------------------------------------------\nModule Specific Help for /opt/cray/pe/modulefiles/cray-fftw/3.3.8.7:\n\n\n===================================================================\nFFTW 3.3.8.7\n============\n  Release Date:\n  -------------\n    June 2020\n\n\n  Purpose:\n  --------\n    This Cray FFTW 3.3.8.7 release is supported on Cray Shasta Systems. \n    FFTW is supported on the host CPU but not on the accelerator of Cray systems.\n\n    The Cray FFTW 3.3.8.7 release provides the following:\n      - Optimizations for AMD Rome CPUs.\n    See the Product and OS Dependencies section for details\n\n[...]\n

The module show command reveals the operations that a module performs to change your environment when it is loaded. For example, for the default FFTW module:

auser@uan01:~> module show cray-fftw\n-------------------------------------------------------------------\n/opt/cray/pe/modulefiles/cray-fftw/3.3.8.7:\n\nconflict        cray-fftw\nconflict        fftw\nsetenv          FFTW_VERSION 3.3.8.7\nsetenv          CRAY_FFTW_VERSION 3.3.8.7\nsetenv          CRAY_FFTW_PREFIX /opt/cray/pe/fftw/3.3.8.7/x86_rome\nsetenv          FFTW_ROOT /opt/cray/pe/fftw/3.3.8.7/x86_rome\nsetenv          FFTW_DIR /opt/cray/pe/fftw/3.3.8.7/x86_rome/lib\nsetenv          FFTW_INC /opt/cray/pe/fftw/3.3.8.7/x86_rome/include\nprepend-path    PATH /opt/cray/pe/fftw/3.3.8.7/x86_rome/bin\nprepend-path    MANPATH /opt/cray/pe/fftw/3.3.8.7/share/man\nprepend-path    CRAY_LD_LIBRARY_PATH /opt/cray/pe/fftw/3.3.8.7/x86_rome/lib\nprepend-path    PE_PKGCONFIG_PRODUCTS PE_FFTW\nsetenv          PE_FFTW_TARGET_x86_skylake x86_skylake\nsetenv          PE_FFTW_TARGET_x86_rome x86_rome\nsetenv          PE_FFTW_TARGET_x86_cascadelake x86_cascadelake\nsetenv          PE_FFTW_TARGET_x86_64 x86_64\nsetenv          PE_FFTW_TARGET_share share\nsetenv          PE_FFTW_TARGET_sandybridge sandybridge\nsetenv          PE_FFTW_TARGET_mic_knl mic_knl\nsetenv          PE_FFTW_TARGET_ivybridge ivybridge\nsetenv          PE_FFTW_TARGET_haswell haswell\nsetenv          PE_FFTW_TARGET_broadwell broadwell\nsetenv          PE_FFTW_VOLATILE_PKGCONFIG_PATH /opt/cray/pe/fftw/3.3.8.7/@PE_FFTW_TARGET@/lib/pkgconfig\nsetenv          PE_FFTW_PKGCONFIG_VARIABLES PE_FFTW_OMP_REQUIRES_@openmp@\nsetenv          PE_FFTW_OMP_REQUIRES { }\nsetenv          PE_FFTW_OMP_REQUIRES_openmp _mp\nsetenv          PE_FFTW_PKGCONFIG_LIBS fftw3_mpi:libfftw3_threads:fftw3:fftw3f_mpi:libfftw3f_threads:fftw3f\nmodule-whatis   {FFTW 3.3.8.7 - Fastest Fourier Transform in the West}\n  [...]\n
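Build tools that do not go through the HPE Cray compiler wrappers can reuse the FFTW_INC and FFTW_DIR variables set by this module. The wrappers normally add the FFTW paths automatically. The sketch below uses a hypothetical helper called fftw_flags. It assumes you want the double-precision, serial library (-lfftw3).

```shell
#!/bin/bash
# Sketch: build compiler/linker flags from the variables that the cray-fftw
# module sets (see the module show output above).
fftw_flags() {
    if [ -z "${FFTW_INC:-}" ] || [ -z "${FFTW_DIR:-}" ]; then
        echo "error: FFTW_INC/FFTW_DIR not set; load cray-fftw first" >&2
        return 1
    fi
    # -lfftw3 is the double-precision serial FFTW library
    echo "-I${FFTW_INC} -L${FFTW_DIR} -lfftw3"
}
```

You can pass its output to a compiler that is invoked directly rather than through the cc, CC or ftn wrappers.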
"},{"location":"user-guide/sw-environment-4cab/#loading-removing-and-swapping-modules","title":"Loading, removing and swapping modules","text":"

To load a module, use the module load command. For example, to load the default version of HPE Cray FFTW into your environment, use:

auser@uan01:~> module load cray-fftw\n

Once you have done this, your environment will be set up to use the HPE Cray FFTW library. The above command loads the default version of HPE Cray FFTW. If you need a specific version of the software, you can add the version number:

auser@uan01:~> module load cray-fftw/3.3.8.7\n

will load HPE Cray FFTW version 3.3.8.7 into your environment, regardless of the default.

If you want to remove software from your environment, module remove will remove a loaded module:

auser@uan01:~> module remove cray-fftw\n

will unload whatever version of cray-fftw you might have loaded (even if it is not the default).

There are many situations in which you might want to change the presently loaded version to a different one, such as trying the latest version which is not yet the default or using a legacy version to keep compatibility with old data. This can be achieved most easily by using module swap oldmodule newmodule.

If you have loaded version 3.3.8.7 of cray-fftw, the following command will change to version 3.3.8.5:

auser@uan01:~> module swap cray-fftw cray-fftw/3.3.8.5\n

You do not need to specify the version of the currently loaded module. It can be inferred because it will be the only version you have loaded.

"},{"location":"user-guide/sw-environment-4cab/#changing-programming-environment","title":"Changing Programming Environment","text":"

The three programming environments, PrgEnv-aocc, PrgEnv-cray and PrgEnv-gnu, are implemented as module collections. The correct way to change programming environment (that is, to change the collection of modules) is therefore via module restore. For example:

auser@uan01:~> module restore PrgEnv-gnu\n

Note

There is only one argument, which is the collection to be restored. The module restore command lists the modules in the outgoing collection as they are unloaded, and the modules in the incoming collection as they are loaded.

If you prefer not to see these messages, use:

auser@uan1:~> module -s restore PrgEnv-gnu\n

will suppress the messages. An attempt to restore a collection which is already loaded will result in no operation.

Module collections are stored in the .module directory in a user's home directory (${HOME}/.module). However, the home directory is not available on the compute nodes (the back end), so module restore may fail in batch jobs. In this case, you can restore one of the three standard programming environments directly, for example:

module restore /etc/cray-pe.d/PrgEnv-gnu\n
"},{"location":"user-guide/sw-environment-4cab/#capturing-your-environment-for-reuse","title":"Capturing your environment for reuse","text":"

Sometimes it is useful to save the module environment that you are using to compile a piece of code or execute a piece of software. This is saved as a module collection. You can save a collection from your current environment by executing:

auser@uan01:~> module save [collection_name]\n

Note

If you do not specify the environment name, it is called default.

You can find the list of saved module environments by executing:

auser@uan01:~> module savelist\nNamed collection list:\n 1) default   2) PrgEnv-aocc   3) PrgEnv-cray   4) PrgEnv-gnu \n

To list the modules in a collection, you can execute, for example:

auser@uan01:~> module saveshow PrgEnv-gnu\n-------------------------------------------------------------------\n/home/t01/t01/auser/.module/default:\nmodule use --append /opt/cray/pe/perftools/20.09.0/modulefiles\nmodule use --append /opt/cray/pe/craype/2.7.0/modulefiles\nmodule use --append /usr/local/Modules/modulefiles\nmodule use --append /opt/cray/pe/cpe-prgenv/7.0.0\nmodule use --append /opt/modulefiles\nmodule use --append /opt/cray/modulefiles\nmodule use --append /opt/cray/pe/modulefiles\nmodule use --append /opt/cray/pe/craype-targets/default/modulefiles\nmodule load cpe-gnu\nmodule load gcc\nmodule load craype\nmodule load craype-x86-rome\nmodule load --notuasked libfabric\nmodule load craype-network-ofi\nmodule load cray-dsmml\nmodule load perftools-base\nmodule load xpmem\nmodule load cray-mpich\nmodule load cray-libsci\nmodule load /work/y07/shared/archer2-modules/modulefiles-cse/epcc-setup-env\n

Note again that the details of the collection have been saved to the home directory (the first line of output above). It is possible to save a module collection with a fully qualified path, e.g.,

auser@uan1:~> module save /work/t01/z01/auser/.module/PrgEnv-gnu\n

which would make it available from the batch system.

To delete a module environment, you can execute:

auser@uan01:~> module saverm <environment_name>\n
"},{"location":"user-guide/sw-environment-4cab/#shell-environment-overview","title":"Shell environment overview","text":"

When you log in to ARCHER2, you are using the bash shell by default. As with any other software, the bash shell has loaded a set of environment variables that can be listed by executing printenv or export.

Environment variables such as these control the behaviour of the software you run. For instance, OMP_NUM_THREADS defines the number of OpenMP threads.

To define an environment variable, you need to execute:

export OMP_NUM_THREADS=4\n

Note that there must be no spaces between the variable name, the assignment symbol (=) and the value. If the value is a string containing spaces, enclose the string in double quotation marks.

You can show the value of a specific environment variable if you print it:

echo $OMP_NUM_THREADS\n

Do not forget the dollar symbol. To remove an environment variable, just execute:

unset OMP_NUM_THREADS\n
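The short bash session below illustrates these rules. MY_LABEL is a hypothetical variable used only for illustration.

```shell
#!/bin/bash
# Correct: no spaces around '='. (With spaces, bash would treat OMP_NUM_THREADS
# as a command name and try to run it with the arguments '=' and '4'.)
export OMP_NUM_THREADS=4

# A value containing spaces must be quoted to be kept as one string.
export MY_LABEL="run with 4 threads"

echo "$OMP_NUM_THREADS"        # prints 4
echo "$MY_LABEL"               # prints run with 4 threads

# unset takes the variable name without the dollar symbol.
unset MY_LABEL
echo "${MY_LABEL:-<not set>}"  # prints <not set>
```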
"},{"location":"user-guide/sw-environment/","title":"Software environment","text":"

The software environment on ARCHER2 is managed using the Lmod software. Selecting which software is available in your environment is primarily controlled through the module command. By loading and switching software modules you control which software and versions are available to you.

Information

A module is a self-contained description of a software package -- it contains the settings required to run a software package and, usually, encodes required dependencies on other software packages.

By default, all users on ARCHER2 start with the default software environment loaded.

Software modules on ARCHER2 are provided by both HPE (usually known as the HPE Cray Programming Environment, CPE) and by EPCC, who provide the Service Provision, and Computational Science and Engineering services.

In this section, we provide:

"},{"location":"user-guide/sw-environment/#using-the-module-command","title":"Using the module command","text":"

We only cover basic usage of the Lmod module command here. For full documentation, please see the Lmod documentation.

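The module command takes a subcommand to indicate what operation you wish to perform. Common subcommands are: module list, module avail, module spider, module help, module show, module load, module remove and module swap.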

These are described in more detail below.

Tip

Lmod allows you to use the ml shortcut command. Without any arguments, ml behaves like module list; when a module name is specified to ml, ml behaves like module load.

Note

You will often have to include module commands in your job submission scripts to set up the software used in your jobs. Generally, modules that you load in an interactive session do not carry over into your job submission scripts.

Important

You should not use the module purge command on ARCHER2 as this will cause issues for the HPE Cray programming environment. If you wish to reset your modules, you should use the module restore command instead.

"},{"location":"user-guide/sw-environment/#information-on-the-available-modules","title":"Information on the available modules","text":"

The key commands for getting information on modules are covered in more detail below. They are: module list, module avail, module spider, module help and module show.

"},{"location":"user-guide/sw-environment/#module-list","title":"module list","text":"

The module list command will give the names of the modules and their versions you have presently loaded in your environment:

auser@ln03:~> module list\n\nCurrently Loaded Modules:\n  1) craype-x86-rome                         6) cce/15.0.0             11) PrgEnv-cray/8.3.3\n  2) libfabric/1.12.1.2.2.0.0                7) craype/2.7.19          12) bolt/0.8\n  3) craype-network-ofi                      8) cray-dsmml/0.2.2       13) epcc-setup-env\n  4) perftools-base/22.12.0                  9) cray-mpich/8.1.23      14) load-epcc-module\n  5) xpmem/2.5.2-2.4_3.30__gd0f7936.shasta  10) cray-libsci/22.12.1.1\n

All users start with a default set of modules loaded corresponding to:

"},{"location":"user-guide/sw-environment/#module-avail","title":"module avail","text":"

Finding out which software modules are currently available to load on the system is performed using the module avail command. To list all software modules currently available to load, use:

auser@uan01:~> module avail\n\n--------------------------- /work/y07/shared/archer2-lmod/utils/compiler/crayclang/10.0 ---------------------------\n   darshan/3.3.1\n\n------------------------------------ /work/y07/shared/archer2-lmod/python/core ------------------------------------\n   matplotlib/3.4.3    netcdf4/1.5.7    pytorch/1.10.0    scons/4.3.0    seaborn/0.11.2    tensorflow/2.7.0\n\n------------------------------------- /work/y07/shared/archer2-lmod/libs/core -------------------------------------\n   aocl/3.1     (D)    gmp/6.2.1            matio/1.5.23        parmetis/4.0.3        slepc/3.14.1\n   aocl/4.0            gsl/2.7              metis/5.1.0         petsc/3.14.2          slepc/3.18.3       (D)\n   boost/1.72.0        hypre/2.18.0         mkl/2023.0.0        petsc/3.18.5   (D)    superlu-dist/6.4.0\n   boost/1.81.0 (D)    hypre/2.25.0  (D)    mumps/5.3.5         scotch/6.1.0          superlu-dist/8.1.2 (D)\n   eigen/3.4.0         libxml2/2.9.7        mumps/5.5.1  (D)    scotch/7.0.3   (D)    superlu/5.2.2\n\n------------------------------------- /work/y07/shared/archer2-lmod/apps/core -------------------------------------\n   castep/22.11                    namd/2.14                 (D)    py-chemshell/21.0.3\n   code_saturne/7.0.1-cce15        nektar/5.2.0                     quantum_espresso/6.8  (D)\n   code_saturne/7.0.1-gcc11 (D)    nwchem/7.0.2                     quantum_espresso/7.1\n   cp2k/cp2k-2023.1                onetep/6.1.9.0-CCE-LibSci (D)    tcl-chemshell/3.7.1\n   elk/elk-7.2.42                  onetep/6.1.9.0-GCC-LibSci        vasp/5/5.4.4.pl2-vtst\n   fhiaims/210716.3                onetep/6.1.9.0-GCC-MKL           vasp/5/5.4.4.pl2\n   gromacs/2022.4+plumed           openfoam/com/v2106               vasp/6/6.3.2-vtst\n   gromacs/2022.4           (D)    openfoam/com/v2212        (D)    vasp/6/6.3.2          (D)\n   lammps/17Feb2023                openfoam/org/v9.20210903\n   namd/2.14-nosmp                 
openfoam/org/v10.20230119 (D)\n\n------------------------------------ /work/y07/shared/archer2-lmod/utils/core -------------------------------------\n   amd-uprof/3.6.449          darshan-util/3.3.1        imagemagick/7.1.0         reframe/4.1.0\n   forge/24.0                 epcc-reframe/0.2          ncl/6.6.2                 tcl/8.6.13\n   bolt/0.7                   epcc-setup-env     (L)    nco/5.0.3          (D)    tk/8.6.13\n   bolt/0.8          (L,D)    gct/v6.2.20201212         nco/5.0.5                 usage-analysis/1.2\n   cdo/1.9.9rc1               genmaskcpu/1.0            ncview/2.1.7              visidata/2.1\n   cdo/2.1.1         (D)      gnuplot/5.4.2-simg        other-software/1.0        vmd/1.9.3-gcc10\n   cmake/3.18.4               gnuplot/5.4.2      (D)    paraview/5.9.1     (D)    xthi/1.3\n   cmake/3.21.3      (D)      gnuplot/5.4.3             paraview/5.10.1\n\n--------------------- /opt/cray/pe/lmod/modulefiles/mpi/crayclang/14.0/ofi/1.0/cray-mpich/8.0 ---------------------\n   cray-hdf5-parallel/1.12.2.1    cray-mpixlate/1.0.0.6    cray-parallel-netcdf/1.12.3.1\n\n--------------------------- /opt/cray/pe/lmod/modulefiles/comnet/crayclang/14.0/ofi/1.0 ---------------------------\n   cray-mpich-abi/8.1.23    cray-mpich/8.1.23 (L)\n\n...output trimmed...\n

This will list all the names and versions of the modules that you can currently load. Other modules may be defined but unavailable to you because they depend on modules you have not loaded. Lmod only shows modules that you can currently load, not all those that are defined. You can search for modules that are not currently visible to you using the module spider command - we cover this in more detail below.

Note also that some modules may not work in your account, for example because of licensing restrictions. You will notice that for many modules we have more than one version, each identified by a version number. One of these versions is the default. As the service develops, the default version will change and old versions of software may be deleted.

You can list all the modules of a particular type by providing an argument to the module avail command. For example, to list all available versions of the HPE Cray FFTW library, use:

auser@ln03:~>  module avail cray-fftw\n\n--------------------------------- /opt/cray/pe/lmod/modulefiles/cpu/x86-rome/1.0 ----------------------------------\n   cray-fftw/3.3.10.3\n\nModule defaults are chosen based on Find First Rules due to Name/Version/Version modules found in the module tree.\nSee https://lmod.readthedocs.io/en/latest/060_locating.html for details.\n\nUse \"module spider\" to find all possible modules and extensions.\nUse \"module keyword key1 key2 ...\" to search for all possible modules matching any of the \"keys\".\n
"},{"location":"user-guide/sw-environment/#module-spider","title":"module spider","text":"

The module spider command is used to find out which modules are defined on the system. Unlike module avail, it includes modules that you cannot load yet because you have not loaded the dependencies that make them available.

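module spider takes 3 forms:

- module spider with no arguments lists all modules defined on the system.
- module spider name lists the defined versions of the module name.
- module spider name/version shows which other modules must be loaded before name/version can be loaded.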

If you cannot find a module that you expect to be on the system using module avail then you can use module spider to find out which dependencies you need to load to make the module available.

For example, the module cray-netcdf-hdf5parallel is installed on ARCHER2 but it will not be found by module avail:

auser@ln03:~> module avail cray-netcdf-hdf5parallel\nNo module(s) or extension(s) found!\nUse \"module spider\" to find all possible modules and extensions.\nUse \"module keyword key1 key2 ...\" to search for all possible modules matching any of the \"keys\".\n

We can use module spider without any arguments to verify it exists and list the versions available:

auser@ln03:~> module spider\n\n-----------------------------------------------------------------------------------------------\nThe following is a list of the modules and extensions currently available:\n-----------------------------------------------------------------------------------------------\n\n...output trimmed...\n\n  cray-mpich-abi: cray-mpich-abi/8.1.23\n\n  cray-mpixlate: cray-mpixlate/1.0.0.6\n\n  cray-mrnet: cray-mrnet/5.0.4\n\n  cray-netcdf: cray-netcdf/4.9.0.1\n\n  cray-netcdf-hdf5parallel: cray-netcdf-hdf5parallel/4.9.0.1\n\n  cray-openshmemx: cray-openshmemx/11.5.7\n\n...output trimmed...\n

Now that we know which versions are available, we can use module spider cray-netcdf-hdf5parallel/4.9.0.1 to find out how we can make it available:

auser@ln03:~> module spider cray-netcdf-hdf5parallel/4.9.0.1\n\n---------------------------------------------------------------------------------------------------------------\n  cray-netcdf-hdf5parallel: cray-netcdf-hdf5parallel/4.9.0.1\n---------------------------------------------------------------------------------------------------------------\n\n    You will need to load all module(s) on any one of the lines below before the \"cray-netcdf-hdf5parallel/4.9.0.1\" module is available to load.\n\n      aocc/3.2.0  cray-mpich/8.1.23  cray-hdf5-parallel/1.12.2.1\n      cce/15.0.0  cray-mpich/8.1.23  cray-hdf5-parallel/1.12.2.1\n      craype-network-none  cray-mpich/8.1.23  cray-hdf5-parallel/1.12.2.1\n      craype-network-ofi  cray-mpich/8.1.23  cray-hdf5-parallel/1.12.2.1\n      craype-network-ucx  cray-mpich/8.1.23  cray-hdf5-parallel/1.12.2.1\n      gcc/10.3.0  cray-mpich/8.1.23  cray-hdf5-parallel/1.12.2.1\n      gcc/11.2.0  cray-mpich/8.1.23  cray-hdf5-parallel/1.12.2.1\n\n    Help:\n      Release info:  /opt/cray/pe/netcdf-hdf5parallel/4.9.0.1/release_info\n

There is a lot of information here. Essentially, the output tells us that to make cray-netcdf-hdf5parallel/4.9.0.1 available to load, we need to have loaded a compiler (any version of CCE, GCC or AOCC), an MPI library (any version of cray-mpich) and cray-hdf5-parallel. We always have a compiler and an MPI library loaded, so we can satisfy all of the dependencies by loading cray-hdf5-parallel. We can then use module avail cray-netcdf-hdf5parallel again to show that the module is now available to load:

auser@ln03:~> module load cray-hdf5-parallel\nauser@ln03:~> module avail cray-netcdf-hdf5parallel\n\n--- /opt/cray/pe/lmod/modulefiles/hdf5-parallel/crayclang/14.0/ofi/1.0/cray-mpich/8.0/cray-hdf5-parallel/1.12.2 ---\n   cray-netcdf-hdf5parallel/4.9.0.1\n\nModule defaults are chosen based on Find First Rules due to Name/Version/Version modules found in the module tree.\nSee https://lmod.readthedocs.io/en/latest/060_locating.html for details.\n\nUse \"module spider\" to find all possible modules and extensions.\nUse \"module keyword key1 key2 ...\" to search for all possible modules matching any of the \"keys\".\n
"},{"location":"user-guide/sw-environment/#module-help","title":"module help","text":"

If you want more info on any of the modules, you can use the module help command:

auser@ln03:~> module help gromacs\n
"},{"location":"user-guide/sw-environment/#module-show","title":"module show","text":"

The module show command reveals the operations that a module performs to change your environment when it is loaded. For example, for the default GROMACS module:

auser@ln03:~> module show gromacs\n\n  [...]\n
"},{"location":"user-guide/sw-environment/#loading-removing-and-swapping-modules","title":"Loading, removing and swapping modules","text":"

To change your environment and make different software available, use the following commands, which we cover in more detail below: module load, module remove and module swap.

"},{"location":"user-guide/sw-environment/#module-load","title":"module load","text":"

To load a module, use the module load command. For example, to load the default version of GROMACS into your environment, use:

auser@ln03:~> module load gromacs\n

Once you have done this, your environment will be set up to use GROMACS. The above command loads the default version of GROMACS. If you need a specific version of the software, you can add the version number:

auser@uan01:~> module load gromacs/2022.4 \n

will load GROMACS version 2022.4 into your environment, regardless of the default.

"},{"location":"user-guide/sw-environment/#module-remove","title":"module remove","text":"

If you want to remove software from your environment, module remove will remove a loaded module:

auser@uan01:~> module remove gromacs\n

will unload whatever version of gromacs you might have loaded (even if it is not the default).

"},{"location":"user-guide/sw-environment/#module-swap","title":"module swap","text":"

There are many situations in which you might want to change the presently loaded version to a different one, such as trying the latest version which is not yet the default or using a legacy version to keep compatibility with old data. This can be achieved most easily by using module swap oldmodule newmodule.

For example, to swap from the default CCE (cray) compiler environment to the GCC (gnu) compiler environment, you would use:

auser@ln03:~> module swap PrgEnv-cray PrgEnv-gnu\n

You do not need to specify the version of the currently loaded module. It can be inferred because it will be the only version you have loaded.

"},{"location":"user-guide/sw-environment/#shell-environment-overview","title":"Shell environment overview","text":"

When you log in to ARCHER2, you are using the bash shell by default. As with any software, the bash shell has loaded a set of environment variables that can be listed by executing printenv or export.

Environment variables such as these control the behaviour of the software you run. For instance, OMP_NUM_THREADS defines the number of OpenMP threads.

To define an environment variable, you need to execute:

export OMP_NUM_THREADS=4\n

Note that there must be no spaces between the variable name, the assignment symbol (=) and the value. If the value is a string containing spaces, enclose the string in double quotation marks.

You can show the value of a specific environment variable if you print it:

echo $OMP_NUM_THREADS\n

Do not forget the dollar symbol. To remove an environment variable, just execute:

unset OMP_NUM_THREADS\n

Note that the dollar symbol is not included when you use the unset command.
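Only exported variables are passed on to the programs you start from the shell, including programs launched from a job script. In the illustration below, bash -c stands in for any program you run, and MY_SETTING is a hypothetical variable.

```shell
#!/bin/bash
# A plain assignment is visible only in the current shell...
MY_SETTING=local_only
bash -c 'echo "child sees: ${MY_SETTING:-nothing}"'   # child sees: nothing

# ...whereas export places the variable in the environment of child processes.
export OMP_NUM_THREADS=4
bash -c 'echo "child sees: $OMP_NUM_THREADS"'          # child sees: 4
```

This is why a job script uses export OMP_NUM_THREADS=... before launching the program rather than a plain assignment.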

"},{"location":"user-guide/sw-environment/#cgroup-control-of-login-resources","title":"cgroup control of login resources","text":"

Note that it is not possible for a single user to monopolise the resources on a login node, because resource use is controlled by cgroups. This means that one user cannot slow down the response time for other users.

"},{"location":"user-guide/tds/","title":"ARCHER2 Test and Development System (TDS) user notes","text":"

The ARCHER2 Test and Development System (TDS) is a small system used for testing changes before they are rolled out onto the full ARCHER2 system. This page gives TDS users information on its configuration and on what they can expect from the system.

Important

The TDS is used for testing on a day-to-day basis. This means that individual nodes, or the entire system, may be made unavailable or rebooted with little or no warning.

"},{"location":"user-guide/tds/#tds-system-details","title":"TDS system details","text":""},{"location":"user-guide/tds/#connecting-to-the-tds","title":"Connecting to the TDS","text":"

You can only log into the TDS from an ARCHER2 login node. You should create an SSH key pair on an ARCHER2 login node and add the public part to your ARCHER2 account in SAFE in the usual way.

Once your new key pair is set up, you can log in to the TDS (from an ARCHER2 login node) with:

ssh login-tds.archer2.ac.uk\n

You will need your SSH key passphrase (for the new key pair you generated) and your usual ARCHER2 account password to log in to the TDS.

"},{"location":"user-guide/tds/#slurm-scheduler-configuration","title":"Slurm scheduler configuration","text":""},{"location":"user-guide/tds/#known-issuesnotes","title":"Known issues/notes","text":""},{"location":"user-guide/tuning/","title":"Performance tuning","text":""},{"location":"user-guide/tuning/#mpi","title":"MPI","text":"

The vast majority of parallel scientific applications use the MPI library as the main way to implement parallelism; it is used so universally that the Cray compiler wrappers on ARCHER2 link to the Cray MPI library by default. Unlike other clusters you may have used, there is no choice of MPI library on ARCHER2: regardless of what compiler you are using, your program will use Cray MPI. This is because the Slingshot network on ARCHER2 is Cray-specific and significant effort has been put in by Cray software engineers to optimise the MPI performance on their Shasta systems.

Here we list a number of suggestions for improving the performance of your MPI programs on ARCHER2. Although MPI programs are capable of scaling very well due to the bespoke communications hardware and software, the details of how a program calls MPI can have significant effects on achieved performance.

Note

Many of these tips are actually quite generic and should be beneficial to any MPI program; however, they all become much more important when running on very large numbers of processes on a machine the size of ARCHER2.

"},{"location":"user-guide/tuning/#mpi-environment-variables","title":"MPI environment variables","text":"

There are a number of environment variables available to control aspects of MPI behaviour on ARCHER2. The full set of options can be displayed by running

man intro_mpi

on the ARCHER2 login nodes.

A couple of specific variables to highlight are MPICH_OFI_STARTUP_CONNECT and MPICH_OFI_RMA_STARTUP_CONNECT.

When using the default OFI transport layer, the connections between ranks are set up as they are required. This allows for good performance while reducing memory requirements. However, for jobs using all-to-all communication it might be better to generate these connections in a coordinated way at the start of the application. To enable this, set the following environment variable:

export MPICH_OFI_STARTUP_CONNECT=1

Additionally, for RMA jobs requiring an all-to-all communication pattern within a node, it may be beneficial to set up the connections between processes on a node in a coordinated fashion:

export MPICH_OFI_RMA_STARTUP_CONNECT=1

This option automatically enables MPICH_OFI_STARTUP_CONNECT.

"},{"location":"user-guide/tuning/#synchronous-vs-asynchronous-communications","title":"Synchronous vs asynchronous communications","text":""},{"location":"user-guide/tuning/#mpi_send","title":"MPI_Send","text":"

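A standard way to send data in MPI is using MPI_Send (aptly called standard send). Somewhat confusingly, MPI is allowed to choose how to implement this in two different ways: synchronously, where the call does not return until the matching receive has started, or asynchronously, where the message is buffered and the call may return before the message has been received.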

The rationale is that MPI, rather than the user, should decide how best to send a message.

In practice, what typically happens is that MPI tries to use an asynchronous approach via the eager protocol: the message is sent directly to a preallocated buffer on the receiver and the routine returns immediately afterwards. Clearly there is a limit on how much space can be reserved for this, so messages below a certain size are sent asynchronously in this way, while larger messages are sent synchronously.

The threshold is often termed the eager limit, and it is fixed for the entire run of your program. It will have some default setting which varies from system to system, but might be around 8K bytes.

"},{"location":"user-guide/tuning/#implications","title":"Implications","text":""},{"location":"user-guide/tuning/#tuning-performance","title":"Tuning performance","text":"

With most MPI libraries you should be able to alter the default value of the eager limit at runtime, perhaps via an environment variable or a command-line argument to mpirun.

The advice for tuning the performance of MPI_Send is

Note

It cannot be stressed strongly enough that although the performance may be affected by the value of the eager limit, the functionality of your program should be unaffected. If changing the eager limit affects the correctness of your program (e.g. whether or not it deadlocks) then you have an incorrect MPI program.

"},{"location":"user-guide/tuning/#setting-the-eager-limit-on-archer2","title":"Setting the eager limit on ARCHER2","text":"

On ARCHER2, things are a little more complicated. Although the eager limit defaults to 16KiB, messages up to 256KiB are sent asynchronously because they are actually sent as a number of smaller messages.

To send even larger messages asynchronously, alter the value of FI_OFI_RXM_SAR_LIMIT in your job submission script, e.g. to set to 512KiB:

export FI_OFI_RXM_SAR_LIMIT=524288

You can also control the size of the smaller messages by altering the value of FI_OFI_RXM_BUFFER_SIZE in your job submission script, e.g. to set to 128KiB:

export FI_OFI_RXM_BUFFER_SIZE=131072

A different protocol is used for messages between two processes on the same node. The default eager limit for these is 8KiB. Although the performance of on-node messages is unlikely to be a limiting factor for your program, you can change this value, e.g. to set it to 16KiB:

export MPICH_SMP_SINGLE_COPY_SIZE=16384
"},{"location":"user-guide/tuning/#collective-operations","title":"Collective operations","text":"

Many of the collective operations that are commonly required by parallel scientific programs, i.e. operations that involve a group of processes, are already implemented in MPI. The canonical operation is perhaps adding up a double precision number across all MPI processes, which is best achieved by a reduction operation:

MPI_Allreduce(&x, &xsum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

This will be implemented using an efficient algorithm, for example based on a binary tree. Using such divide-and-conquer approaches typically results in an algorithm whose execution time on P processes scales as log_2(P); compare this to a naive approach where every process sends its input to rank 0 where the time will scale as P. This might not be significant on your laptop, but even on as few as 1000 processes the tree-based algorithm will already be around 100 times faster.
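As a rough illustration of the scaling argument, the sketch below counts sequential communication steps for a binary-tree reduction (log_2(P), rounded up) against a naive reduction in which rank 0 receives P - 1 messages one after another. It is a back-of-envelope model only: it assumes every step costs the same, ignores message size, network topology and the extra broadcast phase an all-reduce needs, and is not a description of how Cray MPI actually implements MPI_Allreduce. The function names are hypothetical.

```c
/* Sequential communication steps for a binary-tree reduction over
 * nprocs processes: ceil(log2(nprocs)). Each step doubles the number of
 * processes whose contributions have already been combined. */
int tree_reduction_steps(int nprocs)
{
    int steps = 0;
    for (long combined = 1; combined < nprocs; combined *= 2) {
        steps++;
    }
    return steps;
}

/* Sequential steps for a naive reduction: rank 0 receives one message
 * from each of the other nprocs - 1 processes in turn. */
int naive_reduction_steps(int nprocs)
{
    return nprocs > 1 ? nprocs - 1 : 0;
}
```

For P = 1000 this model gives 10 tree steps against 999 naive steps, a ratio of roughly 100, in line with the estimate above.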

So, the basic advice is always use a collective routine to implement your communications pattern if at all possible.

In real MPI applications, collective operations are often called on a small amount of data, for example a global reduction of a single variable. In these cases, the time taken will be dominated by message latency and the first port of call when looking at performance optimisation is to call them as infrequently as possible!

Sometimes, the collective routines available may not appear to do exactly what you want. However, they can sometimes be used with a small amount of additional programming work:

Many MPI programs call MPI_Barrier to explicitly synchronise all the processes. Although this can be useful for getting reliable performance timings, it is rare in practice to find a program where the call is actually needed for correctness. For example, you may see:

// Ensure the input x is available on all processes
MPI_Barrier(MPI_COMM_WORLD);
// Perform a global reduction operation
MPI_Allreduce(&x, &xsum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
// Ensure the result xsum is available on all processes
MPI_Barrier(MPI_COMM_WORLD);

Neither of these barriers is needed, as the reduction operation performs all the required synchronisation.

If removing a barrier from your MPI code makes it run incorrectly, then this should ring alarm bells -- it is often a symptom of an underlying bug that is simply being masked by the barrier.

For example, if you use non-blocking calls such as MPI_Irecv then it is the programmer's responsibility to ensure that these are completed at some later point, for example by calling MPI_Wait on the returned request object. A common bug is to forget to do this, in which case you might be reading the contents of the receive buffer before the incoming message has arrived (e.g. if the sender is running late).

Calling a barrier may mask this bug as it will make all the processes wait for each other, perhaps allowing the late sender to catch up. However, this is not guaranteed so the real solution is to call the non-blocking communications correctly.

One of the few times when a barrier may be required is if processes are communicating with each other via some other non-MPI method, e.g. via the file system. If you want processes to sequentially open, append to, then close the same file then barriers are a simple way to achieve this:

for (i=0; i < size; i++)
{
  if (rank == i) append_data_to_file(data, filename);
  MPI_Barrier(comm);
}

but this is really something of a special case.

Global synchronisation may be required if you are using more advanced techniques such as hybrid MPI/OpenMP or single-sided MPI communication with put and get, but typically you should be using specialised routines such as MPI_Win_fence rather than MPI_Barrier.

Tip

If you run a performance profiler on your code and it shows a lot of time being spent in a collective operation such as MPI_Allreduce, this is not necessarily a sign that the reduction operation itself is the bottleneck. This is often a symptom of load imbalance: even if a reduction operation is efficiently implemented, it may take a long time to complete if the MPI processes do not all call it at the same time. MPI_Allreduce synchronises across processes so will have to wait for all the processes to call it before it can complete. A single slow process will therefore adversely impact the performance of your entire parallel program.

"},{"location":"user-guide/tuning/#openmp","title":"OpenMP","text":"

There are a variety of possible issues that can result in poor performance of OpenMP programs. These include:

"},{"location":"user-guide/tuning/#sequential-code","title":"Sequential code","text":"

Code outside of parallel regions is executed sequentially by the master thread.

"},{"location":"user-guide/tuning/#idle-threads","title":"Idle threads","text":"

If different threads have different amounts of computation to do, then threads may be idle whenever a barrier is encountered, for example at the end of parallel regions or the end of worksharing loops. For worksharing loops, choosing a suitable schedule kind may help. For more irregular computation patterns, using OpenMP tasks might offer a solution: the runtime will try to load balance tasks across the threads in the team.
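A minimal sketch of the schedule-kind suggestion, assuming a triangular loop in which iteration i does i + 1 units of work. With the default static schedule the thread that owns the last, most expensive iterations finishes long after the others, which then sit idle at the implicit barrier; schedule(dynamic, 16) instead hands out chunks of 16 iterations to threads as they become free. The function name and the chunk size of 16 are illustrative assumptions, not tuned values. Compiled without OpenMP, the pragma is ignored and the loop runs serially with the same result.

```c
/* Triangular iteration space: iteration i performs i + 1 units of
 * (dummy) work, so the cost per iteration grows with i. */
double triangular_work(int n)
{
    double total = 0.0;

    /* Hand out chunks of 16 iterations to threads as they become free,
     * instead of fixing the iteration-to-thread mapping in advance. */
    #pragma omp parallel for schedule(dynamic, 16) reduction(+:total)
    for (int i = 0; i < n; i++) {
        for (int j = 0; j <= i; j++) {
            total += 1.0;   /* stand-in for real per-element work */
        }
    }
    return total;
}
```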

Synchronisation mechanisms that enforce mutual exclusion, such as critical regions, atomic statements and locks, can also result in idle threads if there is contention - threads have to wait their turn for access.

"},{"location":"user-guide/tuning/#synchronisation","title":"Synchronisation","text":"

The act of synchronising threads comes at some cost, even if the threads are never idle. In OpenMP, the most common source of synchronisation overheads is the implicit barrier at the end of parallel regions and worksharing loops. The overhead of these barriers depends on the OpenMP implementation being used as well as on the number of threads, but is typically in the range of a few microseconds. This means that for a simple parallel loop such as

#pragma omp parallel for reduction(+:sum)
for (i=0;i<n;i++){
   sum += a[i];
}

the number of iterations required to make parallel execution worthwhile may be of the order of 100,000. On ARCHER2, benchmarking has shown that for the AOCC compiler, OpenMP barriers have significantly higher overhead than for either the Cray or GNU compilers.
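One way to limit the cost of parallel regions on short loops is OpenMP's if clause: when its condition is false, the region is executed by a single thread. In this sketch the threshold of 100000 iterations is simply the order-of-magnitude figure quoted above, an assumption rather than a measured ARCHER2 value; the best cut-off depends on the compiler, the number of threads and the work done per iteration.

```c
/* Sum an array, only creating a team of threads when the loop is long
 * enough for the parallel-region overhead to be worth paying. */
double sum_array(const double *a, long n)
{
    double sum = 0.0;

    #pragma omp parallel for reduction(+:sum) if(n > 100000)
    for (long i = 0; i < n; i++) {
        sum += a[i];
    }
    return sum;
}
```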

It is possible to suppress the implicit barrier at the end of a worksharing loop using a nowait clause, taking care that this does not introduce any race conditions.
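A minimal sketch of the nowait clause, assuming two loops that update different arrays inside the same parallel region. Because the second loop never reads a[], threads that finish their share of the first loop can move straight on to the second; the implicit barrier at the end of the second loop still ensures all work is complete before the function returns. If the second loop did read a[], removing the first barrier would introduce a race condition. The function name is hypothetical.

```c
/* Scale two independent arrays by the same factor. */
void scale_two_arrays(double *a, double *b, long n, double factor)
{
    #pragma omp parallel
    {
        /* nowait: no barrier after this loop, which is safe only because
         * the next loop does not touch a[]. */
        #pragma omp for nowait
        for (long i = 0; i < n; i++) {
            a[i] *= factor;
        }

        /* The implicit barrier at the end of this loop is kept. */
        #pragma omp for
        for (long i = 0; i < n; i++) {
            b[i] *= factor;
        }
    }
}
```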

Atomic statements are designed to allow a more efficient implementation than the equivalent critical region or lock/unlock pair, so they should be used where applicable.
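For example, a shared histogram in which each update is a single increment is a natural fit for an atomic construct rather than a critical region. This sketch assumes non-negative input values and uses a hypothetical mapping of value to bin (value modulo the number of bins); the caller must zero the bins beforehand.

```c
/* Count how many values fall into each of nbins bins. */
void histogram(const int *values, long n, long *bins, int nbins)
{
    #pragma omp parallel for
    for (long i = 0; i < n; i++) {
        int b = values[i] % nbins;   /* assumes values[i] >= 0 */

        /* Protect only the single increment; this is typically cheaper
         * than wrapping it in an equivalent critical region. */
        #pragma omp atomic
        bins[b]++;
    }
}
```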

"},{"location":"user-guide/tuning/#scheduling","title":"Scheduling","text":"

Whenever we rely on the OpenMP runtime to dynamically assign computation to threads (e.g. dynamic or guided loop schedules, tasks), there is some overhead incurred (some of this cost may actually be internal synchronisation in the runtime). It is often necessary to adjust the granularity of the computation to find a compromise between too many small units (and high scheduling cost) and too few large units (where load imbalance may dominate). For example, we can choose a non-default chunksize for the dynamic schedule, or adjust the amount of computation within each OpenMP task construct.
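To illustrate the granularity trade-off for tasks, the sketch below sums an array by recursive splitting with OpenMP tasks and stops creating new tasks once a range is no larger than a cut-off. The cut-off (TASK_CUTOFF, a hypothetical name and value) is the knob described above: too small and task-creation overhead dominates, too large and there may be too few tasks to balance the load. Without OpenMP the pragmas are ignored and the code runs as a plain serial recursion.

```c
/* Below this many elements a task sums its range serially. The value is
 * a hypothetical starting point for tuning, not a recommended setting. */
#define TASK_CUTOFF 4096L

/* Recursively sum a[lo, hi) using one task per half of the range. */
double task_sum(const double *a, long lo, long hi)
{
    if (hi - lo <= TASK_CUTOFF) {
        double s = 0.0;
        for (long i = lo; i < hi; i++) {
            s += a[i];
        }
        return s;
    }

    long mid = lo + (hi - lo) / 2;
    double left = 0.0, right = 0.0;

    #pragma omp task shared(left)
    left = task_sum(a, lo, mid);

    #pragma omp task shared(right)
    right = task_sum(a, mid, hi);

    /* Wait for both child tasks before combining their results. */
    #pragma omp taskwait
    return left + right;
}

/* Start a team of threads; a single thread creates the root task. */
double parallel_sum(const double *a, long n)
{
    double total = 0.0;

    #pragma omp parallel
    #pragma omp single
    total = task_sum(a, 0, n);

    return total;
}
```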

"},{"location":"user-guide/tuning/#communication","title":"Communication","text":"

Communication between threads in OpenMP takes place via the cache coherency mechanism. In brief, whenever a thread writes a memory location, all copies of this location which are in a cache belonging to a different core have to be marked as invalid. Subsequent accesses to this location by other threads will result in the up-to-date value being retrieved from the cache where the last write occurred (or possibly from main memory).

Due to the fine granularity of memory accesses, these overheads are difficult to analyse or monitor. To minimise communication, we need to write code with good data affinity - i.e. each thread should access the same subset of program data as much as possible.

"},{"location":"user-guide/tuning/#numa-effects","title":"NUMA effects","text":"

On modern CPU nodes, main memory is often organised in NUMA regions - sections of main memory associated with a subset of the cores on a node. On ARCHER2 nodes, there are 8 NUMA regions per node, each associated with 16 CPU cores. On such systems the location of data in main memory with respect to the cores that are accessing it can be important. The default OS policy is to place data in the NUMA region which first accesses it (first touch policy). For OpenMP programs this can be the worst possible option: if the data is initialised by the master thread, it is all allocated in one NUMA region, and when all threads access it, the bandwidth of that region becomes a bottleneck.

This default policy can be changed using the numactl command, but it is probably better to make use of the first touch policy by explicitly parallelising the data initialisation in the application code. This may be straightforward for large multidimensional arrays, but more challenging for irregular data structures.
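A minimal sketch of exploiting first touch for a simple array: the memory is initialised inside a parallel loop, so each page should be placed in the NUMA region of the thread that first writes it, and the compute loop then uses the same static schedule so that, in practice, each thread works on the data it touched. This assumes the same number of threads and fixed thread placement (for example via OMP_PROC_BIND) for both loops; the function names are hypothetical.

```c
#include <stdlib.h>

/* Allocate n doubles and initialise them in parallel so that, under the
 * first-touch policy, each page lands in the NUMA region of the thread
 * that first writes it. Returns NULL if the allocation fails. */
double *alloc_first_touch(long n)
{
    double *x = malloc((size_t)n * sizeof *x);
    if (x == NULL) {
        return NULL;
    }

    /* Use the same static schedule as the compute loop below. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; i++) {
        x[i] = 0.0;
    }
    return x;
}

/* Example compute loop with the matching schedule. */
void add_scalar(double *x, long n, double value)
{
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; i++) {
        x[i] += value;
    }
}
```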

"},{"location":"user-guide/tuning/#false-sharing","title":"False sharing","text":"

The cache coherency mechanism described above operates on units of data corresponding to the size of cache lines - for ARCHER2 CPUs this is 64 bytes. This means that if different threads are accessing neighbouring words in memory, and at least some of the accesses are writes, then communication may be happening even if no individual word is actually being accessed by more than one thread. This means that patterns such as

#pragma omp parallel shared(count) private(myid)
{
  myid = omp_get_thread_num();
  ....
  count[myid]++;
  ....
}

may give poor performance if the updates to the count array are sufficiently frequent.
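One common fix is to pad each per-thread counter to a full cache line, so that counters written by different threads are at least 64 bytes apart (the ARCHER2 cache line size given above) and can never fall in the same line. This is a sketch: the type and function names are hypothetical, and the caller must supply at least as many counters as there are threads.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Cache line size of the ARCHER2 CPUs, as quoted in the text above. */
#define CACHE_LINE_BYTES 64

/* One counter per thread, padded to a full cache line. */
typedef struct {
    long value;
    char pad[CACHE_LINE_BYTES - sizeof(long)];
} padded_count;

/* Each thread increments its own counter updates_per_thread times.
 * counts must have at least as many entries as there are threads. */
void count_updates(padded_count *counts, long updates_per_thread)
{
    #pragma omp parallel
    {
        int myid = 0;
#ifdef _OPENMP
        myid = omp_get_thread_num();
#endif
        for (long k = 0; k < updates_per_thread; k++) {
            counts[myid].value++;
        }
    }
}
```

An often simpler alternative is for each thread to accumulate into a private local variable and write to count[myid] only once, at the end of the parallel region.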

"},{"location":"user-guide/tuning/#hardware-resource-contention","title":"Hardware resource contention","text":"

Whenever there are multiple threads (or processes) executing inside a node, they may contend for some hardware resources. The most important of these for many HPC applications is memory bandwidth. This effect is very evident on ARCHER2 CPUs - it is possible for just 2 threads to almost saturate the available memory bandwidth in a NUMA region which has 16 cores associated with it. For very bandwidth-intensive applications, running more than 2 threads per NUMA region may gain little additional performance. If an OpenMP code is not using all the cores on a node, by default Slurm will spread the threads out across NUMA regions to maximise the available bandwidth.

Another resource that threads may contend for is space in shared caches. On ARCHER2, every set of 4 cores shares 16MB of L3 cache.

"},{"location":"user-guide/tuning/#compiler-non-optimisation","title":"Compiler non-optimisation","text":"

In rare cases, adding OpenMP directives can adversely affect the compiler's optimisation process. The symptom of this is that the OpenMP code running on 1 thread is slower than the same code compiled without the OpenMP flag. It can be difficult to find a workaround - using the compiler's diagnostic flags to find out which optimisation (e.g. vectorisation, loop unrolling) is being affected and adding compiler-specific directives may help.

"},{"location":"user-guide/tuning/#hybrid-mpi-and-openmp","title":"Hybrid MPI and OpenMP","text":"

There are two main motivations for using both MPI and OpenMP in the same application code: reducing memory requirements and improving performance. At low core counts, where the pure MPI version of the code is still scaling well, adding OpenMP is unlikely to improve performance. In fact, it can introduce some additional overheads which make performance worse! The benefit is likely to come in the regime where the pure MPI version starts to lose scalability - here adding OpenMP can reduce communication costs, make load balancing easier, or be an effective way of exploiting additional parallelism without excessive code re-writing.

An important performance consideration for MPI + OpenMP applications is the choice of the number of OpenMP threads per MPI process. The optimum value will depend on the application, the input data, the number of nodes requested and the choice of compiler, and is hard to predict without experimentation. However, there are some considerations that apply to ARCHER2:

"}]} \ No newline at end of file +{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"ARCHER2 User Documentation","text":"

ARCHER2 is the next generation UK National Supercomputing Service. You can find more information on the service and the research it supports on the ARCHER2 website.

The ARCHER2 Service is a world class advanced computing resource for UK researchers. ARCHER2 is provided by UKRI, EPCC, Cray (an HPE company) and the University of Edinburgh.

"},{"location":"#what-the-documentation-covers","title":"What the documentation covers","text":"

This is the documentation for the ARCHER2 service and includes:

"},{"location":"#contributing-to-the-documentation","title":"Contributing to the documentation","text":"

The source for this documentation is publicly available in the ARCHER2 documentation Github repository so that anyone can contribute to improve the documentation for the service. Contributions can be in the form of improvements or addtions to the content and/or addtion of Issues providing suggestions for how it can be improved.

Full details of how to contribute can be found in the README.md file of the repository.

"},{"location":"#credits","title":"Credits","text":"

This documentation draws on the Cirrus Tier-2 HPC Documentation, Sheffield Iceberg Documentation and the ARCHER National Supercomputing Service Documentation.

"},{"location":"archer-migration/","title":"ARCHER to ARCHER2 migration","text":"

This section of the documentation is a guide for users migrating from ARCHER to ARCHER2.

It covers:

Tip

If you need help or have questions on ARCHER to ARCHER2 migration, please contact the ARCHER2 service desk

"},{"location":"archer-migration/account-migration/","title":"Migrating your account from ARCHER to ARCHER2","text":"

This section covers the following questions:

Tip

If you need help or have questions on ARCHER to ARCHER2 migration, please contact the ARCHER2 service desk

"},{"location":"archer-migration/account-migration/#when-will-i-be-able-to-access-archer2","title":"When will I be able to access ARCHER2?","text":"

We anticipate that users will have access during the week beginning 11th January 2021. Notification of activation of ARCHER2 projects will be sent to the project leaders/PIs and the project users.

"},{"location":"archer-migration/account-migration/#has-my-project-been-migrated-to-archer2","title":"Has my project been migrated to ARCHER2?","text":"

If you have an active ARCHER allocation at the end of the ARCHER service then your project will very likely be migrated to ARCHER2. If your project is migrated to ARCHER2 then it will have the same project code as it had on ARCHER.

Some further information that may be useful:

"},{"location":"archer-migration/account-migration/#how-much-resource-will-my-project-have-on-archer2","title":"How much resource will my project have on ARCHER2?","text":"

The unit of allocation on ARCHER2 is called the ARCHER2 Compute Unit (CU) and, in general, 1 CU will be worth 1 ARCHER2 node hour.

UKRI have determined the conversion rates which will be used to transfer existing ARCHER allocations onto ARCHER2. These will be:

In identifying these conversion rates UKRI has endeavoured to ensure that no user will be disadvantaged by the transfer of their allocation from ARCHER to ARCHER2.

A nominal allocation will be provided to all projects during the initial no-charging period. Users will be notified before the no-charging period ends.

When the ARCHER service ends, any unused ARCHER allocation in kAUs will be converted to ARCHER2 CUs and transferred to the ARCHER2 project allocation.

"},{"location":"archer-migration/account-migration/#how-do-i-set-up-an-archer2-account","title":"How do I set up an ARCHER2 account?","text":"

Once you have been notified that you can go ahead and set up an ARCHER2 account, you will do this through SAFE. Note that you should use the new unified SAFE interface rather than the ARCHER SAFE. The correct URL for the new SAFE is:

Your access details for this SAFE are the same as those for the ARCHER SAFE. You should log in in exactly the same way as you did on the ARCHER SAFE.

Important

You should make sure you request the same account name in your project on ARCHER2 as you have on ARCHER. This is to ensure that you have seamless access to your ARCHER /home data on ARCHER2. See the ARCHER to ARCHER2 Data Migration page for details on data transfer from ARCHER to ARCHER2

Once you have logged into SAFE, you will need to complete the following steps before you can log into ARCHER2 for the first time:

  1. Request an ARCHER2 account through SAFE
    1. See: How to request a machine account (SAFE documentation)
  2. (Optional) Create a new SSH key pair and add it to your ARCHER2 account in SAFE
    1. See: SSH key pairs (ARCHER2 documentation)
    2. If you do not add a new SSH key to your ARCHER2 account, then your account will use the same key as your ARCHER account
  3. Collect your initial, one-shot password from SAFE
    1. See: Initial passwords (ARCHER2 documentation)
"},{"location":"archer-migration/account-migration/#how-do-i-log-into-archer2-for-the-first-time","title":"How do I log into ARCHER2 for the first time?","text":"

The ARCHER2 documentation covers logging in to ARCHER2 from a variety of operating systems:

"},{"location":"archer-migration/archer2-differences/","title":"Main differences between ARCHER and ARCHER2","text":"

This section provides an overview of the main differences between ARCHER and ARCHER2 along with links to more information where appropriate.

"},{"location":"archer-migration/archer2-differences/#for-all-users","title":"For all users","text":""},{"location":"archer-migration/archer2-differences/#for-users-compiling-and-developing-software-on-archer2","title":"For users compiling and developing software on ARCHER2","text":""},{"location":"archer-migration/data-migration/","title":"Data migration from ARCHER to ARCHER2","text":"

This short guide explains how to move data from the ARCHER service to the ARCHER2 service.

We have also created a walkthrough video to guide you.

Note

This section assumes that you have active ARCHER and ARCHER2 accounts, and that you have successfully logged in to both accounts.

Tip

Unlike normal access, ARCHER to ARCHER2 transfer has been set up to require only one form of authentication. You will not need to generate a new SSH key pair to transfer data from ARCHER to ARCHER2 as your password will suffice.

First, log in to ARCHER (making sure to change auser to your username):

ssh auser@login.archer.ac.uk

Then, combine important research data into a single archive file using the following command:

tar -czf all_my_files.tar.gz file1.txt file2.txt directory1/

Please be selective -- the more data you want to transfer, the more time it will take.

From ARCHER in particular, in order to get the best transfer performance, we need to access a newer version of the SSH program. We do this by loading the openssh module:

module load openssh
"},{"location":"archer-migration/data-migration/#transferring-data-using-rsync-recommended","title":"Transferring data using rsync (recommended)","text":"

Begin the data transfer from ARCHER to ARCHER2 using rsync:

rsync -Pv -e"ssh -c aes128-gcm@openssh.com" \
       ./all_my_files.tar.gz a2user@transfer.dyn.archer2.ac.uk:/work/t01/t01/a2user

Important

Notice that the hostname for data transfer from ARCHER to ARCHER2 is not the usual login address. Instead, you use transfer.dyn.archer2.ac.uk. This address has been configured to allow higher-performance data transfer and to allow password-only access from ARCHER, with no SSH key required.

When running this command, you will be prompted to enter your ARCHER2 password. Enter it and the data transfer will begin. Also, remember to replace a2user with your ARCHER2 username, and t01 with the budget associated with that username.

The -P flag allows partial transfer: the same command can be used to restart the transfer after a loss of connection. The -e flag specifies the ssh command to use; here it passes the -c option, which selects the aes128-gcm cipher, found to increase performance. Unfortunately the ~ shortcut is not correctly expanded, so we have specified the full path. We move our research archive to our project work directory on ARCHER2.

"},{"location":"archer-migration/data-migration/#transferring-data-using-scp","title":"Transferring data using scp","text":"

If you are unconcerned about being able to restart an interrupted transfer, you could instead use the scp command,

scp -c aes128-gcm@openssh.com all_my_files.tar.gz \
    a2user@transfer.dyn.archer2.ac.uk:/work/t01/t01/a2user/

but rsync is recommended for larger transfers.

Important

Notice that the hostname for data transfer from ARCHER to ARCHER2 is not the usual login address. Instead, you use transfer.dyn.archer2.ac.uk. This address has been configured to allow higher-performance data transfer and to allow password-only access from ARCHER, with no SSH key required.

"},{"location":"archer2-migration/","title":"ARCHER2 4-cabinet system to ARCHER2 full system migration","text":"

This section of the documentation is a guide for users migrating from the ARCHER2 4-cabinet system to the ARCHER2 full system.

It covers:

Tip

If you need help or have questions on ARCHER2 4-cab to full ARCHER2 migration please contact the ARCHER2 service desk

"},{"location":"archer2-migration/account-migration/","title":"Accessing the ARCHER2 full system","text":"

This section covers the following questions:

Tip

If you need help or have questions on using ARCHER2 4-cabinet system and ARCHER2 full system please contact the ARCHER2 service desk

"},{"location":"archer2-migration/account-migration/#when-will-i-be-able-to-access-archer2-full-system","title":"When will I be able to access ARCHER2 full system?","text":"

We anticipate that users will have access from mid to late November. Users will have access to both the ARCHER2 4-cabinet system and ARCHER2 full system for at least 30 days. UKRI will confirm the dates and these will be communicated to you as they are confirmed. There will be at least 14 days' notice before access to the ARCHER2 4-cabinet system is removed.

"},{"location":"archer2-migration/account-migration/#has-my-project-been-enabled-on-archer2-full-system","title":"Has my project been enabled on ARCHER2 full system?","text":"

If you have an active ARCHER2 4-cabinet system allocation on 1st October 2021 then your project will be enabled on the ARCHER2 full system. The project code is the same on the full service as it is on ARCHER2 4-cabinet system.

Some further information that may be useful:

"},{"location":"archer2-migration/account-migration/#how-much-resource-will-my-project-have-on-archer2-full-system","title":"How much resource will my project have on ARCHER2 full system?","text":"

The unit of allocation on ARCHER2 is called the ARCHER2 Compute Unit (CU) and 1 CU is equivalent to 1 ARCHER2 node hour. Your time budget will be shared on both systems. This means that any existing allocation available to your project on the 4-cabinet system will also be available on the full system.

There will be a period of at least 30 days where users will have access to both the 4-cabinet system and the full system. During this time, use on the full system will be uncharged (though users must still have access to a valid, positive budget to be able to submit jobs) and use on the 4-cabinet system will be charged in the usual way. Users will be notified before the no-charging period ends.

"},{"location":"archer2-migration/account-migration/#how-do-i-set-up-an-account-on-the-full-system","title":"How do I set up an account on the full system?","text":"

You will keep the same usernames, passwords and SSH keys that you use on the 4-cabinet system on the full system.

You do not need to do anything to enable your account; accounts will be made available automatically once access to the full system is available.

You will connect to the full system in the same way as you connect to the 4-cabinet system except for switching the ordering of the credentials:

"},{"location":"archer2-migration/account-migration/#how-do-i-log-into-the-different-archer2-systems","title":"How do I log into the different ARCHER2 systems?","text":"

The ARCHER2 documentation covers logging in to ARCHER2 from a variety of operating systems:

- Logging in to ARCHER2 from macOS/Linux
- Logging in to ARCHER2 from Windows

Login addresses:

Tip

When logging into the ARCHER2 full system for the first time, you may see an error from SSH that looks like

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@       WARNING: POSSIBLE DNS SPOOFING DETECTED!          @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
The ECDSA host key for login.archer2.ac.uk has changed,
and the key for the corresponding IP address 193.62.216.43
has a different value. This could either mean that
DNS SPOOFING is happening or the IP address for the host
and its host key have changed at the same time.
Offending key for IP in /Users/auser/.ssh/known_hosts:11
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the ECDSA key sent by the remote host is
SHA256:UGS+LA8I46LqnD58WiWNlaUFY3uD1WFr+V8RCG09fUg.
Please contact your system administrator.

If you see this, you should delete the offending host key from your ~/.ssh/known_hosts file (in the example above the offending line is line #11)

"},{"location":"archer2-migration/account-migration/#what-will-happen-to-archer2-data","title":"What will happen to ARCHER2 data?","text":"

There are three file systems associated with the ARCHER2 Service:

"},{"location":"archer2-migration/account-migration/#home-file-systems","title":"home file systems","text":"

The home file systems will be mounted on both the 4-cabinet system and the full system, so users' directories are shared across the two systems. Users will be able to access the home file systems from both systems and no action is required to move data. The home file systems will be readable and writeable on both services during the transition period.

work file systems

There are different work file systems for the 4-cabinet system and the full system.

The work file system on the 4-cabinet system (labelled “archer2-4c-work” in SAFE) will remain available on the 4-cabinet system during the transition period.

There will be new work file systems on the full system and you will have new directories on the new work file systems. Your initial quotas will typically be double your quotas for the 4-cabinet work file system.

Important: you are responsible for transferring any required data from the 4-cabinet work file systems to your new directories on the work file systems on the full system.

You will be able to transfer data from the work file system on the 4-cabinet system for at least 30 days from the start of ARCHER2 full system access, and 14 days' notice will be given before the 4-cabinet work file system is removed.

RDFaaS file systems

For users who have access to the RDFaaS, your RDFaaS data will be available on both the 4-cabinet system and the full system during the transition period and will be readable and writeable on both systems.

Main differences between ARCHER2 4-cabinet system and ARCHER2 full system

This section provides an overview of the main differences between the ARCHER2 4-cabinet system that all users have been using up until now and the full ARCHER2 system along with links to more information where appropriate.

For all users

For users compiling and developing software on ARCHER2

Data migration from the ARCHER2 4-cabinet system to the ARCHER2 full system

This short guide explains how to move data from the work file system on the ARCHER2 4-cabinet system to the ARCHER2 full system. Your space on the home file system is shared between the ARCHER2 4-cabinet system and the ARCHER2 full system, so everything in your home directory is already effectively transferred.

Note

This section assumes that you have an active ARCHER2 4-cabinet system and ARCHER2 full system account, and that you have successfully logged in to both accounts.

Tip

Unlike normal access, transfers between the ARCHER2 4-cabinet system and the ARCHER2 full system have been set up to require only one form of authentication: you only need one factor to authenticate from the 4-cabinet system to the full system or vice versa. This factor can be either an SSH key (registered against your account in SAFE) or your password. If you have a large amount of data to transfer, you may want to set up a passphrase-less SSH key on the ARCHER2 full system and use the data analysis nodes to run transfers via a Slurm job.

Transferring data interactively from the 4-cabinet system to the full system

First, log in to the ARCHER2 4-cabinet system (making sure to change auser to your username):

ssh auser@login-4c.archer2.ac.uk

Then, combine important research data into a single archive file using the following command:

tar -czf all_my_files.tar.gz file1.txt file2.txt directory1/

Please be selective -- the more data you want to transfer, the more time it will take.

After transferring the archive (using one of the methods below), unpack it in the destination directory on the full system:

tar -xzf all_my_files.tar.gz
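You can rehearse the create, check and unpack steps in scratch directories before touching real data. The sketch below runs the whole round trip locally and uses the same placeholder file names as above. Running `tar -tzf` lists the archive contents, so you can confirm nothing was missed before starting a long transfer. The `-C` option tells tar which directory to work in, which keeps the stored paths relative and places extracted files in an explicit destination.

```bash
#!/usr/bin/env bash
set -e

# Scratch directories standing in for the source and destination
src=$(mktemp -d)
dest=$(mktemp -d)

# Placeholder research data
mkdir -p "$src/directory1"
echo "result A" > "$src/file1.txt"
echo "result B" > "$src/file2.txt"
echo "raw data" > "$src/directory1/data.dat"

# Create a gzip-compressed archive (-c create, -z gzip, -f archive file),
# reading the members relative to $src
tar -czf "$src/all_my_files.tar.gz" -C "$src" file1.txt file2.txt directory1/

# List the contents (-t) without extracting, to check nothing is missing
tar -tzf "$src/all_my_files.tar.gz"

# Extract (-x) into the destination directory
tar -xzf "$src/all_my_files.tar.gz" -C "$dest"
```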
Transferring data using rsync (recommended)

Begin the data transfer from the ARCHER2 4-cabinet system to the ARCHER2 full system using rsync:

rsync -Pv all_my_files.tar.gz a2user@login.archer2.ac.uk:/work/t01/t01/a2user

When running this command, you will be prompted to enter your ARCHER2 password -- this is the same password for the ARCHER2 4-cabinet system and the ARCHER2 full system. Enter it and the data transfer will begin. Remember to replace a2user with your ARCHER2 username, and t01 with the budget associated with that username.

The -P flag (short for --partial --progress) keeps partially transferred files and reports progress, so the same command can be used to resume the transfer after a loss of connection; -v gives verbose output. Here the research archive is copied to the project work directory on the ARCHER2 full system.
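A resumed transfer should leave an identical archive on both systems, and you can confirm that with a checksum. The sketch below assumes GNU coreutils `sha256sum`, which is standard on Linux. It computes the checksum on the source side, copies the small checksum file along with the archive, and verifies it on the destination. Both systems are simulated with local directories, and a plain `cp` stands in for the rsync command above.

```bash
#!/usr/bin/env bash
set -e

# Local stand-ins for the 4-cabinet (source) and full-system (destination) directories
src=$(mktemp -d)
dest=$(mktemp -d)
head -c 1048576 /dev/urandom > "$src/all_my_files.tar.gz"   # dummy 1 MiB archive

# On the source system: record the checksum next to the archive
(cd "$src" && sha256sum all_my_files.tar.gz > all_my_files.tar.gz.sha256)

# Transfer both files (on ARCHER2 this would be the rsync command above)
cp "$src/all_my_files.tar.gz" "$src/all_my_files.tar.gz.sha256" "$dest/"

# On the destination system: prints "all_my_files.tar.gz: OK" if the copy is intact
(cd "$dest" && sha256sum -c all_my_files.tar.gz.sha256)
```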

Transferring data using scp

If you are unconcerned about being able to restart an interrupted transfer, you could instead use the scp command,

scp all_my_files.tar.gz a2user@login.archer2.ac.uk:/work/t01/t01/a2user/

but rsync is recommended for larger transfers.

Transferring data via the serial queue

It may be convenient to submit long data transfers to the serial queue. In this case, a number of simple preparatory steps are required to authenticate:

  1. On the full system, create a new SSH key pair without a passphrase (just press return when prompted).
  2. Add the new public key to SAFE against your machine account.
  3. Use this key pair to authenticate the ssh/scp commands in the serial queue. Because only one of SSH key or password is required between the serial nodes and the 4-cabinet system, the key alone is sufficient.
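Step 1 can be done with `ssh-keygen`. The sketch below assumes OpenSSH is installed. The helper name `make_batch_key` is hypothetical, and its default file name `id_rsa_batch` matches the example script below. Passing `-N ""` sets an empty passphrase so a batch job can use the key unattended. Because the private key is then unprotected, the helper refuses to overwrite an existing key and restricts the file permissions. The public key it prints is what you add to SAFE in step 2.

```bash
#!/usr/bin/env bash
# Hypothetical helper: create a passphrase-less key pair for batch transfers.
make_batch_key() {
    local keyfile="${1:-${HOME}/.ssh/id_rsa_batch}"
    local keydir
    keydir="$(dirname "$keyfile")"
    mkdir -p "$keydir" && chmod 700 "$keydir"
    # Do not silently overwrite an existing key
    if [ -e "$keyfile" ]; then
        echo "refusing to overwrite $keyfile" >&2
        return 1
    fi
    # -t rsa -b 4096: 4096-bit RSA key; -N "": empty passphrase;
    # -f: output file (public half written to ${keyfile}.pub); -q: quiet
    ssh-keygen -q -t rsa -b 4096 -N "" -f "$keyfile"
    chmod 600 "$keyfile"
    # Show the public key so it can be added to SAFE (step 2)
    cat "${keyfile}.pub"
}

# On the full system:
# make_batch_key
```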

An example serial queue script using rsync might be:

#!/bin/bash

# Slurm job options (partition, QoS, time, tasks)

#SBATCH --partition=serial
#SBATCH --qos=serial

#SBATCH --time=02:00:00
#SBATCH --ntasks=1

# Replace [budget code] below with your budget code

#SBATCH --account=[budget code]

# Issue appropriate rsync command

rsync -av --stats --progress --rsh="ssh -i ${HOME}/.ssh/id_rsa_batch" \
      user-01@login-4c.archer2.ac.uk:/work/proj01/proj01/user-01/src \
      /work/proj01/proj01/user-01/destination
where ${HOME}/.ssh/id_rsa_batch is the new ssh key. Note that the ${HOME} directory is visible from the serial nodes on the full system, so ssh key pairs in ${HOME}/.ssh are available.

Porting applications to full ARCHER2 system

Porting applications to the full ARCHER2 system has generally proven straightforward if they already run successfully on the ARCHER2 4-cabinet system. You should be able to use the same (or very similar) compile processes on the full system as you used on the 4-cabinet system.

During testing of the ARCHER2 full system, the CSE team at EPCC have seen that application binaries compiled on the 4-cabinet system can usually be copied over to the full system and work well and give good performance. However, if you run into issues with executables taken from the 4-cabinet system on the full system you should recompile in the first instance.

Information on compiling applications on the full system can be found in the Application Development Environment section of the User and Best Practice Guide.

Data Analysis and Tools

This section provides information on the centrally installed data analysis software and other software tools.

The tools currently available in this section are (software installed or maintained by third parties rather than by the ARCHER2 service is marked with *):

AMD μProf

AMD μProf (“MICRO-prof”) is a software profiling analysis tool for x86 applications running on Windows, Linux and FreeBSD operating systems, and provides event information unique to the AMD “Zen”-based processors and AMD INSTINCT™ MI Series accelerators. AMD μProf enables the developer to better understand the limiters of application performance and evaluate improvements.

Accessing AMD μProf on ARCHER2

To gain access to the AMD μProf tools on ARCHER2, you must load the module:

module load amd-uprof
Using AMD μProf

Please see the AMD documentation for information on how to use μProf:

R

R for statistical computing

R is a software environment for statistical computing and graphics. It provides a wide variety of statistical and graphical techniques (linear and nonlinear modelling, statistical tests, time-series analysis, classification, clustering, and so on).

Note

When you log onto ARCHER2, no R module is loaded by default. You need to load the cray-R module to access the functionality described below.

The recommended version of R to use on ARCHER2 is the HPE Cray R distribution, which can be loaded using:

module load cray-R

The HPE Cray R distribution includes a range of common R packages, including all of the base packages, plus a few others.

To see what packages are available, run the following command from the R command prompt:

library()

At the time of writing, the HPE Cray R distribution included the following packages:

Full System
Packages in library ‘/opt/R/4.0.3.0/lib64/R/library’:

base                    The R Base Package
boot                    Bootstrap Functions (Originally by Angelo Canty
                        for S)
class                   Functions for Classification
cluster                 "Finding Groups in Data": Cluster Analysis
                        Extended Rousseeuw et al.
codetools               Code Analysis Tools for R
compiler                The R Compiler Package
datasets                The R Datasets Package
foreign                 Read Data Stored by 'Minitab', 'S', 'SAS',
                        'SPSS', 'Stata', 'Systat', 'Weka', 'dBase', ...
graphics                The R Graphics Package
grDevices               The R Graphics Devices and Support for Colours
                        and Fonts
grid                    The Grid Graphics Package
KernSmooth              Functions for Kernel Smoothing Supporting Wand
                        & Jones (1995)
lattice                 Trellis Graphics for R
MASS                    Support Functions and Datasets for Venables and
                        Ripley's MASS
Matrix                  Sparse and Dense Matrix Classes and Methods
methods                 Formal Methods and Classes
mgcv                    Mixed GAM Computation Vehicle with Automatic
                        Smoothness Estimation
nlme                    Linear and Nonlinear Mixed Effects Models
nnet                    Feed-Forward Neural Networks and Multinomial
                        Log-Linear Models
parallel                Support for Parallel computation in R
rpart                   Recursive Partitioning and Regression Trees
spatial                 Functions for Kriging and Point Pattern
                        Analysis
splines                 Regression Spline Functions and Classes
stats                   The R Stats Package
stats4                  Statistical Functions using S4 Classes
survival                Survival Analysis
tcltk                   Tcl/Tk Interface
tools                   Tools for Package Development
utils                   The R Utils Package
4-cabinet system
Packages in library ‘/opt/R/4.0.2.0/lib64/R/library’:

base                    The R Base Package
boot                    Bootstrap Functions (Originally by Angelo Canty
                        for S)
class                   Functions for Classification
cluster                 "Finding Groups in Data": Cluster Analysis
                        Extended Rousseeuw et al.
codetools               Code Analysis Tools for R
compiler                The R Compiler Package
datasets                The R Datasets Package
foreign                 Read Data Stored by 'Minitab', 'S', 'SAS',
                        'SPSS', 'Stata', 'Systat', 'Weka', 'dBase', ...
graphics                The R Graphics Package
grDevices               The R Graphics Devices and Support for Colours
                        and Fonts
grid                    The Grid Graphics Package
KernSmooth              Functions for Kernel Smoothing Supporting Wand
                        & Jones (1995)
lattice                 Trellis Graphics for R
MASS                    Support Functions and Datasets for Venables and
                        Ripley's MASS
Matrix                  Sparse and Dense Matrix Classes and Methods
methods                 Formal Methods and Classes
mgcv                    Mixed GAM Computation Vehicle with Automatic
                        Smoothness Estimation
nlme                    Linear and Nonlinear Mixed Effects Models
nnet                    Feed-Forward Neural Networks and Multinomial
                        Log-Linear Models
parallel                Support for Parallel computation in R
rpart                   Recursive Partitioning and Regression Trees
spatial                 Functions for Kriging and Point Pattern
                        Analysis
splines                 Regression Spline Functions and Classes
stats                   The R Stats Package
stats4                  Statistical Functions using S4 Classes
survival                Survival Analysis
tcltk                   Tcl/Tk Interface
tools                   Tools for Package Development
utils                   The R Utils Package
Running R on the compute nodes

In this section, we provide an example R job submission script for using R on the ARCHER2 compute nodes.

Serial R submission script
#!/bin/bash --login

#SBATCH --job-name=r_test
#SBATCH --ntasks=1
#SBATCH --time=00:10:00

# Replace [budget code] below with your project code (e.g., t01)
#SBATCH --account=[budget code]
#SBATCH --partition=serial
#SBATCH --qos=serial

# Load the R module
module load cray-R

# Run your R program
Rscript serial_test.R

On completion, the output of the R script will be available in the job output file.

Darshan

Darshan is a scalable HPC I/O characterization tool. Darshan is designed to capture an accurate picture of application I/O behavior, including properties such as patterns of access within files, with minimum overhead. The name is taken from a Sanskrit word for "sight" or "vision".

Darshan is developed at the Argonne Leadership Computing Facility (ALCF).

Useful links:

Using Darshan on ARCHER2

Using Darshan generally consists of two stages:

  1. Collect IO profile data using the Darshan runtime
  2. Analysing Darshan log files using Darshan utility software

Collecting IO profile data

To collect IO profile data, add the command:

module load darshan

to your job submission script as the last module command before you run your program. As Darshan does not distinguish between different software run in your job submission script, we typically recommend that you use a structure like:

module load darshan
srun ...usual software launch options...
module remove darshan

This will avoid Darshan profiling IO for operations that are not part of your main parallel program.

Tip

There may be some periods when Darshan monitoring is enabled by default for all users. During these periods, you can disable Darshan monitoring by adding the command module remove darshan to your job submission script. Periods of Darshan monitoring will be noted on the ARCHER2 Service Status page.

Important

The darshan module is dependent on the compiler environment you are using and you should ensure that you load the darshan module that matches the compiler environment you used to compile the program you are analysing. For example, if your software was compiled using PrgEnv-gnu, then you would need to activate the GCC compiler environment before loading the darshan module to ensure you get the GCC version of Darshan. This means loading the correct PrgEnv- module before you load the darshan module:

module load PrgEnv-gnu
module load darshan
srun ...usual software launch options...
module remove darshan
Location of Darshan profile logs

Darshan writes all profile logs to a shared location on the ARCHER2 NVMe Lustre file system. You can find your profile logs at:

/mnt/lustre/a2fs-nvme/system/darshan/YYYY/MM/DD

where YYYY/MM/DD correspond to the date on which your job ran.

Analysing Darshan profile logs

The simplest way to analyse the profile log files is to use the darshan-parser utility on the ARCHER2 login nodes. You make the Darshan analysis utilities available with the command:

module load darshan-util

Once this is loaded, you can produce an IO performance summary from a profile log file with:

darshan-parser --perf /path/to/darshan/log/file.darshan

You can get a dump of all data in the Darshan profile log by omitting the --perf option, e.g.:

darshan-parser /path/to/darshan/log/file.darshan

Tip

The darshan-job-summary.pl and darshan-summary-per-file.sh utilities do not work on ARCHER2 as the required graphical packages are not currently available.

Documentation on the Darshan analysis utilities is available at:

Linaro Forge

Linaro Forge provides debugging and profiling tools for MPI parallel applications and OpenMP or pthreads multi-threaded applications (and also hybrid MPI/OpenMP). Forge DDT is the debugger and MAP is the profiler.

User interface

There are two ways of running the Forge user interface. If you have a good internet connection to ARCHER2, the GUI can be run on the front-end (with an X-connection). Alternatively, one can download a copy of the Forge remote client to your laptop or desktop, and run it locally. The remote client should be used if at all possible.

To download the remote client, see the Forge download pages. Version 24.0 is known to work at the time of writing. A section further down this page explains how to use the remote client, see Connecting with the remote client.

Licensing

ARCHER2 has a licence for up to 2080 tokens, where a token represents an MPI parallel process. Running Forge DDT/MAP to debug/profile a code running across 16 nodes using 128 MPI ranks per node would require 2048 tokens. If you wish to run on more nodes, say 32, then it will be necessary to reduce the number of tasks per node so as to fall below the maximum number of tokens allowed.
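The token count is simply nodes × MPI ranks per node, so a planned run is easy to check against the limit. A small sketch with hypothetical helper names: 2080 is the standard licence size quoted above, and 8192 is the boosted licence mentioned on this page. Bash integer division rounds down, which gives the largest ranks-per-node value that still fits.

```bash
#!/usr/bin/env bash
# Tokens needed = nodes x MPI ranks per node (one token per MPI process)
tokens_needed() {
    local nodes="$1" ranks_per_node="$2"
    echo $(( nodes * ranks_per_node ))
}

# Largest ranks per node that fits a token limit on a given node count
# (integer division rounds down)
max_ranks_per_node() {
    local limit="$1" nodes="$2"
    echo $(( limit / nodes ))
}

tokens_needed 16 128           # 2048, within the 2080-token licence
max_ranks_per_node 2080 32     # 65 ranks per node on 32 nodes
tokens_needed 64 128           # 8192, the size of the boosted licence
```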

Please note, Forge licence tokens are shared by all ARCHER2 (and Cirrus) users.

To see how many tokens are in use, you can view the licence server status page by first setting up an SSH tunnel to the node hosting the licence server.

ssh <username>@login.archer2.ac.uk -L 4241:dvn04:4241

You can now view the status page from within a local browser, see http://localhost:4241/status.html.

Note

The licence status page may contain multiple licences, indicated by a row of buttons (one per licence) near the top of the page. The details of the 12-month licence described above can be accessed by clicking on the first button in the row. Additional buttons may appear at various times for boosted licences: once a quarter, ARCHER2 will have a boosted 7-day licence offering 8192 tokens, sufficient for 64 nodes running 128 MPI ranks per node. Please contact the Service Desk if you have a specific requirement that exceeds the current Forge licence provision.

Note

The licence status page refers to the Arm Licence Server. Arm is the name of the company that originally developed Forge before it was acquired by Linaro.

One time set-up for using Forge

A preliminary step is required to set up the necessary Forge configuration files that allow DDT and MAP to initialise their environment correctly so that they can, for example, interact with the Slurm queue system. These steps should be performed in the /work file system on ARCHER2.

It is recommended that these commands are performed in the top-level work file system directory for the user account, i.e., ${HOME/home/work}.

module load forge
cd ${HOME/home/work}
source ${FORGE_DIR}/config-init
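The expression `${HOME/home/work}` is bash pattern substitution. It takes the value of `HOME` and replaces the first occurrence of `home` with `work`, which turns your home path into the matching work path. The illustration below uses a hypothetical home directory held in a separate variable, so the real `HOME` is left unchanged.

```bash
#!/usr/bin/env bash
# Hypothetical home directory for user auser in project t01
HOME_EXAMPLE=/home/t01/t01/auser

# ${var/pattern/replacement} replaces the first match of pattern
work_dir="${HOME_EXAMPLE/home/work}"
echo "$work_dir"            # /work/t01/t01/auser
echo "${work_dir}/.forge"   # /work/t01/t01/auser/.forge
```

Any suffix such as `/.forge` must go outside the braces. Inside them it becomes part of the replacement string, so `${HOME_EXAMPLE/home/work/.forge}` expands to `/work/.forge/t01/t01/auser`.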

Running the source command will create a directory ${HOME/home/work}/.forge that contains the following files.

system.config  user.config

Warning

The config-init script may output, Warning: failed to read system config. Please ignore as subsequent messages should indicate that the new configuration files have been created.

Within the system.config file you should find that the shared directory is set to the equivalent of ${HOME/home/work}/.forge. That directory will also store other relevant files when Forge is run.

Using DDT

DDT (Distributed Debugging Tool) provides an easy-to-use graphical interface for source-level debugging of compiled C/C++ or Fortran codes. It can be used for non-interactive debugging, and there is also some limited support for Python debugging.

Preparation

To prepare your program for debugging, compile and link in the normal way but remember to include the -g compiler option to retain symbolic information in the executable. For some programs, it may be necessary to reduce the optimisation to -O0 to obtain full and consistent information. However, this in itself can change the behaviour of bugs, so some experimentation may be necessary.

Post-mortem debugging

A non-interactive method of debugging is available which allows information to be obtained on the state of the execution at the point of failure in a batch job.

Such a job can be submitted to the batch system in the usual way. The relevant command to start the executable is as follows.

# ... Slurm batch commands as usual ...

module load forge

export OMP_NUM_THREADS=16
export OMP_PLACES=cores

# Ensure the cpus-per-task option is propagated to srun commands
export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK

ddt --verbose --offline --mpi=slurm --np 8 \
    --mem-debug=fast --check-bounds=before \
    ./my_executable

The parallel launch is delegated to ddt and the --mpi=slurm option indicates to ddt that the relevant queue system is Slurm (there is no explicit srun). It will also be necessary to state explicitly to ddt the number of processes required (here --np 8). For other options see, e.g., ddt --help.

Note that higher levels of memory debugging can result in extremely slow execution. The example given above uses the default --mem-debug=fast which should be a reasonable first choice.

Execution will produce a .html format report which can be used to examine the state of execution at the point of failure.

Interactive debugging: using the client to submit a batch job

You can also start the client interactively (for details of remote launch, see Connecting with the remote client).

module load forge
ddt

This should start a window as shown below. Click on the DDT panel on the left, and then on the Run and debug a program option. This will bring up the Run dialogue as shown.

Note:

In the Application sub panel of the Run dialog box, details of the executable, command line arguments or data files, the working directory and so on should be entered.

Click the MPI checkbox and specify the MPI implementation. This is done by clicking the Details button and then the Change button. Choose the SLURM (generic) implementation from the drop-down menu and click OK. You can then specify the required number of nodes/processes and so on.

Click the OpenMP checkbox and select the relevant number of threads (if there is no OpenMP in the application itself, select 1 thread).

Click the Submit to Queue checkbox and then the associated Configure button. A new set of options will appear such as Submission template file, where you can enter ${FORGE_DIR}/templates/archer2.qtf and click OK. This template file provides many of the options required for a standard batch job. You will then need to click on the Queue Parameters button in the same section and specify the relevant project budget, see the Account entry.

The default queue template file configuration uses the short QoS with the standard time limit of 20 minutes. If something different is required, one can edit the settings. Alternatively, one can copy the archer2.qtf file (to ${HOME/home/work}/.forge) and make the relevant changes. This new template file can then be specified in the dialog window.

There may be a short delay while the sbatch job starts. Debugging should then proceed as described in the Linaro Forge documentation.

Using MAP

Load the forge module:

module load forge
Linking

MAP uses two small libraries to collect data from your program. These are called map-sampler and map-sampler-pmpi. On ARCHER2, the linking of these libraries is usually done automatically via the LD_PRELOAD mechanism, but only if your program is dynamically linked. Otherwise, you will need to link the MAP libraries manually by providing explicit link options.

The library paths specified in the link options will depend on the programming environment you are using as well as the Cray programming release. Here are the paths for each of the compiler environments consistent with the Cray Programming Release (CPE) 22.12 using the default OFI as the low-level comms protocol:

For example, for PrgEnv-gnu the additional options required at link time are given below.

-L${FORGE_DIR}/map/libs/default/gnu/ofi \
-lmap-sampler-pmpi -lmap-sampler \
-Wl,--eh-frame-hdr -Wl,-rpath=${FORGE_DIR}/map/libs/default/gnu/ofi

The MAP libraries for other Cray programming releases can be found under ${FORGE_DIR}/map/libs. If you require MAP libraries built for the UCX comms protocol, simply replace ofi with ucx in the library path.
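Only the compiler directory and the comms protocol change in these paths, so the flags can be assembled in one place. A minimal sketch: the helper name `map_link_flags` is hypothetical, and `gnu` is the only compiler directory name confirmed above. Check the names for other environments by listing `${FORGE_DIR}/map/libs/default` after loading the forge module. The helper returns an error if `FORGE_DIR` is not set rather than printing broken paths.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print MAP sampler link options for a compiler
# directory (e.g. gnu) and comms protocol (ofi by default, or ucx).
map_link_flags() {
    local compiler="$1" comms="${2:-ofi}"
    if [ -z "${FORGE_DIR:-}" ]; then
        echo "FORGE_DIR is not set: run 'module load forge' first" >&2
        return 1
    fi
    local libdir="${FORGE_DIR}/map/libs/default/${compiler}/${comms}"
    echo "-L${libdir} -lmap-sampler-pmpi -lmap-sampler -Wl,--eh-frame-hdr -Wl,-rpath=${libdir}"
}

# Example link step with PrgEnv-gnu (after: module load forge):
# cc -o my_executable my_code.o $(map_link_flags gnu ofi)
```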

Generating a profile

Submit a batch job in the usual way, and include the lines:

# ... Slurm batch commands as usual ...

module load forge

# Ensure the cpus-per-task option is propagated to srun commands
export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK

map -n <number of MPI processes> --mpi=slurm --mpiargs="--hint=nomultithread --distribution=block:block" --profile ./my_executable

Successful execution will generate a file with a .map extension.

This .map file may be viewed via the GUI (start with either map or forge) by selecting the Load a profile data file from a previous run option. The resulting file selection dialog box can then be used to locate the .map file.
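You can derive `<number of MPI processes>` from the job's resource request instead of hard-coding it. The sketch below assumes the job was submitted with `--nodes` and `--ntasks-per-node`, in which case Slurm exports `SLURM_JOB_NUM_NODES` and `SLURM_NTASKS_PER_NODE` to the batch script. The helper name `mpi_process_count` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: total MPI processes = nodes x tasks per node.
# Assumes the job requested --nodes and --ntasks-per-node.
mpi_process_count() {
    if [ -z "${SLURM_JOB_NUM_NODES:-}" ] || [ -z "${SLURM_NTASKS_PER_NODE:-}" ]; then
        echo "SLURM_JOB_NUM_NODES / SLURM_NTASKS_PER_NODE not set" >&2
        return 1
    fi
    echo $(( SLURM_JOB_NUM_NODES * SLURM_NTASKS_PER_NODE ))
}

# In the job script:
# map -n "$(mpi_process_count)" --mpi=slurm \
#     --mpiargs="--hint=nomultithread --distribution=block:block" \
#     --profile ./my_executable
```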

Connecting with the remote client

If one starts the Forge client on e.g., a laptop, one should see the main window as shown above. Select Remote Launch and then Configure from the drop-down menu. In the Configure Remote Connections dialog box click Add. The following window should be displayed. Fill in the fields as shown. The Connection Name is just a tag for convenience (useful if a number of different accounts are in use). The Host Name should be as shown with the appropriate username. The Remote Installation Directory should be exactly as shown. The Remote Script is needed to execute additional environment commands on connection. A default script is provided in the location shown.

/work/y07/shared/utils/core/forge/latest/remote-init

Other settings can be as shown. Remember to click OK when done.

From the Remote Launch menu you should now see the new Connection Name. Select this, and enter the relevant SSH passphrase and machine password to connect. A remote connection will allow you to debug, or view a profile, as discussed above.

If different commands are required on connection, a copy of the remote-init script can be placed in, e.g., ${HOME/home/work}/.forge and edited as necessary. The full path of the new script should then be specified in the remote launch settings dialog box. Note that the script changes the directory to the /work/ file system so that batch submissions via sbatch will not be rejected.

Finally, note that ssh may need to be configured so that it picks up the correct local public key file. This may be done, e.g., via the local .ssh/config configuration file.
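For example, you could add a `Host` entry for the login address to `~/.ssh/config` on your local machine to fix the username and key used for that connection. The sketch below appends such an entry. The helper name, the username `auser` and the key path are placeholders, and `IdentitiesOnly yes` tells ssh to offer only the named key. The helper writes to a file given as an argument, so you can try it on a copy first. It skips the append if an entry for the host already exists.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append an ssh_config entry for ARCHER2 to CONFIG_FILE.
add_archer2_ssh_entry() {
    local config_file="$1" user="$2" key="$3"
    mkdir -p "$(dirname "$config_file")"
    # Do nothing if an entry for this host already exists
    if [ -f "$config_file" ] && grep -q '^Host login\.archer2\.ac\.uk$' "$config_file"; then
        echo "entry already present in $config_file" >&2
        return 0
    fi
    cat >> "$config_file" <<EOF

Host login.archer2.ac.uk
    User ${user}
    IdentityFile ${key}
    IdentitiesOnly yes
EOF
    chmod 600 "$config_file"
}

# Example (placeholders):
# add_archer2_ssh_entry ~/.ssh/config auser ~/.ssh/id_rsa_archer2
```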

Useful links

Using Globus to transfer data to/from ARCHER2 filesystems

Setting up ARCHER2 filesystems

Navigate to https://app.globus.org

Log in with your Globus identity (this could be a globusid.org or other identity)

In File Manager, use the search tool to search for “Archer2 file systems”. Select it.

In the transfer pane, you are told that Authentication/Consent is required. Click Continue.

Click on the ARCHER2 Safe (safe.epcc.ed.ac.uk) link

Select the correct User account (if you have more than one)

Click Accept

Now confirm your Globus credentials: click Continue

Click on the SAFE id you selected previously

Make sure the correct User account is selected and Accept again

Your ARCHER2 /home directory will be shown

You can switch to viewing e.g. your /work directory by editing the path, or by using "up one folder" and selecting folders to move down the tree, as required

Setting up the other end of the transfer

Make sure you select two-panel view mode

Laptop

If you wish to transfer data to/from your personal laptop or other device, click on the Collection Search in the right-hand panel

Use the link to “Get Globus Connect Personal” to create a Collection for your local drive.

Other server e.g. JASMIN

If you wish to connect to another server, you will need to search for the Collection e.g. JASMIN Default Collection and authenticate

Please see the JASMIN Globus page for more information

Setting up and initiating the transfer

Once you are connected to both the Source and Destination Collections, you can use the File Manager to select the files to be transferred, and then click the Start button to initiate the transfer

A pop-up will appear once the Transfer request has been submitted successfully

Clicking on \u201cView Details\u201d will show the progress and final status of the transfer

"},{"location":"data-tools/globus/#using-a-different-archer2-account","title":"Using a different ARCHER2 account","text":"

If you want to use Globus with a different account on ARCHER2, go to Settings, then Manage Identities, and unlink the current ARCHER2 SAFE identity. Then repeat the linking process with the other ARCHER2 account.

"},{"location":"data-tools/julia/","title":"Julia","text":"

Julia is a general-purpose programming language widely used in data science and for data visualisation.

Important

This documentation is provided by an external party (i.e. not by the ARCHER2 service itself). Julia is not part of the officially supported software on ARCHER2. While the ARCHER2 service desk is able to provide support for basic use of this software (e.g. access to software, writing job submission scripts) it does not generally provide detailed technical support for the software and you may be directed to seek support from other places if the service desk cannot answer the questions.

"},{"location":"data-tools/julia/#first-time-installation","title":"First time installation","text":"

Note

There is no centrally installed version of Julia, so you will have to manually install it and any packages you may need. The following guide was tested on julia-1.6.6.

You will first need to download Julia into your work directory and unpack the archive. You should then add its bin folder to your PATH so that you can use the julia executable. Finally, you need to tell Julia to install packages in your work directory rather than in the default location in your home directory, which can only be accessed from the login nodes. This can be done with the following commands:

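export WORK=/work/t01/t01/auser\ncd $WORK\n\n# Download and unpack Julia 1.6.6 in the work directory\nwget https://julialang-s3.julialang.org/bin/linux/x64/1.6/julia-1.6.6-linux-x86_64.tar.gz\ntar zxvf julia-1.6.6-linux-x86_64.tar.gz\nrm ./julia-1.6.6-linux-x86_64.tar.gz\n\n# Make the julia executable available\nexport PATH=\"$PATH:$WORK/julia-1.6.6/bin\"\n\n# Keep the Julia depot (installed packages) in the work directory\nmkdir ./.julia\nexport JULIA_DEPOT_PATH=\"$WORK/.julia\"\n# Commands installed by packages (e.g. mpiexecjl) go in $JULIA_DEPOT_PATH/bin\nexport PATH=\"$PATH:$JULIA_DEPOT_PATH/bin\"\n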

At this point you should have a working installation of Julia! The environment variables will however be cleared when you log out of the terminal. You can set them in the .bashrc file so that they're automatically defined every time you log in by adding the following lines to the end of the file ~/.bashrc

export WORK=\"/work/t01/t01/auser\"\nexport JULIA_DEPOT_PATH=\"$WORK/.julia\"\nexport PATH=\"$PATH:$WORK/julia-1.6.6/bin\"\nexport PATH=\"$PATH:$JULIA_DEPOT_PATH/bin\"\n
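Because ~/.bashrc can be read more than once in a session (for example by nested shells), the lines above may add the same directories to PATH repeatedly. The following is a minimal sketch of a guard against this. It assumes bash, and path_append is a hypothetical helper name, not something provided by Julia or ARCHER2:

```shell
# Hypothetical helper: append a directory to PATH only if it is not already listed
path_append() {
  case :${PATH}: in
    *:$1:*) ;;              # already present, leave PATH unchanged
    *) PATH=${PATH}:$1 ;;   # otherwise append it
  esac
  export PATH
}

# Example use in ~/.bashrc (paths follow the example above)
export WORK=/work/t01/t01/auser
export JULIA_DEPOT_PATH=$WORK/.julia
path_append $WORK/julia-1.6.6/bin
path_append $JULIA_DEPOT_PATH/bin
```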
"},{"location":"data-tools/julia/#installing-packages-and-using-environments","title":"Installing packages and using environments","text":"

Julia has a built-in package manager which can be used to install registered packages quickly and easily. As with many other high-level programming languages, you can use environments to control dependencies.

To make an environment, first navigate to where you want your environment to be (ideally a subfolder of your /work/ directory) and create an empty folder to store the environment in. Then launch Julia with the --project flag.

cd $WORK\nmkdir ./MyTestEnv\njulia --project=$WORK/MyTestEnv\n

This launches Julia in the MyTestEnv environment. You can then install packages as usual from the Julia prompt, e.g.:

using Pkg\nPkg.add(\"Oceananigans\")\n
"},{"location":"data-tools/julia/#configuring-mpijl","title":"Configuring MPI.jl","text":"

The MPI.jl package doesn't use the system MPI library (Cray MPICH on ARCHER2) by default. You can configure it to do so by following the steps below. First, load the cray-mpich module and define some environment variables (see the MPI.jl documentation for further details). Then launch Julia in an environment of your choice, ready to build.

module load cray-mpich/8.1.23\nexport JULIA_MPI_BINARY=\"system\"\nexport JULIA_MPI_PATH=\"\"\nexport JULIA_MPI_LIBRARY=\"/opt/cray/pe/mpich/8.1.23/ofi/gnu/9.1/lib/libmpi.so\"\nexport JULIA_MPIEXEC=\"srun\"\n\njulia --project=<<path to environment>>\n

Once in the Julia terminal, you can build the MPI.jl package using the following code. The final line installs the mpiexecjl command, which should be used instead of srun to launch MPI processes.

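using Pkg\nPkg.build(\"MPI\"; verbose=true)\n\n# Load MPI.jl so that MPI.install_mpiexecjl is defined\nusing MPI\nMPI.install_mpiexecjl(command = \"mpiexecjl\", force = false, verbose = true)\n
The mpiexecjl command will be installed in the bin subdirectory of the directory that JULIA_DEPOT_PATH points to (here $WORK/.julia/bin).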

Note

You only need to do this once per environment.
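The JULIA_MPI_LIBRARY value above hard-codes one specific cray-mpich version and build. As an alternative, the path can be derived from the loaded module and checked before launching Julia. This assumes that the cray-mpich module exports CRAY_MPICH_DIR pointing at the matching installation prefix, which you can confirm with module show cray-mpich. The sketch below uses find_mpi_library, a hypothetical helper:

```shell
# Hypothetical helper: print the libmpi.so that belongs to the loaded cray-mpich module
find_mpi_library() {
  # Assumption: the cray-mpich module exports CRAY_MPICH_DIR
  if [[ -z ${CRAY_MPICH_DIR:-} ]]; then
    echo 'CRAY_MPICH_DIR is not set, load cray-mpich first' >&2
    return 1
  fi
  local lib=${CRAY_MPICH_DIR}/lib/libmpi.so
  if [[ ! -e ${lib} ]]; then
    echo 'libmpi.so not found under CRAY_MPICH_DIR/lib' >&2
    return 1
  fi
  echo ${lib}
}

# Usage, after module load cray-mpich:
#   JULIA_MPI_LIBRARY=$(find_mpi_library) && export JULIA_MPI_LIBRARY
```

If the module was not loaded, this stops early with a message rather than leaving MPI.jl to build against a library that does not exist.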

"},{"location":"data-tools/julia/#running-julia-on-the-compute-nodes","title":"Running Julia on the compute nodes","text":"

Below is an example script for running Julia with MPI on the compute nodes:

#!/bin/bash\n# Slurm job options (job-name, compute nodes, job time)\n#SBATCH --job-name=<<job-name>>\n#SBATCH --time=00:19:00\n\n#SBATCH --nodes=1\n#SBATCH --ntasks-per-node=24\n#SBATCH --cpus-per-task=1\n\n#SBATCH --qos=short\n#SBATCH --reservation=shortqos\n\n#SBATCH --account=<<your account>>\n#SBATCH --partition=standard\n\n# Set up the job environment (this module needs to be loaded before any other modules)\nmodule load PrgEnv-cray\nmodule load cray-mpich/8.1.23\n\n# Set the number of threads to 1\n#   This prevents any threaded system libraries from automatically\n#   using threading.\nexport OMP_NUM_THREADS=1\nexport JULIA_NUM_THREADS=1\n\n# Ensure the cpus-per-task option is propagated to srun commands\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\n# Define some paths\nexport WORK=/work/t01/t01/auser\n\nexport JULIA=\"$WORK/julia-1.6.6/bin/julia\"  # The julia executable\nexport PATH=\"$PATH:$WORK/julia-1.6.6/bin\"  # The folder of the julia executable\nexport JULIA_DEPOT_PATH=\"$WORK/.julia\"\nexport MPIEXECJL=\"$JULIA_DEPOT_PATH/bin/mpiexecjl\"  # The path to the mpiexecjl executable\n\n$MPIEXECJL --project=$WORK/MyTestEnv -n 24 $JULIA ./MyMpiJuliaScript.jl\n

The above script uses MPI, but you can instead use multithreading by setting the JULIA_NUM_THREADS environment variable and requesting a matching number of CPUs per task.
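The following is a minimal sketch of the environment part of a multithreaded job script. It assumes the Slurm options have been changed to one task with several CPUs (for example --ntasks-per-node=1 and --cpus-per-task=16; the numbers are illustrative). Deriving the thread count from SLURM_CPUS_PER_TASK keeps it consistent with the allocation, and the fallback of 1 applies outside Slurm.

```shell
# Threaded variant of the environment setup in the MPI script above
# (assumes the job requests --cpus-per-task greater than 1)
export SRUN_CPUS_PER_TASK=${SLURM_CPUS_PER_TASK:-1}  # propagate cpus-per-task to srun
export JULIA_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}   # one Julia thread per allocated CPU
export OMP_NUM_THREADS=1                             # keep threaded system libraries single-threaded

# Launch one process using JULIA_NUM_THREADS threads (the script name is hypothetical):
#   srun --ntasks=1 $JULIA --project=$WORK/MyTestEnv ./MyThreadedJuliaScript.jl
```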

"},{"location":"data-tools/papi-mpi-lib/","title":"PAPI MPI Library","text":"

The Performance Application Programming Interface (PAPI) is an API that facilitates the reading of performance counter data without needing to know the details of the underlying hardware.

For convenience, we have developed an MPI-based wrapper for PAPI, called papi_mpi_lib, which can be found via the link below.

https://github.com/cresta-eu/papi_mpi_lib

The PAPI MPI Library makes it possible to monitor a user-defined set of hardware performance counters during the execution of an MPI code running across multiple compute nodes. The library is lightweight, containing just four functions, and is intended to be straightforward to use. Once you've decided where in your code you wish to record counter values, you can control which counters are read at runtime by setting the PAT_RT_PERFCTR environment variable in the job submission script. As your code executes, the specified counters will be read at various points. After each reading, the counter values are summed by rank 0 (via an MPI reduction) before being output to a log file.

You can discover which counters are available on ARCHER2 compute nodes by submitting the following single node job.

#!/bin/bash --login\n\n#SBATCH -J papi\n#SBATCH --time=00:20:00\n#SBATCH --exclusive\n#SBATCH --nodes=1\n#SBATCH --tasks-per-node=1\n#SBATCH --cpus-per-task=1\n#SBATCH --account=<budget code>\n#SBATCH --partition=standard\n#SBATCH --qos=short\n#SBATCH --export=none\n\nfunction papi_query() {\n  export LD_LIBRARY_PATH=/opt/cray/pe/papi/$2/lib64:/opt/cray/libfabric/$3/lib64\n  module -q restore\n\n  module -q load cpe/$1\n  module -q load papi/$2\n\n  mkdir -p $1\n  papi_component_avail -d &> $1/papi_component_avail.txt\n  papi_native_avail -c &> $1/papi_native_avail.txt\n  papi_avail -c -d &> $1/papi_avail.txt\n}\n\npapi_query 22.12 6.0.0.17 1.12.1.2.2.0.0\n

The job runs several PAPI utilities, writing their output to text files in a directory named after the CPE version (here 22.12). Please consult these files to see which counters are available. Note that counters that are not available may still be listed, but with a label such as <NA>.

As of July 2023, the Cray Programming Environment (CPE), PAPI and libfabric versions on ARCHER2 were 22.12, 6.0.0.17 and 1.12.1.2.2.0.0 respectively; these versions may change in the future.

Alternatively, you can run pat_help counters rome from a login node to check the availability of individual counters.
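Once you know which counters are available, you select them at runtime through PAT_RT_PERFCTR. The sketch below makes the following assumptions:

- PAT_RT_PERFCTR takes a comma-separated list of PAPI event names, as CrayPat does. Check the papi_mpi_lib readme for the exact form it expects.
- PAPI_TOT_INS and PAPI_TOT_CYC, both standard PAPI preset names, are used only as an illustration.

The hypothetical helper check_counters warns about any requested counter that is missing from an output file of the query job above, or that is listed on a line carrying an <NA> label.

```shell
# Counters to read at runtime (illustrative PAPI preset events)
export PAT_RT_PERFCTR=PAPI_TOT_INS,PAPI_TOT_CYC

# Hypothetical helper: warn about requested counters that a query listing
# does not show, or shows with an <NA> label
check_counters() {
  local listing=$1 counter line missing=0
  local IFS=,                          # split PAT_RT_PERFCTR on commas
  for counter in ${PAT_RT_PERFCTR}; do
    line=$(grep -m 1 -w -- ${counter} ${listing}) || line=
    if [[ -z ${line} || ${line} == *'<NA>'* ]]; then
      echo ${counter} may not be available >&2
      missing=1
    fi
  done
  return ${missing}
}

# Usage, after the query job has finished:
#   check_counters 22.12/papi_avail.txt
```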

Further information on papi_mpi_lib along with test harnesses and example scripts can be found by reading the PAPI MPI Library readme file.

"},{"location":"data-tools/paraview/","title":"ParaView","text":"

ParaView is a data visualisation and analysis package. Although the ARCHER2 compute and login nodes do not have graphics cards, ParaView is installed so that its visualisation libraries and applications can be used to post-process simulation data. The ParaView server (pvserver), batch application (pvbatch), and Python interface (pvpython) are all available. You can run the server on the compute nodes and connect to it from a ParaView client running on your own computer.

"},{"location":"data-tools/paraview/#useful-links","title":"Useful links","text":""},{"location":"data-tools/paraview/#using-paraview-on-archer2","title":"Using ParaView on ARCHER2","text":"

ParaView is available through the paraview module.

module load paraview\n

Once the module has been loaded, the ParaView executables, tools, and libraries will be available.

"},{"location":"data-tools/paraview/#connecting-to-pvserver-on-archer2","title":"Connecting to pvserver on ARCHER2","text":"

For interactive visualisation, you should connect to pvserver from a ParaView client running on your own computer.

Note

You should make sure the version of ParaView you have installed locally is the same as the one on ARCHER2 (version 5.10.1).

The following instructions are for running pvserver in an interactive job. Start an interactive job using:

srun --nodes=1 --exclusive --time=00:20:00 \\\n               --partition=standard --qos=short --pty /bin/bash\n

Once the job starts the command prompt will change to show you are now on the compute node, e.g.:

auser@nid001023:/work/t01/t01/auser> \n

Then load the ParaView module and start pvserver with the srun command:

auser@nid001023:/work/t01/t01/auser> module load paraview\nauser@nid001023:/work/t01/t01/auser> srun --overlap --oversubscribe -n 4 \\\n> pvserver --mpi --force-offscreen-rendering\nWaiting for client...\nConnection URL: cs://nid001023:11111\nAccepting connection(s): nid001023:11111\n

Note

The previous example uses 4 compute cores to run pvserver. You can increase the number of cores if the visualisation does not run smoothly. Please bear in mind that, depending on the test case, a large number of compute cores can lead to an out-of-memory runtime error.

In a separate terminal you can now set up an SSH tunnel with the node ID and port number which the pvserver is using, e.g.:

ssh -L 11111:nid001023:11111 auser@login.archer2.ac.uk \n

Enter your password and passphrase as usual.
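Rather than copying the node name and port by hand, you can generate the tunnel command from the Connection URL line that pvserver prints. The following is a minimal sketch using bash parameter expansion. pv_tunnel_cmd is a hypothetical helper that only prints the command; you would then run that command on your own computer. The -N option tells ssh not to start a remote shell, which is all a tunnel needs; omit it if you also want a login session.

```shell
# Hypothetical helper: turn pvserver's Connection URL line into an ssh tunnel command
# Usage: pv_tunnel_cmd URL_LINE USERNAME
pv_tunnel_cmd() {
  local hostport=${1##*cs://}   # e.g. nid001023:11111
  local node=${hostport%%:*}    # part before the colon
  local port=${hostport##*:}    # part after the colon
  local user=$2
  echo ssh -N -L ${port}:${node}:${port} ${user}@login.archer2.ac.uk
}

pv_tunnel_cmd 'Connection URL: cs://nid001023:11111' auser
# prints: ssh -N -L 11111:nid001023:11111 auser@login.archer2.ac.uk
```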

You can then connect from your local client using the following connection settings:

Name:           archer2 \nServer Type:    Client/Server \nHost:           localhost \nPort:           11111\n

Note

The Host from the local client should be set to \"localhost\" when using the SSH tunnel. The \"Name\" field can be set to a name of your choosing. 11111 is the default port for pvserver.

If it has connected correctly, you should see the following:

Waiting for client...\nConnection URL: cs://nid001023:11111\nAccepting connection(s): nid001023:11111\nClient connected.\n
"},{"location":"data-tools/paraview/#using-batch-mode-pvbatch","title":"Using batch-mode (pvbatch)","text":"

A pvbatch script can be run in a standard job script. For example, the following will run on a single node:

#!/bin/bash\n\n# Slurm job options (job-name, compute nodes, job time)\n#SBATCH --job-name=example_paraview_job\n#SBATCH --time=0:20:00\n#SBATCH --nodes=1\n#SBATCH --ntasks-per-node=128\n#SBATCH --cpus-per-task=1\n\n# Replace [budget code] below with your budget code (e.g. t01)\n#SBATCH --account=[budget code]             \n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\nmodule load paraview\n\n# Ensure the cpus-per-task option is propagated to srun commands\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\nsrun --distribution=block:block --hint=nomultithread pvbatch pvbatchscript.py\n
"},{"location":"data-tools/paraview/#compiling-paraview","title":"Compiling ParaView","text":"

The latest instructions for building ParaView on ARCHER2 may be found in the GitHub repository of build instructions.

"},{"location":"data-tools/pm-mpi-lib/","title":"Power Management MPI Library","text":"

The ARCHER2 compute nodes each have a set of so-called Power Management (PM) counters. These cover point-in-time power readings for the whole node, and for the CPU and memory domains. The accumulated energy use is also recorded at the same level of detail. Further, there are two temperature counters, one for each socket/processor on the node. The counters are read ten times per second and the data written to a set of files stored within node memory (located at /sys/cray/pm_counters/).

For convenience, we have developed an MPI-based wrapper, called pm_mpi_lib, that facilitates the reading of the PM counter files; see the link below.

https://github.com/cresta-eu/pm_mpi_lib

The PM MPI Library makes it possible to monitor the Power Management counters during the execution of an MPI code running across multiple compute nodes. The library is lightweight, containing just three functions, and is intended to be straightforward to use. You simply decide which parts of your code you wish to profile for energy usage and/or power consumption.

As your code executes, the PM counters will be read at various points by a single designated monitor rank on each node assigned to the job. These readings are then written to a log file, which, after the job completes, will contain one set of time-stamped readings per node for every call to the pm_mpi_record function made from within your code. The readings can then be aggregated according to preference.

Further information along with test harnesses and example scripts can be found by reading the PM MPI Library readme file.
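For a quick whole-node figure without instrumenting your code, you can also read the counter files in /sys/cray/pm_counters/ directly on a compute node. The sketch below assumes each file (for example energy) holds a single line whose first field is the reading, followed by the unit and a timestamp. Check the actual format with cat /sys/cray/pm_counters/energy before relying on it. read_node_energy is a hypothetical helper, and the directory is a parameter so that it can be pointed elsewhere for testing.

```shell
# Hypothetical helper: print the accumulated node energy reading (first field)
read_node_energy() {
  local dir=${1:-/sys/cray/pm_counters} value _rest
  read -r value _rest < ${dir}/energy   # assumed format: value unit timestamp
  echo ${value}
}

# Usage on a compute node (e.g. inside an interactive job):
#   e0=$(read_node_energy)
#   ./my_program            # hypothetical workload
#   e1=$(read_node_energy)
#   echo energy used: $((e1 - e0)) J
```

This measures the whole node over the interval, including anything else running on it.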

"},{"location":"data-tools/spack/","title":"Spack","text":"

Spack is a package manager, a tool to assist with building and installing software as well as determining what dependencies are required and installing those. It was originally designed for use on HPC clusters, where several variations of a given package may be installed alongside one another for different use cases -- for example different versions, built with different compilers, using MPI or hybrid MPI+OpenMP. Spack is principally written in Python but has a component written in Answer Set Programming (ASP) which is used to determine the required dependencies for a given package installation.

Users are welcome to install Spack themselves in their own directories, but we are making an experimental installation tailored for ARCHER2 available centrally. This page provides documentation on how to activate and install packages using the central installation on ARCHER2. For more in-depth information on using Spack itself please see the developers' documentation.

Important

As ARCHER2's central Spack installation is still in an experimental stage please be aware that we cannot guarantee that it will work with full functionality and we may not be able to provide support.

"},{"location":"data-tools/spack/#activating-spack","title":"Activating Spack","text":"

As it is still in an experimental stage, the Spack module is not made available to users by default. You must first load the other-software module:

auser@ln01:~> module load other-software\n

Several modules with spack in their name will become visible to you. You should load the spack module:

auser@ln01:~> module load spack\n

This configures Spack to keep its cache in, and install software to, a directory called .spack in your base work directory, e.g. at /work/t01/t01/auser/.spack.

At this point Spack is available to you via the spack command. You can get started with spack help, reading the Spack documentation, or by testing a package's installation.

"},{"location":"data-tools/spack/#using-spack-on-archer2","title":"Using Spack on ARCHER2","text":""},{"location":"data-tools/spack/#installing-software","title":"Installing software","text":"

At its simplest, Spack installs software with the spack install command:

auser@ln01:~> spack install gromacs\n

This very simple gromacs installation specification, or spec, would install GROMACS using the default options given by the Spack gromacs package. The spec can be extended to specify whichever options you like. For example, the command

auser@ln01:~> spack install gromacs@2024.2%gcc+mpi\n

would use the GCC compiler to install an MPI-enabled version of GROMACS version 2024.2.
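Spec syntax is built from a few sigils: @ for the version, % for the compiler, and + or ~ to turn a variant on or off. As a purely illustrative sketch, the pieces of the example above can be assembled as shown below. make_spec is a hypothetical helper, not a Spack command, and the cuda variant is used only as an example of disabling one.

```shell
# Hypothetical helper: build a Spack spec string from its parts
# Usage: make_spec NAME VERSION COMPILER [VARIANT...]
# A variant written as ~name is disabled; any other variant is enabled with +
make_spec() {
  local spec=$1@$2%$3 variant
  shift 3
  for variant in $@; do
    case ${variant} in
      '~'*) spec=${spec}${variant} ;;
      *)    spec=${spec}+${variant} ;;
    esac
  done
  echo ${spec}
}

make_spec gromacs 2024.2 gcc mpi          # prints gromacs@2024.2%gcc+mpi
make_spec gromacs 2024.2 gcc mpi '~cuda'  # prints gromacs@2024.2%gcc+mpi~cuda
```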

Tip

Spack needs to bootstrap the installation of some extra software in order to function, principally clingo which is used to solve the dependencies required for an installation. The first time you ask Spack to concretise a spec into a precise set of requirements, it will take extra time as it downloads this software and extracts it into a local directory for Spack's use.

You can find information about any Spack package and the options available to use with the spack info command:

auser@ln01:~> spack info gromacs\n

Tip

The Spack developers also provide a website at https://packages.spack.io/ where you can search for and examine packages, including all information on options, versions and dependencies.

When installing a package, Spack will determine what dependencies are required to support it. If they are not already available to Spack, either as packages that it has installed beforehand or as external dependencies, then Spack will also install those, marking them as implicitly installed, as opposed to the explicit installation of the package you requested. If you want to see the dependencies of a package before you install it, you can use spack spec to see the full concretised set of packages:

auser@ln01:~> spack spec gromacs@2024.2%gcc+mpi\n

Tip

Spack on ARCHER2 has been configured to use as much of the HPE Cray Programming Environment as possible. For example, this means that Cray LibSci will be used to provide the BLAS, LAPACK and ScaLAPACK dependencies and Cray MPICH will provide MPI. It is also configured to reuse, as dependencies, any packages that the ARCHER2 CSE team has installed centrally with Spack, potentially saving you build time and storage quota.

"},{"location":"data-tools/spack/#using-spack-packages","title":"Using Spack packages","text":"

Spack provides a module-like way of making software that you have installed available to use. If you have a GROMACS installation, you can make it available to use with spack load:

auser@ln01:~> spack load gromacs\n

At this point you should be able to use the software as normal. You can then remove it once again from the environment with spack unload:

auser@ln01:~> spack unload gromacs\n

If you have multiple variants of the same package installed, you can use the spec to distinguish between them. You can always check what packages have been installed using the spack find command. If no other arguments are given it will simply list all installed packages, or you can give a package name to narrow it down:

auser@ln01:~> spack find gromacs\n

You can see your packages' install locations using spack find --paths or spack find -p.

"},{"location":"data-tools/spack/#maintaining-your-spack-installations","title":"Maintaining your Spack installations","text":"

In any Spack command that requires as an argument a reference to an installed package, you can provide a hash reference to it rather than its spec. You can see the first part of the hash by running spack find -l, or the full hash with spack find -L. Then use the hash in a command by prefixing it with a forward slash, e.g. wjy5dus becomes /wjy5dus.

If you have two packages installed which appear identical in spack find apart from their hash, you can differentiate them with spack diff:

auser@ln01:~> spack diff /wjy5dus /bleelvs\n

You can uninstall your packages with spack uninstall:

auser@ln01:~> spack uninstall gromacs@2024.2\n

and of course, to be absolutely certain that you are uninstalling the correct package, you can provide the hash:

auser@ln01:~> spack uninstall /wjy5dus\n

Uninstalling a package will leave behind any implicitly installed packages that were installed to support it. Spack may have also installed build-time dependencies that aren't actually needed any more -- these are often packages like autoconf, cmake and m4. You can run the garbage collection command to uninstall any build dependencies and implicit dependencies that are no longer required:

auser@ln01:~> spack gc\n

If you commonly use a set of Spack packages together you may want to consider using a Spack environment to assist you in their installation and management. Please see the Spack documentation for more information.

"},{"location":"data-tools/spack/#custom-configuration","title":"Custom configuration","text":"

Spack is configured using YAML files. The central installation on ARCHER2 made available to users is configured to use the HPE Cray Programming Environment and to allow you to start installing software to your /work directories right away, but if you wish to make any changes you can provide your own overriding userspace configuration.

Your own configuration should go in the user scope. By default, Spack on ARCHER2 places and looks for your configuration files in your work directory, e.g. /work/t01/t01/auser/.spack. You can, however, have Spack use any directory you choose by setting the SPACK_USER_CONFIG_PATH environment variable, for example:

auser@ln01:~> export SPACK_USER_CONFIG_PATH=/work/t01/t01/auser/spack-config\n

Of course, this will need to be a directory where you have write permissions, such as your home or work directories, or one of your project's shared directories.

You can edit the configuration files directly in a text editor or by running, for example:

auser@ln01:~> spack config edit repos\n

which would open your repos.yaml in vim.

Tip

If you would rather not use vim, you can change which editor is used by Spack by setting the SPACK_EDITOR environment variable.

The final configuration used by Spack is a compound of several scopes, from the Spack defaults which are overridden by the ARCHER2 system configuration files, which can then be overridden in turn by your own configurations. You can see what options are in use at any point by running, for example:

auser@ln01:~> spack config get config\n

which goes through any and all config.yaml files known to Spack and sets the options according to those files' level of precedence. You can also get more information on which files are responsible for which lines in the final active configuration by running, for example to check packages.yaml:

auser@ln01:~> spack config blame packages\n

Unless you have already written a packages.yaml of your own, this will show a mix of options originating from the Spack defaults and also from an archer2-user directory which is where we have told Spack how to use packages from the HPE Cray Programming Environment.

If there is some behaviour in Spack that you want to change, looking at the output of spack config get and spack config blame may help to show what you would need to do. You can then write your own user scope configuration file to set the behaviour you want, which will override the option as set by the lower-level scopes.
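As an example of a user scope override, the sketch below writes a config.yaml that sets the number of parallel build jobs through build_jobs, a standard option in Spack's config.yaml. The value 4 is an arbitrary choice for illustration, and write_build_jobs_config is a hypothetical helper. Note that it overwrites any existing config.yaml in the target directory.

```shell
# Hypothetical helper: write a user scope config.yaml that sets build_jobs
# WARNING: overwrites any existing config.yaml in that directory
write_build_jobs_config() {
  local dir=$1 jobs=$2
  mkdir -p ${dir}
  cat > ${dir}/config.yaml <<EOF
config:
  build_jobs: ${jobs}
EOF
}

# Usage:
#   write_build_jobs_config ${SPACK_USER_CONFIG_PATH} 4
#   spack config blame config   # build_jobs should now be attributed to your file
```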

Please see the Spack documentation to find out more about writing configuration files.

"},{"location":"data-tools/spack/#writing-new-packages","title":"Writing new packages","text":"

A Spack package is at its core a Python package.py file which provides instructions to Spack on how to obtain source code and compile it. A very simple package will allow it to build just one version with one compiler and one set of options. A more fully-featured package will list more versions and include logic to build them with different compilers and different options, and to also pick its dependencies correctly according to what is chosen.

Spack provides several thousand packages in its builtin repository. You may be able to use these with no issues on ARCHER2 by simply running spack install as described above, but if you do run into problems in the interaction between Spack and the CPE compilers and libraries then you may wish to write your own. Where the ARCHER2 CSE service has encountered problems with packages we have provided our own in a repository located at $SPACK_ROOT/var/spack/repos/archer2.

"},{"location":"data-tools/spack/#creating-your-own-package-repository","title":"Creating your own package repository","text":"

A package repository is a directory containing a repo.yaml configuration file and another directory called packages. Directories within the latter are named for the package they provide, for example cp2k, and contain in turn a package.py. You can create a repository from scratch with the command

auser@ln01:~> spack repo create dirname\n

where dirname is the name of the directory holding the repository. This command will create the directory in your current working directory, but you can choose to instead provide a path to its location. You can then make the new repository available to Spack by running:

auser@ln01:~> spack repo add dirname\n

This adds the path to dirname to the repos.yaml file in your user scope configuration directory as described above. If your repos.yaml doesn't yet exist, it will be created.

A Spack repository can similarly be removed from the config using:

auser@ln01:~> spack repo rm dirname\n
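The repository layout described above consists of a repo.yaml next to a packages directory, with one subdirectory per package containing a package.py. This layout is simple enough to inspect from the shell. The following is a minimal sketch in which list_repo_packages is a hypothetical helper that only looks at the files named in this section:

```shell
# Hypothetical helper: list the packages provided by a Spack repository directory
list_repo_packages() {
  local repo=$1 pkgdir
  if [[ ! -f ${repo}/repo.yaml || ! -d ${repo}/packages ]]; then
    echo ${repo} does not look like a Spack repository >&2
    return 1
  fi
  # Each package is a subdirectory of packages/ that contains a package.py
  for pkgdir in ${repo}/packages/*/; do
    [[ -f ${pkgdir}package.py ]] && basename ${pkgdir}
  done
  return 0
}

# Usage:
#   list_repo_packages dirname
```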
"},{"location":"data-tools/spack/#namespaces-and-repository-priority","title":"Namespaces and repository priority","text":"

A package can exist in several repositories. For example, the Quantum Espresso package is provided by both the builtin repository provided with Spack and also by the archer2 repository; the latter has been patched to work on ARCHER2.

To distinguish between these packages, each repository's packages exist within that repository's namespace. By default the namespace is the same as the name of the directory it was created in, but Spack does allow it to be different. Both builtin and archer2 use the same directory name and namespace.

Tip

If you want your repository namespace to be different from the name of the directory, you can change it either by editing the repository's repo.yaml or by providing an extra argument to spack repo create:

auser@ln01:~> spack repo create dirname namespace\n

Running spack find -N will return the list of installed packages with their namespace. You'll see that they are then prefixed with the repository namespace, for example builtin.bison@3.8.2 and archer2.quantum-espresso@7.2. In order to avoid ambiguity when managing package installation you can always prefix a spec with a repository namespace.

If you don't include the repository in a spec, Spack will search in order all the repositories it has been configured to use until it finds a matching package, which it will then use. The earlier in the list of repositories, the higher the priority. You can check this with:

auser@ln01:~> spack repo list\n

If you run this without having added any repositories of your own, you will see that the two available repositories are archer2 and builtin, in this order. This means that archer2 has higher priority. Because of this, running spack install quantum-espresso would install archer2.quantum-espresso, but you could still choose to install from the other repository with spack install builtin.quantum-espresso.
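The first-match rule can be illustrated with a small sketch. resolve_package is a hypothetical helper that walks a list of repository directories in priority order, as spack repo list reports them, and prints namespace.package for the first repository that provides the package. It assumes each directory name equals its namespace, as is the case for builtin and archer2. Spack's own lookup is more involved than this.

```shell
# Hypothetical helper: print namespace.package for the first repository providing it
# Usage: resolve_package PACKAGE REPO_DIR [REPO_DIR...]   (highest priority first)
resolve_package() {
  local pkg=$1 repo
  shift
  for repo in $@; do
    if [[ -f ${repo}/packages/${pkg}/package.py ]]; then
      echo $(basename ${repo}).${pkg}
      return 0
    fi
  done
  echo no repository provides ${pkg} >&2
  return 1
}

# Usage, mirroring the default order:
#   resolve_package quantum-espresso path/to/archer2 path/to/builtin
```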

"},{"location":"data-tools/spack/#creating-a-package","title":"Creating a package","text":"

Once you have a repository of your own in place, you can create new packages to store within it. Spack has a spack create command which will do the initial setup and create a boilerplate package.py. To create an empty package called packagename you would run:

auser@ln01:~> spack create --name packagename\n

However, it will very often be more efficient if you instead provide a download URL for your software as the argument. For example, the Code_Saturne 8.0.3 source is obtained from https://www.code-saturne.org/releases/code_saturne-8.0.3.tar.gz, so you can run:

auser@ln01:~> spack create https://www.code-saturne.org/releases/code_saturne-8.0.3.tar.gz\n

From this URL, Spack will determine the package name and the download URLs for all versions X.Y.Z matching the https://www.code-saturne.org/releases/code_saturne-X.Y.Z.tar.gz pattern. It will then ask you interactively which of these you want to use. Finally, it will download the .tar.gz archives for those versions, calculate their checksums, and place all this information in the initial version of the package for you. This takes away a lot of the initial work!
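The version detection works because the version number sits in a fixed place in the URL. Both directions of that mapping can be sketched in the shell. version_url and url_version are hypothetical helpers that only illustrate the X.Y.Z pattern, not Spack's own implementation.

```shell
# URL pattern from the example above, with X.Y.Z standing for the version
url_pattern=https://www.code-saturne.org/releases/code_saturne-X.Y.Z.tar.gz

# Hypothetical helper: version -> download URL
version_url() {
  echo ${url_pattern/X.Y.Z/$1}
}

# Hypothetical helper: download URL -> version
url_version() {
  local prefix=${url_pattern%%X.Y.Z*} suffix=${url_pattern##*X.Y.Z}
  local rest=${1#${prefix}}   # drop the text before the version
  echo ${rest%${suffix}}      # drop the text after it
}

version_url 8.0.3
# prints https://www.code-saturne.org/releases/code_saturne-8.0.3.tar.gz
url_version https://www.code-saturne.org/releases/code_saturne-8.0.3.tar.gz
# prints 8.0.3
```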

At this point you can get to work on the package. You can edit an existing package by running

auser@ln01:~> spack edit packagename\n

or by directly opening packagename/package.py within the repository with a text editor.

The boilerplate code will note several sections for you to fill out. If you did provide a source code download URL, you'll also see listed the versions you chose and their checksums.

A package is implemented as a Python class. You'll see that by default it will inherit from the AutotoolsPackage class which defines how a package following the common configure > make > make install process should be built. You can change this to another build system, for example CMakePackage. If you want, you can have the class inherit from several different types of build system classes and choose between them at install time.

You also need to tell Spack which options to pass to the build. For an AutotoolsPackage package, you can write a configure_args method which simply returns a list of the command line arguments you would give to configure if you were building the code yourself. There is an equivalent cmake_args method for CMakePackage packages.

Finally, you will need to provide your package's dependencies. In the main body of your package class you should add calls to the depends_on() function. For example, if your package needs MPI, add depends_on(\"mpi\"). As the argument to the function is a full Spack spec, you can provide any necessary versioning or options, so, for example, if you need PETSc 3.18.0 or newer with Fortran support, you can call depends_on(\"petsc+fortran@3.18.0:\").
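In a spec, a trailing colon after a version makes an open-ended range, so @3.18.0: accepts 3.18.0 and anything newer. A leading colon, as in @:3.18, sets an upper bound instead. The lower-bound check can be approximated with GNU sort -V, as in the sketch below. satisfies_min_version is a hypothetical helper, and Spack's own version ordering has extra rules (for example for versions named develop) that it does not model.

```shell
# Hypothetical helper: succeed if VERSION >= MINIMUM, like the range @MINIMUM:
satisfies_min_version() {
  local version=$1 minimum=$2 lowest
  # sort -V (GNU coreutils) orders version strings field by field
  lowest=$( { echo ${minimum}; echo ${version}; } | sort -V | head -n 1 )
  # VERSION satisfies the range when MINIMUM sorts first (or they are equal)
  [[ ${lowest} == ${minimum} ]]
}

satisfies_min_version 3.19.1 3.18.0 && echo 3.19.1 satisfies @3.18.0:
satisfies_min_version 3.9.0 3.18.0 || echo 3.9.0 does not satisfy @3.18.0:
```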

If you know that you will only ever want to build a package one way, then providing the build options and dependencies should be all that you need to do. However, if you want to allow for different options as part of the install spec, patch the source code or perform post-install fixes, or take more manual control of the build process, it can become much more complex. Thankfully the Spack developers have provided excellent documentation covering the whole process, and there are many existing packages you can look at to see how it's done.

"},{"location":"data-tools/spack/#tips-when-writing-packages-for-archer2","title":"Tips when writing packages for ARCHER2","text":"

Here are some useful pointers when writing packages for use with the HPE Cray Programming Environment on ARCHER2.

"},{"location":"data-tools/spack/#cray-compiler-wrappers","title":"Cray compiler wrappers","text":"

Note that Spack does not use the Cray compiler wrappers cc, CC and ftn when compiling code. Instead, it calls the underlying compilers directly. The wrappers normally add the options needed to use Cray LibSci, Cray FFTW, Cray HDF5 and Cray NetCDF automatically. Without them, you may need to take extra care to ensure that these options are set correctly.

"},{"location":"data-tools/spack/#using-cray-libsci","title":"Using Cray LibSci","text":"

Cray LibSci provides optimised implementations of BLAS, BLACS, LAPACK and ScaLAPACK on ARCHER2. These are bundled together into single libraries with names that are variants of libsci_cray.so. Although Spack itself knows about LibSci, many applications do not. It can sometimes be tricky to get them to use these libraries when they are looking for libblas.so and the like instead.

The configure or cmake or equivalent step for your software will hopefully allow you to manually point it to the correct library. For example, Code_Saturne's configure can take the options --with-blas-lib and --with-blas-libs which respectively tell it the location to search and the libraries to use in order to build against BLAS.

Spack can provide the correct BLAS library search and link flags to be passed on to configure via self.spec[\"blas\"].libs, a LibraryList object. So, the Code_Saturne package uses the following configure_args() method:

def configure_args(self):\n    blas = self.spec[\"blas\"].libs\n    args = [\"--with-blas-lib={0}\".format(blas.search_flags),\n            \"--with-blas-libs={0}\".format(blas.link_flags)]\n    return args\n

Here the blas.search_flags attribute is resolved to a -L library search flag using the path to the correct LibSci directory, taking into account whether the libraries for the Cray, GCC or AOCC compilers should be used. blas.link_flags similarly gives a -l flag for the correct LibSci library. Depending on what you need, the LibraryList has other attributes which can help you pass the options needed to get configure to find and use the correct library.
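To make the two attributes more concrete, here is a minimal sketch of how search and link flags can be derived from a list of library paths. SimpleLibraryList is a hypothetical, simplified stand-in written for this example, not Spack's LibraryList class. The LibSci path used below is a placeholder rather than the real location on ARCHER2.

```python
import os


class SimpleLibraryList:
    # Hypothetical, simplified stand-in for Spack's LibraryList

    def __init__(self, paths):
        self.paths = list(paths)

    @property
    def directories(self):
        # Unique parent directories, in the order they first appear
        dirs = []
        for path in self.paths:
            parent = os.path.dirname(path)
            if parent not in dirs:
                dirs.append(parent)
        return dirs

    @property
    def names(self):
        # '/some/dir/libsci_gnu.so' -> 'sci_gnu'
        names = []
        for path in self.paths:
            stem = os.path.basename(path).split('.')[0]
            names.append(stem[3:] if stem.startswith('lib') else stem)
        return names

    @property
    def search_flags(self):
        # One -L flag per directory
        return ' '.join('-L' + d for d in self.directories)

    @property
    def link_flags(self):
        # One -l flag per library
        return ' '.join('-l' + n for n in self.names)


blas = SimpleLibraryList(['/path/to/libsci/lib/libsci_gnu.so'])
print(blas.search_flags)  # -L/path/to/libsci/lib
print(blas.link_flags)    # -lsci_gnu
```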

"},{"location":"data-tools/spack/#contributing","title":"Contributing","text":"

If you develop a package for use on ARCHER2 please do consider opening a pull request to the GitHub repository.

"},{"location":"data-tools/visidata/","title":"VisiData","text":"

VisiData is an interactive multitool for tabular data. It combines the clarity of a spreadsheet, the efficiency of the terminal, and the power of Python, into a lightweight utility which can handle millions of rows with ease.

"},{"location":"data-tools/visidata/#useful-links","title":"Useful links","text":""},{"location":"data-tools/visidata/#visidata-on-archer2","title":"VisiData on ARCHER2","text":"

You can access VisiData on ARCHER2 by loading the visidata module:

module load visidata\n

Once the module has been loaded, VisiData is available via the vd command.

VisiData can also be used in scripts by saving a command log and replaying it. See the VisiData documentation on saving and restoring VisiData sessions.

"},{"location":"data-tools/vmd/","title":"VMD","text":"

VMD is a visualisation program for displaying, animating and analysing molecular systems using 3D graphics and built-in Tcl/Tk scripting.

"},{"location":"data-tools/vmd/#useful-links","title":"Useful links","text":""},{"location":"data-tools/vmd/#using-vmd-on-archer2","title":"Using VMD on ARCHER2","text":"

VMD is available through the vmd module.

module load vmd\n

Once the module has been loaded, the VMD executables, tools and libraries are available.

On its own, this allows you to run VMD in \"text-only\" mode with:

vmd -dispdev text\n

If you want to launch VMD with a GUI, see the requirements in the next section.

"},{"location":"data-tools/vmd/#launching-vmd-with-a-gui","title":"Launching VMD with a GUI","text":"

To launch VMD with its graphical interface, your machine needs to support X11 (the X Window System). Most Linux and *NIX systems support this by default. If you are using Windows (through WSL, for example), you will need an X11 display server; we recommend Xming. For macOS, we recommend XQuartz, but be aware that some extra configuration is needed, as described in the next section.

To launch VMD with a GUI, first make sure you have an X11 display server running on your local machine. Then connect to ARCHER2 with X11 forwarding enabled by following the instructions in the logging in section. Once you are connected to ARCHER2, load the VMD module with:

module load vmd\n

and launch VMD with:

vmd\n
"},{"location":"data-tools/vmd/#using-vmd-from-macos","title":"Using VMD from macOS","text":"

If you're using macOS and XQuartz, before you're able to launch VMD with a GUI, you will need to change the XQuartz configuration. On a local terminal (that is, not connected to ARCHER2), run the following command:

defaults write org.xquartz.X11 enable_iglx -bool true\n

Then restart XQuartz. You should now be able to launch the VMD GUI without a segmentation fault.

"},{"location":"data-tools/vmd/#compiling-vmd","title":"Compiling VMD","text":"

The latest instructions for building VMD on ARCHER2 may be found in the GitHub repository of build instructions.

"},{"location":"essentials/","title":"Essential Skills","text":"

This section provides information and links on the essential skills required to use ARCHER2 efficiently, e.g. using the Linux command line and accessing help and documentation.

"},{"location":"essentials/#terminal","title":"Terminal","text":"

In order to access HPC machines such as ARCHER2 you will need to use a Linux command line terminal window.

Options for Linux, macOS and Windows are described in our Connecting to ARCHER2 guide.

"},{"location":"essentials/#linux-command-line","title":"Linux Command Line","text":"

A guide to using the Unix Shell for complete novices

For those already familiar with the basics, there is also a lesson on shell extras.

"},{"location":"essentials/#basic-slurm-commands","title":"Basic Slurm commands","text":"

Slurm is the scheduler used on ARCHER2 and we provide a guide to using the basic Slurm commands including how to find out:

"},{"location":"essentials/#text-editors","title":"Text Editors","text":"

The following text editors are available on ARCHER2:

| Name | Description | Examples |
|------|-------------|----------|
| emacs | A widely used editor with a focus on extensibility. | emacs -nw sharpen.pbs; CTRL+X CTRL+C quits; CTRL+X CTRL+S saves |
| nano | A small, free editor with a focus on user friendliness. | nano sharpen.pbs; CTRL+X quits; CTRL+O saves |
| vi | A mode-based editor with a focus on aiding code development. | vi cfd.f90; :q in command mode quits; :q! in command mode quits without saving; :w in command mode saves; i in command mode switches to insert mode; ESC in insert mode switches to command mode |

If you are using MobaXterm on Windows you can use the inbuilt MobaTextEditor text file editor.

You can also edit files on your local machine using your preferred text editor and then upload them to ARCHER2. Make sure you save the files with Linux (LF) line endings. Notepad, for example, supports Unix/Linux line endings (LF), Macintosh line endings (CR) and Windows line endings (CRLF).
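If a file has already been uploaded with Windows line endings, a small sketch like the one below can convert it in place by replacing each CRLF pair with a single LF. The function names and the file name in the comment are hypothetical. The sketch does not handle files that use old Macintosh (CR-only) line endings.

```python
CRLF = bytes([13, 10])  # carriage return + line feed (Windows)
LF = bytes([10])        # line feed only (Linux/Unix)


def to_unix_line_endings(data):
    # Replace every Windows CRLF pair with a single LF
    return data.replace(CRLF, LF)


def convert_file(path):
    # Work in binary mode so no newline translation happens on read or write
    with open(path, 'rb') as f:
        data = f.read()
    with open(path, 'wb') as f:
        f.write(to_unix_line_endings(data))


# Example (hypothetical file name):
# convert_file('sharpen.pbs')
```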

"},{"location":"essentials/#quick-reference-sheet","title":"Quick Reference Sheet","text":"

We have produced this Quick Reference Sheet which you may find useful.

"},{"location":"faq/","title":"ARCHER2 Frequently Asked Questions","text":"

This section documents some of the questions raised with the ARCHER2 Service Desk, along with advice and solutions.

"},{"location":"faq/#user-accounts","title":"User accounts","text":""},{"location":"faq/#username-already-in-use","title":"Username already in use","text":"

Q. I created a machine account on ARCHER2 for a training course, but now I want to use that machine username for my main ARCHER2 project, and the system will not let me, saying \"that name is already in use\". How can I re-use that username?

A. Send an email to the service desk, letting us know the username and project that you set up previously, and asking for that account and any associated data to be deleted. Once deleted, you can then re-use that username to request an account in your main ARCHER2 project.

"},{"location":"faq/#data","title":"Data","text":""},{"location":"faq/#undeleteable-file-nfsxxxxxxxxxxx","title":"Undeleteable file .nfsXXXXXXXXXXX","text":"

Q. I have a file called .nfsXXXXXXXXXXX (where XXXXXXXXXXX is a long hexadecimal string) in my /home folder but I can't delete it.

A. This file will have been created during a file copy which failed. Trying to delete it will give an error \"Device or resource busy\", even though the copy has ended and no active task is locking it.

echo -n >.nfsXXXXXXXXXXX

will remove it.

"},{"location":"faq/#running-on-archer2","title":"Running on ARCHER2","text":""},{"location":"faq/#oom-error-on-archer2","title":"OOM error on ARCHER2","text":"

Q. Why is my code failing on ARCHER2 with an out of memory (OOM) error?

A. Your job is using too much memory per process. We recommend that you try running the same job on underpopulated nodes. You can do this by reducing the value of --ntasks-per-node in your Slurm submission script. Try halving the value used in the failing job (so if you have --ntasks-per-node=128, reduce it to --ntasks-per-node=64).
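The reasoning behind underpopulating nodes is simple arithmetic. The memory on a node is shared between the processes running on it, so halving the number of processes per node roughly doubles the memory available to each one. The sketch below assumes 256 GB of memory per standard ARCHER2 compute node (in practice, somewhat less is available to jobs). It also shows that keeping the same total number of processes then needs proportionally more nodes. The function names are illustrative only.

```python
import math

NODE_MEMORY_GB = 256  # assumption: memory on a standard ARCHER2 compute node


def memory_per_task_gb(tasks_per_node, node_memory_gb=NODE_MEMORY_GB):
    # Memory per process if the node's memory is shared evenly
    return node_memory_gb / tasks_per_node


def nodes_needed(total_tasks, tasks_per_node):
    # Nodes required to keep the same total number of MPI processes
    return math.ceil(total_tasks / tasks_per_node)


for tasks_per_node in (128, 64):
    print(tasks_per_node, memory_per_task_gb(tasks_per_node),
          nodes_needed(512, tasks_per_node))
# 128 2.0 4
# 64 4.0 8
```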

"},{"location":"faq/#checking-budgets","title":"Checking budgets","text":"

Q. How can I check which budget code(s) I can use?

A. You can check in SAFE by selecting Login accounts from the menu and then selecting the login account you want to query.

Under Login account details you will see each of the budget codes you have access to listed e.g. e123 resources and then under Resource Pool to the right of this, a note of the remaining budget.

When logged in to the machine, you can also use the command:

sacctmgr show assoc where user=$LOGNAME format=user,Account%12,MaxTRESMins,QOS%40\n

This will list all the budget codes that you have access to (but not the amount of budget available) e.g.

    User      Account  MaxTRESMins                                 QOS\n-------- ------------ ------------ -----------------------------------\n   userx    e123-test                   largescale,long,short,standard\n   userx         e123        cpu=0      largescale,long,short,standard\n

This shows that userx is a member of budgets e123-test and e123. However, the cpu=0 indicates that the e123 budget is empty or disabled. This user can submit jobs using the e123-test budget.
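If you want to run this check from a script, sacctmgr's --parsable2 option prints |-separated fields, which are easier to parse than the aligned columns above. The sketch below parses such output and lists the budgets whose MaxTRESMins is not cpu=0. The sample text is an illustrative reconstruction of the example above, and the sketch assumes the header names match those shown there. It cannot tell you how much budget remains; that is only available in SAFE.

```python
# Parse output from (for example):
#   sacctmgr --parsable2 show assoc where user=$LOGNAME format=user,account,maxtresmins,qos
SAMPLE = '''User|Account|MaxTRESMins|QOS
userx|e123-test||largescale,long,short,standard
userx|e123|cpu=0|largescale,long,short,standard
'''


def usable_budgets(parsable_output):
    # Return budgets that are not empty or disabled (MaxTRESMins is not cpu=0)
    lines = [line for line in parsable_output.splitlines() if line.strip()]
    header = lines[0].split('|')
    usable = []
    for line in lines[1:]:
        row = dict(zip(header, line.split('|')))
        limits = row.get('MaxTRESMins', '').split(',')
        if 'cpu=0' not in limits:
            usable.append(row['Account'])
    return usable


print(usable_budgets(SAMPLE))  # ['e123-test']
```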

You can only check the amount of available budget via SAFE - see above.

"},{"location":"faq/#estimated-start-time-of-queued-jobs","title":"Estimated start time of queued jobs","text":"

Q. I\u2019ve checked the estimated start time for my queued jobs using \u201csqueue -u $USER --start\u201d. Why does the estimated start time keep changing?

A. ARCHER2 uses the Slurm scheduler to queue jobs for the compute nodes. Slurm attempts to find a better schedule as jobs complete and new jobs are added to the queue. This helps to maximise the use of resources by minimising the number of idle compute nodes, in turn reducing your wait time in the queue.

However, if you periodically check the estimated start time of your queued jobs, you may notice that the estimate changes or even disappears. This is because Slurm only assigns an estimated start time to the jobs at the top of the queue. As the schedule changes, your jobs can move in and out of this top region and so gain or lose an estimated start time.

"},{"location":"faq/network-upgrade-2023/","title":"ARCHER2 data centre network upgrade: 2023","text":"

During September 2023 the data centre that houses ARCHER2 will be undergoing a major network upgrade.

On this page we describe the impact this will have and provide links to further information.

If you have any questions or concerns, please contact the ARCHER2 Service Desk.

"},{"location":"faq/network-upgrade-2023/#when-will-the-upgrade-happen-and-how-long-will-it-take","title":"When will the upgrade happen and how long will it take?","text":""},{"location":"faq/network-upgrade-2023/#the-outage-dates-will-be","title":"The outage dates will be:","text":"

We will notify users if we are able to complete this work ahead of schedule and restore ARCHER2 access earlier than expected.

"},{"location":"faq/network-upgrade-2023/#what-are-the-impacts-on-users-from-the-upgrade","title":"What are the impacts on users from the upgrade?","text":""},{"location":"faq/network-upgrade-2023/#during-the-upgrade-process","title":"During the upgrade process","text":""},{"location":"faq/network-upgrade-2023/#submitting-new-work-and-running-work","title":"Submitting new work, and running work","text":"

We will therefore be encouraging users to submit jobs in the period prior to the work, so that your work can continue on the system during the upgrade process.

"},{"location":"faq/network-upgrade-2023/#relaxing-of-queue-limits","title":"Relaxing of queue limits","text":"

In preparation for the Data Centre Network (DCN) upgrade we have relaxed the queue limits on all the QoS\u2019s, so that users can submit a significantly larger number of jobs to ARCHER2. These changes are intended to allow users to submit jobs that they wish to run during the upgrade, in advance of the start of the upgrade. The changes will be in place until the end of the Data Centre Network upgrade.

For the low priority QoS, as well as relaxing the number of jobs you can submit, we have also increased the maximum job length to 48 hours and the maximum number of nodes per job to 5,860, so users can submit using their own allocation or using the low-priority QoS.

| QoS | Max Nodes Per Job | Max Walltime | Jobs Queued | Jobs Running | Partition(s) | Notes |
|-----|-------------------|--------------|-------------|--------------|--------------|-------|
| standard | 1024 | 24 hrs | 320 | 16 | standard | Maximum of 1024 nodes in use by any one user at any time |
| highmem | 256 | 24 hrs | 80 | 16 | highmem | Maximum of 512 nodes in use by any one user at any time |
| taskfarm | 16 | 24 hrs | 640 | 32 | standard | Maximum of 256 nodes in use by any one user at any time |
| short | 32 | 20 mins | 80 | 4 | standard | |
| long | 64 | 48 hrs | 80 | 16 | standard | Minimum walltime of 24 hrs, maximum 512 nodes in use by any one user at any time, maximum of 2048 nodes in use by QoS |
| largescale | 5860 | 12 hrs | 160 | 1 | standard | Minimum job size of 1025 nodes |
| lowpriority | 5,860 | 48 hrs | 320 | 16 | standard | Jobs not charged but requires at least 1 CU in budget to use. |
| serial | disabled | - | - | - | - | - |
| reservation | not available | - | - | - | - | |

We encourage users to make use of these changes: this is a good opportunity to queue and run a greater number of jobs than usual. The relaxed limits on the low-priority queue also offer an opportunity to run a wider range of jobs through this queue than is normally possible.

Due to the unavailability of the DCN, users will not be able to connect to ARCHER2 via the login nodes during the upgrade. The serial QoS will be disabled during the upgrade period. However, serial jobs can be submitted using the standard and low-priority queues.

"},{"location":"faq/upgrade-2023/","title":"ARCHER2 Upgrade: 2023","text":"

During the first half of 2023 ARCHER2 went through a major software upgrade.

On this page we describe the background to the changes, the impact the changes have had on users, any action you should expect to take following the upgrade, and the versions of the updated software.

If you have any questions or concerns, please contact the ARCHER2 Service Desk.

"},{"location":"faq/upgrade-2023/#why-did-the-upgrade-happen","title":"Why did the upgrade happen?","text":"

There are a number of reasons why ARCHER2 needed to go through this major software upgrade. All of these reasons are related to the fact that the previous system software setup was out of date; due to this, maintenance of the service was very difficult and updating software within the current framework was not possible. Some specific issues were:

"},{"location":"faq/upgrade-2023/#when-did-the-upgrade-happen-and-how-long-did-it-take","title":"When did the upgrade happen and how long did it take?","text":"

This major software upgrade involved a complete re-install of system software followed by a reinstatement of local configurations (e.g. Slurm, authentication services, SAFE integration). Unfortunately, this major work required a long period of downtime but this was planned with all service partners to minimise the outage and give as much notice to users as possible so that they could plan accordingly.

The outage dates were:

"},{"location":"faq/upgrade-2023/#what-are-the-impacts-on-users-from-the-upgrade","title":"What are the impacts on users from the upgrade?","text":""},{"location":"faq/upgrade-2023/#during-the-upgrade-process","title":"During the upgrade process","text":""},{"location":"faq/upgrade-2023/#after-the-upgrade-process","title":"After the upgrade process","text":"

The allocation periods (where appropriate) were extended for the outage period. The changes were in place when the service was returned.

After the upgrade process there are a number of changes that may require action from users.

"},{"location":"faq/upgrade-2023/#updated-login-node-host-keys","title":"Updated login node host keys","text":"

If you previously logged into the ARCHER2 system before the upgrade you may see an error from SSH that looks like:

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\n@       WARNING: POSSIBLE DNS SPOOFING DETECTED!          @\n@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\nThe ECDSA host key for login.archer2.ac.uk has changed,\nand the key for the corresponding IP address 193.62.216.43\nhas a different value. This could either mean that\nDNS SPOOFING is happening or the IP address for the host\nand its host key have changed at the same time.\nOffending key for IP in /Users/auser/.ssh/known_hosts:11\n@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\n@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @\n@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\nIT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!\nSomeone could be eavesdropping on you right now (man-in-the-middle attack)!\nIt is also possible that a host key has just been changed.\nThe fingerprint for the ECDSA key sent by the remote host is\nSHA256:UGS+LA8I46LqnD58WiWNlaUFY3uD1WFr+V8RCG09fUg.\nPlease contact your system administrator.\n

If you see this, you should delete the offending host key from your ~/.ssh/known_hosts file (in the example above the offending line is line #11).
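OpenSSH's ssh-keygen -R option removes all keys for a given host name or IP address from your known_hosts file (e.g. ssh-keygen -R login.archer2.ac.uk), and is usually the simplest fix. If you prefer to remove the specific line reported in the warning, the Python sketch below does that. remove_line is a hypothetical helper, and the call is left commented out so that running the block does not modify your file.

```python
import os

# Default location of the OpenSSH known_hosts file
KNOWN_HOSTS = os.path.expanduser('~/.ssh/known_hosts')


def remove_line(path, line_number):
    # Delete the 1-indexed line_number from the text file at path
    with open(path) as f:
        lines = f.readlines()
    if not 1 <= line_number <= len(lines):
        raise ValueError(f'{path} has no line {line_number}')
    del lines[line_number - 1]
    with open(path, 'w') as f:
        f.writelines(lines)


# Remove line 11, as reported in the example warning above:
# remove_line(KNOWN_HOSTS, 11)
```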

The current login node host keys are always documented in the User Guide.

"},{"location":"faq/upgrade-2023/#recompile-and-test-software","title":"Recompile and test software","text":"

As the new system is based on a new OS version and new versions of compilers and libraries we strongly recommend that all users recompile and test all software on the service. The ARCHER2 CSE service recompiled all centrally installed software.

"},{"location":"faq/upgrade-2023/#no-python-2-installation","title":"No Python 2 installation","text":"

There is no Python 2 installation available as part of the supported software following the upgrade. Python 3 continues to be fully supported.

"},{"location":"faq/upgrade-2023/#impact-on-data-on-the-service","title":"Impact on data on the service","text":""},{"location":"faq/upgrade-2023/#slurm-cpus-per-task-setting-no-longer-inherited-by-srun","title":"Slurm: cpus-per-task setting no longer inherited by srun","text":"

Change in Slurm behaviour. The setting from the --cpus-per-task option to sbatch/salloc is no longer propagated by default to srun commands in the job script.

This can lead to very poor performance due to oversubscription of cores with processes/threads if job submission scripts are not updated. The simplest workaround is to add the command:

export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n

before any srun commands in the script. You can also explicitly use the --cpus-per-task option to srun if you prefer.

"},{"location":"faq/upgrade-2023/#change-of-slurm-socket-definition","title":"Change of Slurm \"socket\" definition","text":"

This change only affects users who use a placement scheme where processes are placed cyclically across sockets (e.g. --distribution=block:cyclic). The Slurm definition of a \u201csocket\u201d has changed. Previously on ARCHER2, a socket was 16 cores (all sharing a DRAM memory controller). On the updated ARCHER2, a socket is 4 cores, corresponding to a CCX (Core CompleX). The 4 cores in each CCX share 16 MB of L3 cache.
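To see why this matters for cyclic placement, the sketch below uses a simplified model of --distribution=block:cyclic on one node. In this model, tasks are dealt round-robin across sockets and fill the cores of each socket in order, with one core per task. It assumes 128 cores per node. It is not Slurm's actual placement algorithm, which also depends on options such as --cpus-per-task.

```python
CORES_PER_NODE = 128  # assumption: ARCHER2 standard compute node


def block_cyclic_core(rank, cores_per_socket, cores_per_node=CORES_PER_NODE):
    # Simplified model: tasks are dealt round-robin across sockets and
    # fill the cores within each socket in order (one core per task).
    sockets = cores_per_node // cores_per_socket
    socket = rank % sockets
    slot = rank // sockets
    return socket * cores_per_socket + slot


old_sockets = [block_cyclic_core(rank, 16) for rank in range(4)]
new_sockets = [block_cyclic_core(rank, 4) for rank in range(4)]
print(old_sockets)  # [0, 16, 32, 48]: ranks spread across 16-core sockets
print(new_sockets)  # [0, 4, 8, 12]: ranks spread across 4-core CCXs
```

In this model, consecutive ranks land on neighbouring CCXs rather than on cores 16 apart. Job scripts that relied on the old spacing may therefore place processes differently after the change.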

"},{"location":"faq/upgrade-2023/#changes-to-bind-paths-and-library-paths-for-singularity-with-mpi","title":"Changes to bind paths and library paths for Singularity with MPI","text":"

The paths you need to bind and the LD_LIBRARY_PATH settings required to use Cray MPICH with MPI in Singularity containers have changed. The updated settings are documented in the Containers section of the User and Best Practice Guide. This also includes updated information on building containers with MPI to use on ARCHER2.

"},{"location":"faq/upgrade-2023/#amd-prof-not-available","title":"AMD \u03bcProf not available","text":"

The AMD \u03bcProf tool is not available on the upgraded system yet. We are working to get this fixed as soon as possible.

"},{"location":"faq/upgrade-2023/#what-software-versions-will-be-available-after-the-upgrade","title":"What software versions will be available after the upgrade?","text":"

System software:

"},{"location":"faq/upgrade-2023/#programming-environment-2212","title":"Programming environment: 22.12","text":"

Compilers:

Communication libraries:

Numerical libraries:

IO Libraries:

Tools:

"},{"location":"faq/upgrade-2023/#summary-of-user-and-application-impact-of-pe-software","title":"Summary of user and application impact of PE software","text":"

For full information, see the CPE 22.12 Release Notes.

CCE 15

C++ applications built using CCE 13 or earlier should be recompiled due to the significant changes that were necessary to implement C++17. This is expected to be a one-time requirement.

Some non-standard Cray Fortran extensions supporting shorthand notation for logical operations will be removed in a future release. CCE 15 will issue warning messages when these are encountered, providing time to adapt the application to use standard Fortran.

HPE Cray MPICH 8.1.23

Cray MPICH 8.1.23 can support only ~2040 simultaneous MPI communicators.

"},{"location":"faq/upgrade-2023/#cse-supported-software","title":"CSE supported software","text":"

Default version in italics

| Software | Versions |
|----------|----------|
| CASTEP | 22.11, 23.11 |
| Code_Saturne | 7.0.1 |
| ChemShell/PyChemShell | 3.7.1/21.0.3 |
| CP2K | 2023.1 |
| FHI-aims | 221103 |
| GROMACS | 2022.4 |
| LAMMPS | 17_FEB_2023 |
| NAMD | 2.14 |
| Nektar++ | 5.2.0 |
| NWChem | 7.0.2 |
| ONETEP | 6.9.1.0 |
| OpenFOAM | v10.20230119 (.org), v2212 (.com) |
| Quantum Espresso | 6.8, 7.1 |
| VASP | 5.4.4.pl2, 6.3.2, 6.4.1-vtst, 6.4.1 |

| Software | Versions |
|----------|----------|
| AOCL | 3.1, 4.0 |
| Boost | 1.81.0 |
| GSL | 2.7 |
| HYPRE | 2.18.0, 2.25.0 |
| METIS/ParMETIS | 5.1.0/4.0.3 |
| MUMPS | 5.3.5, 5.5.1 |
| PETSc | 13.14.2, 13.18.5 |
| PT/Scotch | 6.1.0, 07.0.3 |
| SLEPC | 13.14.1, 13.18.3 |
| SuperLU/SuperLU_Dist | 5.2.2 / 6.4.0, 8.1.2 |
| Trilinos | 12.18.1 |
"},{"location":"known-issues/","title":"ARCHER2 Known Issues","text":"

This section highlights known issues on ARCHER2, their potential impacts and any known workarounds. Many of these issues are under active investigation by HPE Cray and the wider service.

Info

This page was last reviewed on 9 November 2023

"},{"location":"known-issues/#open-issues","title":"Open Issues","text":""},{"location":"known-issues/#atp-module-tries-to-write-to-home-from-compute-nodes-added-2024-04-29","title":"ATP Module tries to write to /home from compute nodes (Added: 2024-04-29)","text":"

The ATP Module tries to execute a mkdir command in the /home filesystem. When running the ATP module on the compute nodes, this will lead to an error, as the compute nodes cannot access the /home filesystem.

To circumvent the error, add the line:

export HOME=${HOME/home/work}\n

to your Slurm job submission script, so that the ATP module writes to /work instead.
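The ${HOME/home/work} expansion uses bash pattern substitution, which replaces the first occurrence of home in the value of $HOME with work. The same transformation in Python, using a hypothetical example home directory, looks like this:

```python
def home_to_work(home):
    # Equivalent of bash ${HOME/home/work}: replace only the FIRST 'home'
    return home.replace('home', 'work', 1)


print(home_to_work('/home/t01/t01/auser'))  # /work/t01/t01/auser
```

Because only the first match is replaced, a path such as /home/t01/t01/homer becomes /work/t01/t01/homer; the username is left untouched.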

"},{"location":"known-issues/#when-close-to-storage-quota-jobs-may-slow-down-or-produce-corrupted-files-added-2024-02-27","title":"When close to storage quota, jobs may slow down or produce corrupted files (Added: 2024-02-27)","text":"

For situations where users are close to user or project quotas on work (Lustre) file systems we have seen cases of the following behaviour:

If you see slower than expected performance or data corruption, check whether you are close to your storage quota (either user or project quota). If you are, you may be experiencing this issue: either remove data to free up space or request more storage quota.

"},{"location":"known-issues/#e-mail-alerts-from-slurm-do-not-work-added-2023-11-09","title":"e-mail alerts from Slurm do not work (Added: 2023-11-09)","text":"

Email alerts from Slurm (--mail-type and --mail-user options) do not produce emails to users. We are investigating with University of Edinburgh Information Services to enable this Slurm feature in the future.

"},{"location":"known-issues/#excessive-memory-use-when-using-ucx-communications-protocol-added-2023-07-20","title":"Excessive memory use when using UCX communications protocol (Added: 2023-07-20)","text":"

We have seen cases when using the (non-default) UCX communications protocol where the peak in memory use is much higher than would be expected. This leads to jobs failing unexpectedly with an OOM (Out Of Memory) error. The workaround is to use Open Fabrics (OFI) communication protocol instead. OFI is the default protocol on ARCHER2 and so does not usually need to be explicitly loaded; but if you have UCX loaded, you can switch to OFI by adding the following lines to your submission script before you run your application:

module load craype-network-ofi\nmodule load cray-mpich\n

It can be very useful to track the memory usage of your job as it runs, for example to see whether there is high usage on all nodes, or a single node, if usage increases gradually or rapidly etc.

Here are instructions on how to do this using a couple of small scripts.
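As a rough illustration of the kind of information such scripts collect, the hypothetical Python sketch below reads /proc/meminfo on a Linux node and reports the fraction of memory in use, treating MemAvailable as free. It is not one of the scripts referred to above. Running it periodically on each node of a job would be one way to build up a memory-usage timeline.

```python
import os

MEMINFO = '/proc/meminfo'


def parse_meminfo(text):
    # Parse lines like 'MemTotal:  263856712 kB' into {'MemTotal': 263856712}
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(':')
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def used_fraction(info):
    # Fraction of memory in use, counting MemAvailable as free
    return 1 - info['MemAvailable'] / info['MemTotal']


if __name__ == '__main__' and os.path.exists(MEMINFO):
    with open(MEMINFO) as f:
        info = parse_meminfo(f.read())
    print(f'{used_fraction(info):.1%} of node memory in use')
```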

"},{"location":"known-issues/#slurm-cpu-freqx-option-is-not-respected-when-used-with-sbatch-added-2023-01-18","title":"Slurm --cpu-freq=X option is not respected when used with sbatch (Added: 2023-01-18)","text":"

If you specify the CPU frequency using the --cpu-freq option with the sbatch command (either with an #SBATCH --cpu-freq=X line in the script or by passing --cpu-freq=X on the sbatch command line), the option will not be respected, because the default setting for ARCHER2 (2.0 GHz) overrides it. Instead, pass the --cpu-freq option directly to srun within the job submission script, e.g.:

srun --cpu-freq=2250000 ...\n

You can find more information on setting the CPU frequency in the User Guide.
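The value given to --cpu-freq is in kilohertz, so 2250000 corresponds to 2.25 GHz and the ARCHER2 default of 2.0 GHz is 2000000. Here is a trivial Python sketch of the conversion; the helper names are illustrative:

```python
def khz_to_ghz(khz):
    # srun --cpu-freq values are given in kHz
    return khz / 1_000_000


def ghz_to_khz(ghz):
    # Round to avoid floating-point noise, e.g. 2.25 GHz -> 2250000 kHz
    return round(ghz * 1_000_000)


print(khz_to_ghz(2250000))  # 2.25
print(ghz_to_khz(2.0))      # 2000000
```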

"},{"location":"known-issues/#research-software","title":"Research Software","text":"

There are several outstanding issues for the centrally installed Research Software:

Users should also check the individual software pages for known limitations and caveats when using software on the Cray EX platform and Cray Linux Environment.

"},{"location":"known-issues/#issues-with-rpath-for-non-default-library-versions","title":"Issues with RPATH for non-default library versions","text":"

If you compile applications against non-default versions of libraries within the HPE Cray software stack and set the environment variable CRAY_ADD_RPATH=yes to encode the paths to these libraries in the binary, these paths are not respected at runtime. The binaries use the default library versions instead.

The workaround for this issue is to ensure that you set:

export LD_LIBRARY_PATH=$CRAY_LD_LIBRARY_PATH:$LD_LIBRARY_PATH\n

at both compile time and runtime. For more details on using non-default versions of libraries, see the description in the User and Best Practice Guide.

"},{"location":"known-issues/#mpi-ucx-error-ivb_reg_mr","title":"MPI UCX ERROR: ivb_reg_mr","text":"

If you are using the UCX layer for MPI communication you may see an error such as:

[1613401128.440695] [nid001192:11838:0] ib_md.c:325 UCX ERROR ibv_reg_mr(address=0xabcf12c0, length=26400, access=0xf) failed: Cannot allocate memory\n[1613401128.440768] [nid001192:11838:0] ucp_mm.c:137 UCX ERROR failed to register address 0xabcf12c0 mem_type bit 0x1 length 26400 on md[4]=mlx5_0: Input/output error (md reg_mem_types 0x15)\n[1613401128.440773] [nid001192:11838:0] ucp_request.c:269 UCX ERROR failed to register user buffer datatype 0x8 address 0xabcf12c0 len 26400: Input/output error\nMPICH ERROR [Rank 1534] [job id 114930.0] [Mon Feb 15 14:58:48 2021] [unknown] [nid001192] - Abort(672797967) (rank 1534 in comm 0): Fatal error in PMPI_Isend: Other MPI error, error stack:\nPMPI_Isend(160)......: MPI_Isend(buf=0xabcf12c0, count=3300, MPI_DOUBLE_PRECISION, dest=1612, tag=4, comm=0x84000004, request=0x7fffb38fa0fc) failed\nMPID_Isend(416)......:\nMPID_isend_unsafe(92):\nMPIDI_UCX_send(95)...: returned failed request in UCX netmod(ucx_send.h 95 MPIDI_UCX_send Input/output error)\naborting job:\nFatal error in PMPI_Isend: Other MPI error, error stack:\nPMPI_Isend(160)......: MPI_Isend(buf=0xabcf12c0, count=3300, MPI_DOUBLE_PRECISION, dest=1612, tag=4, comm=0x84000004, request=0x7fffb38fa0fc) failed\nMPID_Isend(416)......:\nMPID_isend_unsafe(92):\nMPIDI_UCX_send(95)...: returned failed request in UCX netmod(ucx_send.h 95 MPIDI_UCX_send Input/output error)\n[1613401128.457254] [nid001192:11838:0] mm_xpmem.c:82 UCX WARN remote segment id 200002e09 apid 200002e3e is not released, refcount 1\n[1613401128.457261] [nid001192:11838:0] mm_xpmem.c:82 UCX WARN remote segment id 200002e08 apid 100002e3e is not released, refcount 1\n

You can add the following line to your job submission script before the srun command to try to work around this error:

export UCX_IB_REG_METHODS=direct\n

Note

Setting this flag may have an impact on code performance.

"},{"location":"known-issues/#aocc-compiler-fails-to-compile-with-netcdf-added-2021-11-18","title":"AOCC compiler fails to compile with NetCDF (Added: 2021-11-18)","text":"

There is currently a problem with the module file which means cray-netcdf-hdf5parallel will not operate correctly in PrgEnv-aocc. An example of the error seen is:

F90-F-0004-Corrupt or Old Module file /opt/cray/pe/netcdf-hdf5parallel/4.7.4.3/crayclang/9.1/include/netcdf.mod (netcdf.F90: 8)\n

The current workaround is to load the epcc-netcdf-hdf5parallel module instead if PrgEnv-aocc is required.

"},{"location":"known-issues/#slurm-export-option-does-not-work-in-job-submission-script","title":"Slurm --export option does not work in job submission script","text":"

The option --export=ALL propagates all the environment variables from the login node to the compute node. If you include the option in the job submission script, Slurm wrongly ignores it. The current workaround is to pass the option on the command line when you submit the job submission script. For instance:

sbatch --export=ALL myjob.slurm\n
"},{"location":"known-issues/#recently-resolved-issues","title":"Recently Resolved Issues","text":""},{"location":"other-software/","title":"Software provided by external parties","text":"

This section describes software that has been installed on ARCHER2 by external parties (i.e. not by the ARCHER2 service itself) for general use by ARCHER2 users or provides useful notes on software that is not installed centrally.

Important

While the ARCHER2 service desk is able to provide support for basic use of this software (e.g. access to software, writing job submission scripts) it does not generally provide detailed technical support for the software and you may be directed to seek support from other places if the service desk cannot answer the questions.

"},{"location":"other-software/#research-software","title":"Research Software","text":""},{"location":"other-software/casino/","title":"Casino","text":"

This page has moved

"},{"location":"other-software/cesm-further-examples/","title":"Cesm further examples","text":"

This page has moved

"},{"location":"other-software/cesm213/","title":"Cesm213","text":"

This page has moved

"},{"location":"other-software/cesm213_run/","title":"Cesm213 run","text":"

This page has moved

"},{"location":"other-software/cesm213_setup/","title":"Cesm213 setup","text":"

This page has moved

"},{"location":"other-software/crystal/","title":"Crystal","text":"

This page has moved

"},{"location":"publish/","title":"ARCHER2 and publications","text":"

This section provides information on how to acknowledge the use of ARCHER2 in your published work and how to register your work on ARCHER2 into the ARCHER2 publications database via SAFE.

"},{"location":"publish/#acknowledging-archer2","title":"Acknowledging ARCHER2","text":"

We will shortly be publishing a description of the ARCHER2 service with a DOI that you can cite in your published work that arises from the use of ARCHER2. Until that time, please add the following words to any work you publish that arises from your use of ARCHER2:

This work used the ARCHER2 UK National Supercomputing Service (https://www.archer2.ac.uk).

You should also tag outputs with the keyword \"ARCHER2\" whenever possible.

"},{"location":"publish/#archer2-publication-database","title":"ARCHER2 publication database","text":"

The ARCHER2 service maintains a publication database of works that have arisen from ARCHER2 and links them to project IDs that have ARCHER2 access. We ask all users of ARCHER2 to register any publications in the database - all you need is your publication's DOI.

Registering your publications in SAFE has a number of advantages:

"},{"location":"publish/#how-to-register-a-publication-in-the-database","title":"How to register a publication in the database","text":"

You will need a DOI for the publication you wish to register. A DOI is a set of ID strings separated by slashes, for example 10.7488/ds/1505. You should not include the web host address that provides a link to the DOI.
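DOIs copied from web pages often include a resolver address such as https://doi.org/. As a minimal sketch (a hypothetical shell helper, not part of SAFE), the function below strips such a prefix so that only the bare DOI is left to paste into SAFE. It assumes the prefix is one of the four doi.org forms listed in the case pattern; anything else is passed through unchanged.

```shell
# Hypothetical helper (not a SAFE feature): strip a DOI resolver prefix so
# that only the bare DOI, e.g. 10.7488/ds/1505, remains.
normalise_doi() {
  case $1 in
    http://doi.org/*|https://doi.org/*|http://dx.doi.org/*|https://dx.doi.org/*)
      # Remove the shortest leading text that ends in doi.org/
      _doi=${1#*doi.org/} ;;
    *)
      # Assume the argument is already a bare DOI and leave it unchanged
      _doi=$1 ;;
  esac
  # A here-document prints the value without word splitting or globbing
  cat <<EOF
$_doi
EOF
}

normalise_doi https://doi.org/10.7488/ds/1505   # prints 10.7488/ds/1505
```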

Log in to SAFE. Then:

  1. Go to the Menu Your details and select Publications
  2. Select the project you wish to associate the publication with from the list and click View.
  3. The next page will list currently registered publications; to add one, click Add.
  4. Enter the DOI in the text field provided and click Add.
"},{"location":"publish/#how-to-list-your-publications","title":"How to list your publications","text":"

Log in to SAFE. Then:

  1. Go to the Menu Your details and select Publications
  2. Select the project you wish to list the publications from using the dropdown menu and click View.
  3. The next page will list your currently registered publications.
"},{"location":"publish/#how-to-export-your-publications","title":"How to export your publications","text":"

At the moment we support exporting lists of DOIs to comma-separated values (CSV) files. The export does not include any other metadata, only the DOIs themselves, with a maximum of 25 DOIs per line. This format is primarily useful for importing into ResearchFish (where you can paste in the comma-separated lists to import publications). We plan to add further export formats in the future.

Log in to SAFE. Then:

  1. Go to the Menu Your details and select Publications
  2. Select the project you wish to list the publications from using the dropdown menu and click View.
  3. The next page will list your currently registered publications.
  4. Click Export to generate a plain text comma-separated values (CSV) file that lists all DOIs.
  5. If required, you can save this file using the Save command of your web browser.
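If you need the exported DOIs one per line instead (for example, to compare them with another list), a small sketch such as the following could be used. The function name and the file name publications.csv are hypothetical, and it assumes the export contains only DOIs separated by commas, possibly with spaces around them.

```shell
# Hypothetical helper: print one DOI per line from the comma-separated
# export (up to 25 DOIs on each line), reading from standard input.
split_dois() {
  # Treat a comma plus any surrounding spaces as the field separator and
  # skip empty fields, e.g. those produced by a trailing comma
  awk -F' *, *' '{ for (i = 1; i <= NF; i++) if (length($i) > 0) print $i }'
}

# Example: split_dois < publications.csv > dois.txt
```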
"},{"location":"quick-start/overview/","title":"Quickstart","text":"

The ARCHER2 quickstart guides provide the minimum information for new users or users transferring from ARCHER. There are two sections available which are meant to be followed in sequence.

"},{"location":"quick-start/quickstart-developers/","title":"Quickstart for developers","text":"

This guide aims to quickly enable developers to work on ARCHER2. It assumes that you are familiar with the material in the Quickstart for users section.

"},{"location":"quick-start/quickstart-developers/#compiler-wrappers","title":"Compiler wrappers","text":"

When compiling code on ARCHER2, you should make use of the HPE Cray compiler wrappers. These ensure that the correct libraries and headers (for example, MPI or HPE LibSci) will be used during the compilation and linking stages. These wrappers should be accessed by providing the following compiler names.

Language | Wrapper name
C | cc
C++ | CC
Fortran | ftn

This means that you should use the wrapper names whether on the command line, in build scripts, or in configure options. It could be helpful to set some or all of the following environment variables before running a build to ensure that the build tool is aware of the wrappers.

export CC=cc\nexport CXX=CC\nexport FC=ftn\nexport F77=ftn\nexport F90=ftn\n

man pages are available for each wrapper. You can also see the full set of compiler and linker options being used by passing the -craype-verbose option to the wrapper.

Tip

The HPE Cray compiler wrappers should be used instead of the MPI compiler wrappers such as mpicc, mpicxx and mpif90 that you may have used on other HPC systems.

"},{"location":"quick-start/quickstart-developers/#programming-environments","title":"Programming environments","text":"

On login to ARCHER2, the PrgEnv-cray compiler environment will be loaded, as will a cce module. The latter makes available the Cray compilers from the Cray Compiling Environment (CCE), while the former provides the correct wrappers and support to use them. The GNU Compiler Collection (GCC) and the AMD compiler environment (AOCC) are also available.

To make use of any particular compiler environment, you load the correct PrgEnv module. After doing so the compiler wrappers (cc, CC and ftn) will correctly call the compilers from the new suite. The default version of the corresponding compiler suite will also be loaded, but you may swap to another available version if you wish.

The following table summarises the suites and associated compiler environments.

Suite name | Module | Programming environment collection
CCE | cce | PrgEnv-cray
GCC | gcc | PrgEnv-gnu
AOCC | aocc | PrgEnv-aocc

As an example, after logging in you may wish to use GCC as your compiler suite. Running module load PrgEnv-gnu will replace the default CCE (Cray) environment with the GNU environment. It will also unload the cce module and load the default version of the gcc module; at the time of writing, this is GCC 11.2.0. If you need to use a different version of GCC, for example 10.3.0, you would follow up with module load gcc/10.3.0. At this point you may invoke the compiler wrappers and they will correctly use the HPE libraries and tools in conjunction with GCC 10.3.0.

When choosing the compiler environment, a big factor will likely be which compilers you have previously used for your code's development. The Cray Fortran compiler is similar to the compiler you may be familiar with from ARCHER, while the Cray C and C++ compilers provided on ARCHER2 are new versions that are now derived from Clang. The GCC suite provides gcc/g++ and gfortran. The AOCC suite provides AMD Clang/Clang++ and AMD Flang.

Note

The Intel compilers are not available on ARCHER2.

"},{"location":"quick-start/quickstart-developers/#useful-compiler-options","title":"Useful compiler options","text":"

The compiler options you use will depend on both the software you are building and also on the current stage of development. The following flags should be a good starting point for reasonable performance.

Compilers | Optimisation flags
Cray C/C++ | -O2 -funroll-loops -ffast-math
Cray Fortran | Default options
GCC | -O2 -ftree-vectorize -funroll-loops -ffast-math
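In a build script, one way to apply these starting flags is a small lookup function. This is a sketch only: the labels cray-c, cray-ftn and gnu are invented for this example, and the flag values are copied from the table above.

```shell
# Hypothetical build-script helper: print the suggested starting
# optimisation flags for a compiler label (labels invented for this sketch).
starting_flags() {
  case $1 in
    cray-c)   printf '%s' '-O2 -funroll-loops -ffast-math' ;;
    cray-ftn) : ;;  # Cray Fortran: keep the compiler's default options
    gnu)      printf '%s' '-O2 -ftree-vectorize -funroll-loops -ffast-math' ;;
    *)        echo unsupported compiler label: $1 >&2; return 1 ;;
  esac
}

# Example: CFLAGS=$(starting_flags gnu)
```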

Tip

If you want to use GCC version 10 or greater to compile MPI Fortran code, you must add the -fallow-argument-mismatch option when compiling; otherwise you will see compile errors associated with MPI functions.
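Build scripts that must work with several GCC versions can add this option only when it is needed. The sketch below shows one possible approach: the mismatch_flag function is hypothetical, and it takes the GCC version (for example the output of gfortran -dumpversion) as its argument.

```shell
# Hypothetical helper: print -fallow-argument-mismatch when the GCC version
# in the first argument (e.g. 11.2.0, or just 11) is 10 or newer.
mismatch_flag() {
  _major=${1%%.*}            # keep only the major version number
  if [ ${_major:-0} -ge 10 ]; then
    printf '%s' -fallow-argument-mismatch
  fi
}

# Example (assumes PrgEnv-gnu is loaded so gfortran is on PATH):
# FFLAGS=$(mismatch_flag $(gfortran -dumpversion))
```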

When you are happy with your code's performance you may wish to enable more aggressive optimisations; in this case you could start using the following flags. Please note, however, that these optimisations may lead to deviations from IEEE/ISO specifications. If your code relies on strict adherence then using these flags may cause incorrect output.

Compilers | Optimisation flags
Cray C/C++ | -Ofast -funroll-loops
Cray Fortran | -O3 -hfp3
GCC | -Ofast -funroll-loops

Vectorisation is enabled by the Cray Fortran compiler at -O1 and above, by Cray C and C++ at -O2 and above or when using -ftree-vectorize, and by the GCC compilers at -O3 and above or when using -ftree-vectorize.

You may wish to promote default real and integer types in Fortran codes from 4 to 8 bytes. In this case, the following flags may be used.

Compiler | Fortran real and integer promotion flags
Cray Fortran | -s real64 -s integer64
gfortran | -freal-4-real-8 -finteger-4-integer-8

More documentation on the compilers is available through man. The pages to read are accessed as follows.

Compiler suite | C | C++ | Fortran
Cray | man craycc | man crayCC | man crayftn
GNU | man gcc | man g++ | man gfortran

Tip

There are no man pages for the AOCC compilers at the moment.

"},{"location":"quick-start/quickstart-developers/#linking-on-archer2","title":"Linking on ARCHER2","text":"

Executables on ARCHER2 link dynamically, and the Cray Programming Environment does not currently support static linking. This is in contrast to ARCHER where the default was to build statically.

If you attempt to link statically, you will see errors similar to:

/usr/bin/ld: cannot find -lpmi\n/usr/bin/ld: cannot find -lpmi2\ncollect2: error: ld returned 1 exit status\n

The compiler wrapper scripts on ARCHER2 link runtime libraries using RUNPATH by default. This means that the paths to the runtime libraries are encoded into the executable, so you do not need to load the compiler environment in your job submission scripts.

"},{"location":"quick-start/quickstart-developers/#using-runpaths-to-link","title":"Using RUNPATHs to link","text":"

By default, when a dynamically linked executable runs, the dynamic linker finds the libraries it needs by searching the paths in the LD_LIBRARY_PATH environment variable and then the paths in the RUNPATH setting of the binary. This is flexible, as it allows an executable to use newly installed library versions without rebuilding, but in some cases you may prefer to bake the paths to specific libraries into the executable's RUNPATH, keeping them constant. While the libraries are still dynamically loaded at run time, from the end user's point of view the resulting behaviour will be similar to that of a statically compiled executable, in that they will not need to concern themselves with ensuring the linker will be able to find the libraries.

This is achieved by passing the compiler, as options, the additional paths to add to the RUNPATH. To make the compiler wrappers do this, set the following environment variable.

export CRAY_ADD_RPATH=yes\n
"},{"location":"quick-start/quickstart-developers/#using-rpaths-to-link","title":"Using RPATHs to link","text":"

RPATH differs from RUNPATH in that the RPATH directories are searched for libraries before the paths in LD_LIBRARY_PATH, so libraries found through RPATH cannot be overridden at runtime in the same way.

You can provide RPATHs directly to the compilers using the -Wl,-rpath=<path-to-directory> flag, where the provided path is to the directory containing the libraries which are themselves typically specified with flags of the type -l<library-name>.
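When several library directories are involved, the flags can be generated rather than typed by hand. A minimal sketch, using a hypothetical rpath_flags function that expands each directory argument into the -Wl,-rpath=<path-to-directory> form described above:

```shell
# Hypothetical helper: turn each directory argument into a
# -Wl,-rpath=<dir> option, separated by spaces.
rpath_flags() {
  _flags=
  for _dir; do               # loop over all positional arguments
    _flags=${_flags:+$_flags }-Wl,-rpath=$_dir
  done
  cat <<EOF
$_flags
EOF
}

# Example (prog.c, mylib and /path/to/libdir are placeholders):
# cc -o prog prog.c -L/path/to/libdir -lmylib $(rpath_flags /path/to/libdir)
```

Depending on how the linker is configured, -rpath may be recorded as RUNPATH rather than RPATH. With GNU ld, adding -Wl,--disable-new-dtags requests a true RPATH, and readelf -d prog shows which entry was written.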

"},{"location":"quick-start/quickstart-developers/#debugging-tools","title":"Debugging tools","text":"

The following debugging tools are available on ARCHER2:

To get started debugging on ARCHER2, you might like to use gdb4hpc. First compile your code with the -g flag to enable debugging symbols. Once compiled, load the gdb4hpc module and start it:

module load gdb4hpc\ngdb4hpc\n

Once inside gdb4hpc, you can start your program's execution with the launch command:

dbg all> launch $my_prog{128} ./prog\n

In this example, a job called my_prog will be launched to run the executable file prog over 128 cores on a compute node. If you run squeue in another terminal you will be able to see it running. Inside gdb4hpc you may then step through the code's execution, continue to breakpoints that you set with break, print the values of variables at these points, and perform a backtrace on the stack if the program crashes. Debugging jobs will end when you exit gdb4hpc, or you can end them yourself by running, in this example, release $my_prog.

For more information on debugging parallel codes, see the documentation in the Debugging section of the ARCHER2 User and Best Practice Guide.

"},{"location":"quick-start/quickstart-developers/#profiling-tools","title":"Profiling tools","text":"

Profiling on ARCHER2 is provided through the Cray Performance Measurement and Analysis Tools (CrayPAT). This has a number of different components:

The above tools are made available by first loading the perftools-base module, followed by either perftools (for CrayPAT, Reveal and Apprentice2) or one of the perftools-lite modules.

The simplest way to get started profiling your code is with CrayPAT-lite. For example, to sample a run of a code you would load the perftools-base and perftools-lite modules and then compile your code (you will receive a message that the executable is being instrumented). Running this executable in a batch job as usual will produce a directory such as my_prog+74653-2s, which can be passed to pat_report to view the results. In this example,

pat_report -O calltree+src my_prog+74653-2s\n

will produce a report containing the call tree. You can view available report keywords to be provided to the -O option by running pat_report -O -h. The available perftools-lite modules are:

Tip

For more information on profiling parallel codes, see the documentation in the Profiling section of the ARCHER2 User and Best Practice Guide.

"},{"location":"quick-start/quickstart-developers/#useful-links","title":"Useful Links","text":"

Links to other documentation you may find useful:

"},{"location":"quick-start/quickstart-next-steps/","title":"Next Steps","text":"

Once you have set up your machine account, logged on, run a job or two and possibly updated and compiled your code, what next?

There is still plenty of support and advice available to you:

Getting Started on ARCHER2 gives an overview of some of this help.

We have advice on how to Get Access through different funding routes and, if your chosen route requires you to complete a Technical Assessment (TA), advice on How to prepare a successful TA.

We also have a comprehensive Training Programme for all levels of experience and a wide range of different uses. All our training is free for UK academics, and we provide a list of upcoming training as well as all the materials and resources from previous training events.

"},{"location":"quick-start/quickstart-users-totp/","title":"Quickstart for users","text":"

This guide aims to quickly enable new users to get up and running on ARCHER2. It covers the process of getting an ARCHER2 account, logging in and running your first job.

"},{"location":"quick-start/quickstart-users-totp/#request-an-account-on-archer2","title":"Request an account on ARCHER2","text":"

Important

You need to use both a password and a passphrase-protected SSH key pair to log into ARCHER2. You get the password from SAFE, but you will also need to set up your own SSH key pair and add the public part to your account via SAFE before you will be able to log in. We cover the authentication steps below.

"},{"location":"quick-start/quickstart-users-totp/#obtain-an-account-on-the-safe-website","title":"Obtain an account on the SAFE website","text":"

Warning

We have seen issues with Gmail blocking emails from SAFE so we recommend that users use their institutional/work email address rather than Gmail addresses to register for SAFE accounts.

The first step is to sign up for an account on the ARCHER2 SAFE website. The SAFE account is used to manage all of your login accounts, allowing you to report on your usage and quotas. To do this:

  1. Go to the SAFE New User Signup Form
  2. Fill in your personal details. You can come back later and change them if you wish
  3. Click Submit

You are now registered. Your SAFE password will be emailed to the email address you provided. You can then login with that email address and password. (You can change your initial SAFE password whenever you want by selecting the Change SAFE password option from the Your details menu.)

"},{"location":"quick-start/quickstart-users-totp/#request-an-archer2-login-account","title":"Request an ARCHER2 login account","text":"

Once you have a SAFE account and an SSH key, you will need to request a user account on ARCHER2 itself. To do this you will require a Project Code; you usually obtain this from the Principal Investigator (PI) or project manager for the project you will be working on. Once you have the Project Code:

Full system
  1. Log into SAFE
  2. Use the Login accounts - Request new account menu item
  3. Select the correct project from the drop down list
  4. Select the archer2 machine in the list of available machines
  5. Click Next
  6. Enter a username for the account and the public part of an SSH key pair
    1. More information on generating an SSH key pair can be found in the ARCHER2 User and Best Practice Guide
    2. You can add additional SSH keys using the process described below if you so wish.
  7. Click Request

The PI or project manager of the project will be asked to approve your request. After your request has been approved the account will be created and when this has been done you will receive an email. You can then come back to SAFE and pick up the initial single-use password for your new account.

Note

ARCHER2 account passwords are also sometimes referred to as LDAP passwords by the system.

"},{"location":"quick-start/quickstart-users-totp/#generating-and-adding-an-ssh-key-pair","title":"Generating and adding an SSH key pair","text":"

How you generate your SSH key pair depends on which operating system you use and which SSH client you use to connect to ARCHER2. We will not cover the details on generating an SSH key pair here, but detailed information on this topic is available in the ARCHER2 User and Best Practice Guide.

After generating your SSH key pair, add the public part to your login account using SAFE:

  1. Log into SAFE
  2. Use the menu Login accounts and select the ARCHER2 account to be associated with the SSH key
  3. On the subsequent Login account details page, click the Add Credential button
  4. Select SSH public key as the Credential Type and click Next
  5. Either copy and paste the public part of your SSH key into the SSH Public key box or use the button to select the public key file on your computer
  6. Click Add to associate the public SSH key part with your account

Once you have done this, your SSH key will be added to your ARCHER2 account.

Remember, you will need to use both an SSH key and password to log into ARCHER2 so you will also need to collect your initial password before you can log into ARCHER2 for the first time. We cover this next.

Note

If you want to connect to ARCHER2 from more than one machine, e.g. from your home laptop as well as your work laptop, you should generate an SSH key pair on each machine and add each of the public keys into SAFE.

"},{"location":"quick-start/quickstart-users-totp/#login-to-archer2","title":"Login to ARCHER2","text":"

To log into ARCHER2 you should use the address:

Full system

ssh [userID]@login.archer2.ac.uk

The order in which you are asked for credentials depends on the system you are accessing:

Full system

You will first be prompted for the passphrase associated with your SSH key pair. Once you have entered this passphrase successfully, you will then be prompted for your machine account password. You need to enter both credentials correctly to be able to access ARCHER2.

Tip

If you logged into the ARCHER2 system with your account before the major upgrade in May/June 2023, you may see an error from SSH that looks like

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\n@       WARNING: POSSIBLE DNS SPOOFING DETECTED!          @\n@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\nThe ECDSA host key for login.archer2.ac.uk has changed,\nand the key for the corresponding IP address 193.62.216.43\nhas a different value. This could either mean that\nDNS SPOOFING is happening or the IP address for the host\nand its host key have changed at the same time.\nOffending key for IP in /Users/auser/.ssh/known_hosts:11\n@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\n@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @\n@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\nIT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!\nSomeone could be eavesdropping on you right now (man-in-the-middle attack)!\nIt is also possible that a host key has just been changed.\nThe fingerprint for the ECDSA key sent by the remote host is\nSHA256:UGS+LA8I46LqnD58WiWNlaUFY3uD1WFr+V8RCG09fUg.\nPlease contact your system administrator.\n

If you see this, you should delete the offending host key from your ~/.ssh/known_hosts file (in the example above, the offending line is line 11).
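One way to do this from the command line is shown below as a sketch. The remove_known_host_line function is hypothetical; it keeps a backup copy and avoids sed -i, whose syntax differs between GNU and macOS sed. Alternatively, OpenSSH's ssh-keygen -R followed by a host name or IP address removes all keys for that host from known_hosts.

```shell
# Hypothetical helper: delete line number $2 from the known_hosts file $1,
# keeping the original as $1.bak. Assumes the path contains no spaces.
remove_known_host_line() {
  cp -- $1 $1.bak || return 1
  # Write the file back from the backup without the offending line
  sed ${2}d $1.bak > $1
}

# Example for the warning above (offending line 11):
# remove_known_host_line ~/.ssh/known_hosts 11
```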

Tip

If your SSH key pair is not stored in the default location (usually ~/.ssh/id_rsa) on your local system, you may need to specify the path to the private part of the key with the -i option to ssh. For example, if your key is in a file called keys/id_rsa_archer2 you would use the command ssh -i keys/id_rsa_archer2 username@login.archer2.ac.uk to log in.
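To avoid typing the -i option each time, you can describe the connection in your OpenSSH client configuration. The sketch below prints such a block; the host alias archer2 and the function name are our own choices, not something ARCHER2 requires, and you will still be prompted for your credentials as described above.

```shell
# Hypothetical helper: print an OpenSSH client configuration block so that
# ssh archer2 uses the given username and private key file.
archer2_ssh_config() {
  cat <<EOF
Host archer2
    HostName login.archer2.ac.uk
    User $1
    IdentityFile $2
EOF
}

# Example (appends to your personal SSH configuration):
# archer2_ssh_config username ~/keys/id_rsa_archer2 >> ~/.ssh/config
```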

"},{"location":"quick-start/quickstart-users-totp/#mfa-time-based-one-time-password","title":"MFA Time-based one-time password","text":"

Remember, you will need to use both an SSH key and Time-based one-time password to log into ARCHER2 so you will also need to set up your TOTP before you can log into ARCHER2.

Tip

When you first log into ARCHER2, you will be prompted to change your initial password. This is a three-step process:

  1. When prompted to enter your LDAP password: enter the password which you retrieved from SAFE
  2. When prompted to enter your new password: type in a new password
  3. When prompted to re-enter the new password: re-enter the new password

Your password has now been changed. You will not use your password when logging on to ARCHER2 after the initial logon.

Hint

More information on connecting to ARCHER2 is available in the Connecting to ARCHER2 section of the User Guide.

"},{"location":"quick-start/quickstart-users-totp/#file-systems-and-manipulating-data","title":"File systems and manipulating data","text":"

ARCHER2 has a number of different file systems and understanding the difference between them is crucial to being able to use the system. In particular, transferring and moving data often requires a bit of thought in advance to ensure that the data is secure and in a useful form.

ARCHER2 file systems are:

All users have a directory on one of the home file systems and on one of the work file systems. The directories are located at:

Top tips for managing data on ARCHER2:

Hint

Information on the file systems and best practice in managing your data is available in the Data management and transfer section of the User and Best Practice Guide.

"},{"location":"quick-start/quickstart-users-totp/#accessing-software","title":"Accessing software","text":"

Software on ARCHER2 is principally accessed through modules. These load and unload the desired applications, compilers, tools and libraries through the module command and its subcommands. Some modules will be loaded by default on login, providing a default working environment; many more will be available for use but initially unloaded, allowing you to set up the environment to suit your needs.

At any stage you can check which modules have been loaded by running

module list\n

Running the following command will display all environment modules available on ARCHER2, whether loaded or unloaded

module avail\n

The search field for this command may be narrowed by providing the first few characters of the module name being queried. For example, all available versions and variants of VASP may be found by running

module avail vasp\n

You will see that different versions are available for many modules. For example, vasp/5/5.4.4.pl2 and vasp/6/6.3.2 are two available versions of VASP on the full system. Furthermore, a default version may be specified; this is used if no version is provided by the user.

Important

VASP is licensed software, as are other software packages on ARCHER2. You must have a valid licence to use licensed software on ARCHER2. Often you will need to request access through the SAFE. More on this below.

The module load command loads a module for use. Following the above,

module load vasp/6\n

would load the default version of VASP 6, while

module load vasp/6/6.3.2\n

would specifically load version 6.3.2. A loaded module may be unloaded with the module unload command (or its identical alternative, module remove), e.g.

module unload vasp\n

The above unloads whichever version of VASP is currently in the environment. Rather than issuing separate unload and load commands, versions of a module may be swapped as follows:

module swap vasp vasp/5/5.4.4.pl2\n

Other helpful commands are:

Tip

You should not use the module purge command on ARCHER2 as this will cause issues for the HPE Cray programming environment. If you wish to reset your modules, you should use the module restore command instead.

Points to be aware of include:

More information on modules and the software environment on ARCHER2 can be found in the Software environment section of the User and Best Practice Guide.

"},{"location":"quick-start/quickstart-users-totp/#requesting-access-to-licensed-software","title":"Requesting access to licensed software","text":"

Some of the software installed on ARCHER2 requires a user to have a valid licence agreed with the software owners/developers to be able to use it (for example, VASP). Although you will be able to load this software on ARCHER2, you will be barred from actually using it until your licence has been verified.

You request access to licensed software through the SAFE (the web administration tool you used to apply for your account and retrieve your initial password) by being added to the appropriate Package Group. To request access to licensed software:

  1. Log in to SAFE
  2. Go to the Menu Login accounts and select the login account which requires access to the software
  3. Click New Package Group Request
  4. Select the software from the list of available packages and click Select Package Group
  5. Fill in as much information as possible about your licence; at the very least, provide the information requested at the top of the screen, such as the licence holder's name and contact details. If you are covered by the licence because the licence holder is your supervisor, for example, please state this.
  6. Click Submit

Your request will then be processed by the ARCHER2 Service Desk, who will confirm your licence with the software owners/developers before enabling your access to the software on ARCHER2. This can take several days (depending on how quickly the software owners/developers respond), but you will be advised once this has been done.

"},{"location":"quick-start/quickstart-users-totp/#create-a-job-submission-script","title":"Create a job submission script","text":"

To run a program on the ARCHER2 compute nodes you need to write a job submission script that tells the system how many compute nodes you want to reserve and for how long. You also need to use the srun command to launch your parallel executable.

Hint

For more details on the Slurm scheduler on ARCHER2 and writing job submission scripts, see the Running jobs on ARCHER2 section of the User and Best Practice Guide.

Important

Parallel jobs on ARCHER2 should be run from the work file systems as the home file systems are not available on the compute nodes - you will see a chdir or file not found error if you try to access data on the home file system within a parallel job running on the compute nodes.

Create a job submission script called submit.slurm in your space on the work file systems using your favourite text editor. For example, using vim:

auser@ln01:~> cd /work/t01/t01/auser\nauser@ln01:/work/t01/t01/auser> vim submit.slurm\n

Tip

You will need to use your project code and username to get to the correct directory. i.e. replace the t01 above with your project code and replace the username auser with your ARCHER2 username.

Paste the following text into your job submission script, replacing [budget code] with your budget code (e.g. e99-ham). The script requests the standard partition and the standard quality of service (QoS); change the --partition and --qos lines if you need a different partition or QoS.

Full system
#!/bin/bash --login\n\n#SBATCH --job-name=test_job\n#SBATCH --nodes=1\n#SBATCH --ntasks-per-node=128\n#SBATCH --cpus-per-task=1\n#SBATCH --time=0:5:0\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Load the xthi module to get access to the xthi program\nmodule load xthi\n\n# Recommended environment settings\n# Stop unintentional multi-threading within software libraries\nexport OMP_NUM_THREADS=1\n# Ensure the cpus-per-task option is propagated to srun commands\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\n# srun launches the parallel program based on the SBATCH options\nsrun --distribution=block:block --hint=nomultithread xthi_mpi\n
"},{"location":"quick-start/quickstart-users-totp/#submit-your-job-to-the-queue","title":"Submit your job to the queue","text":"

You submit your job to the queues using the sbatch command:

auser@ln01:/work/t01/t01/auser> sbatch submit.slurm\nSubmitted batch job 23996\n

The value returned is your Job ID.
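If a script needs the job ID (for example, to pass it to squeue), it can be extracted from this message. A minimal sketch with a hypothetical helper is below; Slurm's sbatch --parsable option, which prints only the job ID (plus the cluster name, if any), is an alternative.

```shell
# Hypothetical helper: read the sbatch message (Submitted batch job <id>)
# from standard input and print just the job ID.
job_id_from_sbatch() {
  awk '/^Submitted batch job/ { print $4 }'
}

# Example: JOBID=$(sbatch submit.slurm | job_id_from_sbatch)
```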
"},{"location":"quick-start/quickstart-users-totp/#monitoring-your-job","title":"Monitoring your job","text":"

You use the squeue command to examine jobs in the queue. To list all the jobs you have in the queue, use:

auser@ln01:/work/t01/t01/auser> squeue -u $USER\n

squeue on its own lists all jobs in the queue from all users.

"},{"location":"quick-start/quickstart-users-totp/#checking-the-output-from-the-job","title":"Checking the output from the job","text":"

The job submission script above should write the output to a file called slurm-<jobID>.out (i.e. if the Job ID was 23996, the file would be slurm-23996.out). You can check the contents of this file with the cat command. If the job was successful, you should see output that looks something like:

auser@ln01:/work/t01/t01/auser> cat slurm-23996.out\nNode    0, hostname nid001020\nNode    0, rank    0, thread   0, (affinity =    0)\nNode    0, rank    1, thread   0, (affinity =    1)\nNode    0, rank    2, thread   0, (affinity =    2)\nNode    0, rank    3, thread   0, (affinity =    3)\nNode    0, rank    4, thread   0, (affinity =    4)\nNode    0, rank    5, thread   0, (affinity =    5)\nNode    0, rank    6, thread   0, (affinity =    6)\nNode    0, rank    7, thread   0, (affinity =    7)\nNode    0, rank    8, thread   0, (affinity =    8)\nNode    0, rank    9, thread   0, (affinity =    9)\nNode    0, rank   10, thread   0, (affinity =   10)\nNode    0, rank   11, thread   0, (affinity =   11)\nNode    0, rank   12, thread   0, (affinity =   12)\nNode    0, rank   13, thread   0, (affinity =   13)\nNode    0, rank   14, thread   0, (affinity =   14)\nNode    0, rank   15, thread   0, (affinity =   15)\nNode    0, rank   16, thread   0, (affinity =   16)\nNode    0, rank   17, thread   0, (affinity =   17)\nNode    0, rank   18, thread   0, (affinity =   18)\nNode    0, rank   19, thread   0, (affinity =   19)\nNode    0, rank   20, thread   0, (affinity =   20)\nNode    0, rank   21, thread   0, (affinity =   21)\n... output trimmed ...\n

If something has gone wrong, you will find any error messages in the file instead of the expected output.
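The example job requests 1 node with 128 tasks per node, so a quick sanity check is to count the per-rank lines that xthi wrote: there should be one per MPI process. The helper below is a hypothetical sketch, not an ARCHER2 tool, and assumes the output format shown above, where each process prints one line containing ", rank ".

```shell
# Hypothetical helper: count the per-rank lines xthi wrote to a Slurm
# output file and compare the count with the number of MPI processes expected.
# Usage: check_xthi_ranks OUTPUT_FILE EXPECTED_RANKS
check_xthi_ranks() {
  local file="$1" expected="$2" found
  if [ ! -f "$file" ]; then
    echo "no such file: $file" >&2
    return 2
  fi
  # Each MPI process prints one line like 'Node 0, rank R, thread T, ...';
  # grep -c exits non-zero when the count is 0, so guard it with || true.
  found=$(grep -c ', rank ' "$file" || true)
  if [ "$found" -eq "$expected" ]; then
    echo "OK: $found ranks reported"
  else
    echo "MISMATCH: expected $expected ranks, found $found" >&2
    return 1
  fi
}
```

For the job above you would run check_xthi_ranks slurm-23996.out 128.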

"},{"location":"quick-start/quickstart-users-totp/#acknowledging-archer2","title":"Acknowledging ARCHER2","text":"

You should use the following phrase to acknowledge ARCHER2 for all research outputs that were generated using the ARCHER2 service:

This work used the ARCHER2 UK National Supercomputing Service (https://www.archer2.ac.uk).

You should also tag outputs with the keyword \"ARCHER2\" whenever possible.

"},{"location":"quick-start/quickstart-users-totp/#useful-links","title":"Useful Links","text":"

If you plan to compile your own programs on ARCHER2, you may also want to look at Quickstart for developers.

Other documentation you may find useful:

"},{"location":"quick-start/quickstart-users/","title":"Quickstart for users","text":"

This guide aims to quickly enable new users to get up and running on ARCHER2. It covers the process of getting an ARCHER2 account, logging in and running your first job.

"},{"location":"quick-start/quickstart-users/#request-an-account-on-archer2","title":"Request an account on ARCHER2","text":"

Important

To access ARCHER2, you need to use two sets of credentials: your SSH key pair protected by a passphrase and a Time-based one-time password (TOTP). Additionally, the first time you ever log into an account on ARCHER2, you will need to use a single-use password that you retrieve from SAFE.

"},{"location":"quick-start/quickstart-users/#obtain-an-account-on-the-safe-website","title":"Obtain an account on the SAFE website","text":"

Warning

We have seen issues with Gmail blocking emails from SAFE so we recommend that users use their institutional/work email address rather than Gmail addresses to register for SAFE accounts.

The first step is to sign up for an account on the ARCHER2 SAFE website. The SAFE account is used to manage all of your login accounts, allowing you to report on your usage and quotas. To do this:

  1. Go to the SAFE New User Signup Form
  2. Fill in your personal details. You can come back later and change them if you wish
  3. Click Submit

You are now registered. Your SAFE password will be emailed to the email address you provided. You can then log in with that email address and password. (You can change your initial SAFE password whenever you want by selecting the Change SAFE password option from the Your details menu.)

"},{"location":"quick-start/quickstart-users/#request-an-archer2-login-account","title":"Request an ARCHER2 login account","text":"

Once you have a SAFE account and an SSH key, you will need to request a user account on ARCHER2 itself. To do this you will require a Project Code; you usually obtain this from the Principal Investigator (PI) or project manager for the project you will be working on. Once you have the Project Code:

  1. Log into SAFE
  2. Use the Login accounts - Request new account menu item
  3. Select the correct project from the drop down list
  4. Select the archer2 machine in the list of available machines
  5. Click Next
  6. Enter a username for the account and the public part of an SSH key pair
    1. More information on generating an SSH key pair can be found in the ARCHER2 User and Best Practice Guide
    2. You can add additional SSH keys using the process described below if you so wish.
  7. Click Request

The PI or project manager of the project will be asked to approve your request. Once your request has been approved, the account will be created and you will receive an email. You can then come back to SAFE and pick up the initial single-use password for your new account.

Note

ARCHER2 account passwords are also sometimes referred to as LDAP passwords by the system.

"},{"location":"quick-start/quickstart-users/#generating-and-adding-an-ssh-key-pair","title":"Generating and adding an SSH key pair","text":"

How you generate your SSH key pair depends on which operating system you use and which SSH client you use to connect to ARCHER2. We will not cover the details of generating an SSH key pair here, but detailed information on this topic is available in the ARCHER2 User and Best Practice Guide.

After generating your SSH key pair, add the public part to your login account using SAFE:

  1. Log into SAFE
  2. Use the menu Login accounts and select the ARCHER2 account to be associated with the SSH key
  3. On the subsequent Login account details page, click the Add Credential button
  4. Select SSH public key as the Credential Type and click Next
  5. Either copy and paste the public part of your SSH key into the SSH Public key box or use the button to select the public key file on your computer
  6. Click Add to associate the public SSH key part with your account

Once you have done this, your SSH key will be added to your ARCHER2 account.

Remember, you will need to use both an SSH key and password to log into ARCHER2 so you will also need to collect your initial password before you can log into ARCHER2 for the first time. We cover this next.

Note

If you want to connect to ARCHER2 from more than one machine, e.g. from your home laptop as well as your work laptop, you should generate an SSH key pair on each machine and add each of the public keys into SAFE.

"},{"location":"quick-start/quickstart-users/#login-to-archer2","title":"Login to ARCHER2","text":"

To log into ARCHER2 you should use the address:

ssh [userID]@login.archer2.ac.uk

The order in which you are asked for credentials depends on the system you are accessing:

You will first be prompted for the passphrase associated with your SSH key pair. Once you have entered this passphrase successfully, you will then be prompted for your machine account password. You need to enter both credentials correctly to be able to access ARCHER2.

Tip

If you logged into ARCHER2 with your account before the major upgrade in May/June 2023, you may see an error from SSH that looks like this:

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\n@       WARNING: POSSIBLE DNS SPOOFING DETECTED!          @\n@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\nThe ECDSA host key for login.archer2.ac.uk has changed,\nand the key for the corresponding IP address 193.62.216.43\nhas a different value. This could either mean that\nDNS SPOOFING is happening or the IP address for the host\nand its host key have changed at the same time.\nOffending key for IP in /Users/auser/.ssh/known_hosts:11\n@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\n@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @\n@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\nIT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!\nSomeone could be eavesdropping on you right now (man-in-the-middle attack)!\nIt is also possible that a host key has just been changed.\nThe fingerprint for the ECDSA key sent by the remote host is\nSHA256:UGS+LA8I46LqnD58WiWNlaUFY3uD1WFr+V8RCG09fUg.\nPlease contact your system administrator.\n

If you see this, you should delete the offending host key from your ~/.ssh/known_hosts file (in the example above, the offending key is on line 11).
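If you prefer not to edit the file by hand, the offending line can be deleted from the command line. The function below is a hypothetical helper, not an ARCHER2 tool: it removes one line by number (the number SSH prints after known_hosts:, 11 in the example above) and keeps a backup copy of the original file.

```shell
# Hypothetical helper: delete a single line from an SSH known_hosts file.
# Usage: remove_known_hosts_line LINE_NUMBER [FILE]
# FILE defaults to ~/.ssh/known_hosts; the original is kept as FILE.bak.
remove_known_hosts_line() {
  local line="$1"
  local file="${2:-$HOME/.ssh/known_hosts}"
  # Only accept a positive integer line number
  case "$line" in
    ''|*[!0-9]*|0*)
      echo "invalid line number: $line" >&2
      return 1 ;;
  esac
  # -i.bak edits in place and works with both GNU and BSD sed
  sed -i.bak "${line}d" "$file"
}
```

For the warning above you would run remove_known_hosts_line 11. Alternatively, OpenSSH can remove every stored key for a host with ssh-keygen -R login.archer2.ac.uk (repeat with the IP address shown in the warning if that entry is also reported).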

Tip

If your SSH key pair is not stored in the default location (usually ~/.ssh/id_rsa) on your local system, you may need to specify the path to the private part of the key with the -i option to ssh. For example, if your key is in a file called keys/id_rsa_archer2, you would use the command ssh -i keys/id_rsa_archer2 username@login.archer2.ac.uk to log in.
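To avoid typing the -i option every time, you can record the key location in your SSH client configuration file (~/.ssh/config). The function below is a hypothetical sketch that prints a suitable entry; the alias archer2 and the key path are placeholders to adapt. Host, HostName, User and IdentityFile are standard OpenSSH client configuration keywords.

```shell
# Hypothetical helper: print an OpenSSH client configuration entry for ARCHER2.
# Usage: archer2_ssh_config USERNAME PATH_TO_PRIVATE_KEY
archer2_ssh_config() {
  local user="$1" key="$2"
  cat <<EOF
Host archer2
    HostName login.archer2.ac.uk
    User ${user}
    IdentityFile ${key}
EOF
}
```

For example, archer2_ssh_config auser ~/keys/id_rsa_archer2 >> ~/.ssh/config adds the entry, after which ssh archer2 behaves like the full ssh -i command above. You will still be asked for your key passphrase and your other credentials.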

"},{"location":"quick-start/quickstart-users/#mfa-time-based-one-time-password","title":"MFA Time-based one-time password","text":"

Remember, you will need to use both an SSH key and Time-based one-time password to log into ARCHER2 so you will also need to set up your TOTP before you can log into ARCHER2.

Tip

When you first log into ARCHER2, you will be prompted to change your initial password. This is a three-step process:

  1. When prompted to enter your LDAP password: enter the password that you retrieved from SAFE
  2. When prompted to enter your new password: type in a new password
  3. When prompted to re-enter the new password: re-enter the new password

Your password has now been changed. You will not use your password when logging on to ARCHER2 after the initial logon.

Hint

More information on connecting to ARCHER2 is available in the Connecting to ARCHER2 section of the User Guide.

"},{"location":"quick-start/quickstart-users/#file-systems-and-manipulating-data","title":"File systems and manipulating data","text":"

ARCHER2 has a number of different file systems and understanding the difference between them is crucial to being able to use the system. In particular, transferring and moving data often requires a bit of thought in advance to ensure that the data is secure and in a useful form.

ARCHER2 file systems are:

All users have a directory on one of the home file systems and on one of the work file systems. The directories are located at:

Top tips for managing data on ARCHER2:

Hint

Information on the file systems and best practice in managing your data is available in the Data management and transfer section of the User and Best Practice Guide.

"},{"location":"quick-start/quickstart-users/#accessing-software","title":"Accessing software","text":"

Software on ARCHER2 is principally accessed through modules. These load and unload the desired applications, compilers, tools and libraries through the module command and its subcommands. Some modules will be loaded by default on login, providing a default working environment; many more will be available for use but initially unloaded, allowing you to set up the environment to suit your needs.

At any stage you can check which modules have been loaded by running

module list\n

Running the following command will display all environment modules available on ARCHER2, whether loaded or unloaded

module avail\n

The search field for this command may be narrowed by providing the first few characters of the module name being queried. For example, all available versions and variants of VASP may be found by running

module avail vasp\n

You will see that different versions are available for many modules. For example, vasp/5/5.4.4.pl2 and vasp/6/6.3.2 are two available versions of VASP on the full system. Furthermore, a default version may be specified; this is used if no version is provided by the user.

Important

VASP is licensed software, as are other software packages on ARCHER2. You must have a valid licence to use licensed software on ARCHER2. Often you will need to request access through the SAFE. More on this below.

The module load command loads a module for use. Following the above,

module load vasp/6\n

would load the default version of VASP 6, while

module load vasp/6/6.3.2\n

would specifically load version 6.3.2. A loaded module may be unloaded with the module unload command, e.g.

module unload vasp\n

The above unloads whichever version of VASP is currently in the environment. Rather than issuing separate unload and load commands, versions of a module may be swapped as follows:

module swap vasp vasp/5/5.4.4.pl2\n

Other helpful commands are:

Tip

You should not use the module purge command on ARCHER2 as this will cause issues for the HPE Cray programming environment. If you wish to reset your modules, you should use the module restore command instead.

Points to be aware of include:

More information on modules and the software environment on ARCHER2 can be found in the Software environment section of the User and Best Practice Guide.

"},{"location":"quick-start/quickstart-users/#requesting-access-to-licensed-software","title":"Requesting access to licensed software","text":"

Some of the software installed on ARCHER2 requires a user to have a valid licence agreed with the software owners/developers to be able to use it (for example, VASP). Although you will be able to load this software on ARCHER2, you will be barred from actually using it until your licence has been verified.

You request access to licensed software through the SAFE (the web administration tool you used to apply for your account and retrieve your initial password) by being added to the appropriate Package Group. To request access to licensed software:

  1. Log in to SAFE
  2. Go to the Menu Login accounts and select the login account which requires access to the software
  3. Click New Package Group Request
  4. Select the software from the list of available packages and click Select Package Group
  5. Fill in as much information as possible about your license; at the very least provide the information requested at the top of the screen such as the licence holder's name and contact details. If you are covered by the license because the licence holder is your supervisor, for example, please state this.
  6. Click Submit

Your request will then be processed by the ARCHER2 Service Desk, who will confirm your license with the software owners/developers before enabling your access to the software on ARCHER2. This can take several days (depending on how long the software owners/developers take to respond), but you will be advised once this has been done.

"},{"location":"quick-start/quickstart-users/#create-a-job-submission-script","title":"Create a job submission script","text":"

To run a program on the ARCHER2 compute nodes you need to write a job submission script that tells the system how many compute nodes you want to reserve and for how long. You also need to use the srun command to launch your parallel executable.

Hint

For more details on the Slurm scheduler on ARCHER2 and on writing job submission scripts, see the Running jobs on ARCHER2 section of the User and Best Practice Guide.

Important

Parallel jobs on ARCHER2 should be run from the work file systems as the home file systems are not available on the compute nodes - you will see a chdir or file not found error if you try to access data on the home file system within a parallel job running on the compute nodes.

Create a job submission script called submit.slurm in your space on the work file systems using your favourite text editor. For example, using vim:

auser@ln01:~> cd /work/t01/t01/auser\nauser@ln01:/work/t01/t01/auser> vim submit.slurm\n

Tip

You will need to use your project code and username to get to the correct directory: replace t01 above with your project code and the username auser with your ARCHER2 username.

Paste the following text into your job submission script, replacing [budget code] with your budget code (e.g. e99-ham). The script runs in the standard partition with the standard quality of service (QoS); change the --partition and --qos lines if you want a different partition or QoS.

#!/bin/bash --login\n\n#SBATCH --job-name=test_job\n#SBATCH --nodes=1\n#SBATCH --ntasks-per-node=128\n#SBATCH --cpus-per-task=1\n#SBATCH --time=0:5:0\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Load the xthi module to get access to the xthi program\nmodule load xthi\n\n# Recommended environment settings\n# Stop unintentional multi-threading within software libraries\nexport OMP_NUM_THREADS=1\n# Ensure the cpus-per-task option is propagated to srun commands\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\n# srun launches the parallel program based on the SBATCH options\nsrun --distribution=block:block --hint=nomultithread xthi_mpi\n
"},{"location":"quick-start/quickstart-users/#submit-your-job-to-the-queue","title":"Submit your job to the queue","text":"

You submit your job to the queues using the sbatch command:

auser@ln01:/work/t01/t01/auser> sbatch submit.slurm\nSubmitted batch job 23996\n

The value returned is your Job ID.
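In scripts it is often handy to capture the Job ID rather than read it off the screen. The sketch below is a hypothetical helper that extracts the ID from sbatch's standard Submitted batch job message; the function name is illustrative.

```shell
# Hypothetical helper: read sbatch's output on stdin and print only the
# numeric Job ID from the 'Submitted batch job <ID>' line (nothing otherwise).
job_id_from_sbatch() {
  awk '/^Submitted batch job [0-9]+$/ { print $4 }'
}
```

For example, jobid=$(sbatch submit.slurm | job_id_from_sbatch) followed by cat slurm-${jobid}.out. Slurm's own sbatch --parsable option is an alternative: it prints just the Job ID (followed by ;clustername on multi-cluster systems).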
"},{"location":"quick-start/quickstart-users/#monitoring-your-job","title":"Monitoring your job","text":"

You use the squeue command to examine jobs in the queue. To list all the jobs you have in the queue, use:

auser@ln01:/work/t01/t01/auser> squeue -u $USER\n

squeue on its own lists all jobs in the queue from all users.

"},{"location":"quick-start/quickstart-users/#checking-the-output-from-the-job","title":"Checking the output from the job","text":"

The job submission script above should write the output to a file called slurm-<jobID>.out (e.g. if the Job ID was 23996, the file would be slurm-23996.out). You can check the contents of this file with the cat command. If the job was successful, you should see output that looks something like:

auser@ln01:/work/t01/t01/auser> cat slurm-23996.out\nNode    0, hostname nid001020\nNode    0, rank    0, thread   0, (affinity =    0)\nNode    0, rank    1, thread   0, (affinity =    1)\nNode    0, rank    2, thread   0, (affinity =    2)\nNode    0, rank    3, thread   0, (affinity =    3)\nNode    0, rank    4, thread   0, (affinity =    4)\nNode    0, rank    5, thread   0, (affinity =    5)\nNode    0, rank    6, thread   0, (affinity =    6)\nNode    0, rank    7, thread   0, (affinity =    7)\nNode    0, rank    8, thread   0, (affinity =    8)\nNode    0, rank    9, thread   0, (affinity =    9)\nNode    0, rank   10, thread   0, (affinity =   10)\nNode    0, rank   11, thread   0, (affinity =   11)\nNode    0, rank   12, thread   0, (affinity =   12)\nNode    0, rank   13, thread   0, (affinity =   13)\nNode    0, rank   14, thread   0, (affinity =   14)\nNode    0, rank   15, thread   0, (affinity =   15)\nNode    0, rank   16, thread   0, (affinity =   16)\nNode    0, rank   17, thread   0, (affinity =   17)\nNode    0, rank   18, thread   0, (affinity =   18)\nNode    0, rank   19, thread   0, (affinity =   19)\nNode    0, rank   20, thread   0, (affinity =   20)\nNode    0, rank   21, thread   0, (affinity =   21)\n... output trimmed ...\n

If something has gone wrong, you will find any error messages in the file instead of the expected output.

"},{"location":"quick-start/quickstart-users/#acknowledging-archer2","title":"Acknowledging ARCHER2","text":"

You should use the following phrase to acknowledge ARCHER2 for all research outputs that were generated using the ARCHER2 service:

This work used the ARCHER2 UK National Supercomputing Service (https://www.archer2.ac.uk).

You should also tag outputs with the keyword \"ARCHER2\" whenever possible.

"},{"location":"quick-start/quickstart-users/#useful-links","title":"Useful Links","text":"

If you plan to compile your own programs on ARCHER2, you may also want to look at Quickstart for developers.

Other documentation you may find useful:

"},{"location":"research-software/","title":"Research Software","text":"

ARCHER2 provides a number of research software packages as centrally supported packages. Many of these packages are free to use, but others require a license (which you, or your research group, need to supply).

This section also contains information on research software contributed and/or supported by third parties (marked with a * in the list below).

For centrally supported packages, the version available will usually be the current stable release, to include major releases and significant updates. We will usually not maintain older versions and versions no longer supported by the developers of the package.

The following sections provide details on access to each of the centrally installed packages (software that is not part of the fully supported software stack is marked with *):

"},{"location":"research-software/#not-on-the-list","title":"Not on the list?","text":"

If the software you are interested in is not in the above list, we may still be able to help you install your own version, either individually, or as a project. Please contact the Service Desk.

"},{"location":"research-software/casino/","title":"CASINO","text":"

Note

CASINO is not available as a central install/module on ARCHER2 at this time. This page provides tips on using CASINO on ARCHER2 for users who have obtained their own copy of the code.

Important

CASINO is not part of the officially supported software on ARCHER2. While the ARCHER2 service desk is able to provide support for basic use of this software (e.g. access to software, writing job submission scripts) it does not generally provide detailed technical support for the software and you may be directed to seek support from other places if the service desk cannot answer the questions.

CASINO is a computer program system for performing quantum Monte Carlo (QMC) electronic structure calculations that has been developed over more than 20 years by a group of researchers initially working in the Theory of Condensed Matter group in the Cambridge University physics department, and their collaborators. It is capable of calculating highly accurate solutions to the Schrödinger equation of quantum mechanics for realistic systems built from atoms.

"},{"location":"research-software/casino/#useful-links","title":"Useful Links","text":""},{"location":"research-software/casino/#compiling-casino-on-archer2","title":"Compiling CASINO on ARCHER2","text":"

You should use the linuxpc-gcc-slurm-parallel.archer2 configuration that is supplied along with the CASINO source code to build on ARCHER2 and ensure that you build the \"Shm\" (System-V shared memory) version of the code.

Bug

The linuxpc-cray-slurm-parallel.archer2 configuration produces a binary that crashes with a segfault and should not be used.

"},{"location":"research-software/casino/#using-casino-on-archer2","title":"Using CASINO on ARCHER2","text":"

The performance of CASINO on ARCHER2 is critically dependent on three things:

Next, we show how to make sure that the MPI transport layer is set to UCX, how to set the number of cores sharing the System-V shared memory segments and how to pin MPI processes sequentially to cores.

Finally, we provide a job submission script that demonstrates all these options together.

"},{"location":"research-software/casino/#setting-the-mpi-transport-layer-to-ucx","title":"Setting the MPI transport layer to UCX","text":"

In your job submission script that runs CASINO you switch to using UCX as the MPI transport layer by including the following lines before you run CASINO (i.e. before the srun command that launches the CASINO executable):

module load PrgEnv-gnu\nmodule load craype-network-ucx\nmodule load cray-mpich-ucx\n
"},{"location":"research-software/casino/#setting-the-number-of-cores-sharing-memory","title":"Setting the number of cores sharing memory","text":"

In your job submission script you set the number of cores sharing memory segments by setting the CASINO_NUMABLK environment variable before you run CASINO. For example, to specify that there should be shared memory segments each shared between 16 cores, you would use:

export CASINO_NUMABLK=16\n

Tip

If you do not set CASINO_NUMABLK, CASINO will default to sharing memory across all cores on a node (equivalent to setting it to 128), which gives very poor performance, so you should always set this environment variable. Setting CASINO_NUMABLK to 8 or 16 cores gives the best performance; 32 cores is acceptable if you want to maximise memory efficiency; 64 and 128 give poor performance.
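The guidance above can be turned into a quick pre-flight check in a job script. The sketch below is a hypothetical helper, not part of CASINO. It assumes 128 cores per node, as in the example script in this section, reports how many shared-memory blocks each node will hold, and warns when CASINO_NUMABLK is unset or outside the recommended 8, 16 or 32.

```shell
# Hypothetical pre-flight check for CASINO_NUMABLK, assuming 128 cores per node.
# Prints the number of shared-memory blocks per node on stdout and writes a
# warning to stderr if the value is outside the recommended 8, 16 or 32.
check_casino_numablk() {
  local cores_per_node=128
  local numablk="${CASINO_NUMABLK:-}"
  case "$numablk" in
    '')
      echo "WARNING: CASINO_NUMABLK is unset; CASINO will share memory across all ${cores_per_node} cores" >&2
      return 1 ;;
    *[!0-9]*|0*)
      echo "ERROR: CASINO_NUMABLK must be a positive integer, got '${numablk}'" >&2
      return 1 ;;
    8|16|32) ;;
    *)
      echo "WARNING: CASINO_NUMABLK=${numablk} is outside the recommended 8, 16 or 32" >&2 ;;
  esac
  echo "$(( cores_per_node / numablk )) shared-memory blocks per node"
}
```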

"},{"location":"research-software/casino/#pinning-mpi-processes-sequentially-to-cores","title":"Pinning MPI processes sequentially to cores","text":"

For shared memory segments to work efficiently MPI processes must be pinned sequentially to cores on compute nodes (so that cores sharing memory are close in the node memory hierarchy). To do this, you add the following options to the srun command in your job script that runs the CASINO executable:

--distribution=block:block --hint=nomultithread\n
"},{"location":"research-software/casino/#example-casino-job-submission-script","title":"Example CASINO job submission script","text":"

The following script will run a CASINO job using 16 nodes (2048 cores).

#!/bin/bash\n\n# Request 16 nodes with 128 MPI tasks per node for 20 minutes\n#SBATCH --job-name=CASINO\n#SBATCH --nodes=16\n#SBATCH --ntasks-per-node=128\n#SBATCH --cpus-per-task=1\n#SBATCH --time=00:20:00\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Ensure we are using UCX as the MPI transport layer\nmodule load PrgEnv-gnu\nmodule load craype-network-ucx\nmodule load cray-mpich-ucx\n\n# Set CASINO to share memory across 16 core blocks\nexport CASINO_NUMABLK=16\n\n# Ensure the cpus-per-task option is propagated to srun commands\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\n# Set the location of the CASINO executable - this must be on /work\n#   Replace this with the path to your compiled CASINO binary\nCASINO_EXE=/work/t01/t01/auser/CASINO/bin_qmc/linuxpc-gcc-slurm-parallel.archer2/Shm/opt/casino\n\n# Launch CASINO with MPI processes pinned to cores in a sequential order\nsrun --distribution=block:block --hint=nomultithread ${CASINO_EXE}\n
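Because the home file systems are not visible on the compute nodes, a wrong CASINO_EXE path or a binary left under /home only shows up once the job starts. A hypothetical check like the one below, placed before the srun line, fails fast instead; it assumes, as the script comment says, that the executable must live under /work.

```shell
# Hypothetical pre-flight check: confirm the CASINO binary is on /work and executable.
# Usage in a job script: check_casino_exe "${CASINO_EXE}" || exit 1
check_casino_exe() {
  local exe="$1"
  case "$exe" in
    /work/*) ;;
    *)
      echo "ERROR: ${exe} is not on /work; compute nodes cannot see /home" >&2
      return 1 ;;
  esac
  if [ ! -x "$exe" ]; then
    echo "ERROR: ${exe} does not exist or is not executable" >&2
    return 1
  fi
}
```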
"},{"location":"research-software/casino/#casino-performance-on-archer2","title":"CASINO performance on ARCHER2","text":"

We have run the benzene_dimer benchmark on ARCHER2 with the following configuration:

Timings are reported as the time taken for 100 equilibration steps in a DMC calculation.
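The speedup in the CASINO_NUMABLK=8 results is the 1-node time divided by the N-node time; dividing the speedup by N gives the parallel efficiency. The short awk-based sketch below reproduces the reported speedups from the timings and adds an efficiency column. The efficiency figures are derived here and are not part of the original benchmark report.

```shell
# Compute speedup (T1/TN) and parallel efficiency (speedup * N1 / N) from
# 'nodes time' pairs on stdin; the first line is taken as the baseline.
# Output columns: nodes, time (s), speedup, efficiency.
scaling_table() {
  awk 'NR == 1 { n1 = $1; t1 = $2 }
       { s = t1 / $2; printf "%d %.2f %.1f %.2f\n", $1, $2, s, s * n1 / $1 }'
}
```

For example, printf '1 289.90\n16 23.16\n' | scaling_table reports a speedup of 12.5 on 16 nodes, which corresponds to a parallel efficiency of about 0.78.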

"},{"location":"research-software/casino/#casino_numablk8","title":"CASINO_NUMABLK=8","text":"

Nodes | Time taken (s) | Speedup
1 | 289.90 | 1.0
2 | 154.93 | 1.9
4 | 81.06 | 3.6
8 | 41.44 | 7.0
16 | 23.16 | 12.5
"},{"location":"research-software/castep/","title":"CASTEP","text":"

CASTEP is a leading code for calculating the properties of materials from first principles. Using density functional theory, it can simulate a wide range of materials properties, including energetics, structure at the atomic level, vibrational properties and electronic response properties. In particular, it has a wide range of spectroscopic features that link directly to experiment, such as infra-red and Raman spectroscopies, NMR, and core level spectra.

"},{"location":"research-software/castep/#useful-links","title":"Useful Links","text":""},{"location":"research-software/castep/#using-castep-on-archer2","title":"Using CASTEP on ARCHER2","text":"

CASTEP is only available to users who have a valid CASTEP licence.

If you have a CASTEP licence and wish to have access to CASTEP on ARCHER2, please make a request via the SAFE, see:

Please have your license details to hand.

"},{"location":"research-software/castep/#note-on-using-relativistic-j-dependent-pseudopotentials","title":"Note on using Relativistic J-dependent pseudopotentials","text":"

These pseudopotentials cannot be generated on the fly by CASTEP and so are available in the following directory on ARCHER2:

/work/y07/shared/apps/core/castep/pseudopotentials\n
"},{"location":"research-software/castep/#running-parallel-castep-jobs","title":"Running parallel CASTEP jobs","text":"

The following script will run a CASTEP job using 2 nodes (256 cores). It assumes that the input files have the file stem test_calc.

#!/bin/bash\n\n# Request 2 nodes with 128 MPI tasks per node for 20 minutes\n#SBATCH --job-name=CASTEP\n#SBATCH --nodes=2\n#SBATCH --ntasks-per-node=128\n#SBATCH --cpus-per-task=1\n#SBATCH --time=00:20:00\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Ensure the cpus-per-task option is propagated to srun commands\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\n# Load the CASTEP module, avoid any unintentional OpenMP threading by\n# setting OMP_NUM_THREADS, and launch the code.\nmodule load castep\nexport OMP_NUM_THREADS=1\nsrun --distribution=block:block --hint=nomultithread castep.mpi test_calc\n
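CASTEP reads its input from files named after the file stem, here test_calc.cell and test_calc.param. A hypothetical check such as the one below, run before srun, avoids spending a queue slot on a job whose inputs are missing. It assumes the calculation uses both a .cell and a .param file, as typical CASTEP runs do; the function name is illustrative.

```shell
# Hypothetical pre-flight check: make sure the CASTEP input files exist
# for a given file stem before launching the parallel run.
# Usage in a job script: check_castep_inputs test_calc || exit 1
check_castep_inputs() {
  local stem="$1" missing=0 ext
  for ext in cell param; do
    if [ ! -f "${stem}.${ext}" ]; then
      echo "ERROR: missing CASTEP input file ${stem}.${ext}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```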
"},{"location":"research-software/castep/#using-serial-castep-tools","title":"Using serial CASTEP tools","text":"

Serial CASTEP tools are available in the standard CASTEP module.

"},{"location":"research-software/castep/#compiling-castep","title":"Compiling CASTEP","text":"

The latest instructions for building CASTEP on ARCHER2 may be found in the GitHub repository of build instructions:

"},{"location":"research-software/cesm-further-examples/","title":"Further Examples CESM 2.1.3","text":"

In the process of porting CESM 2.1.3 to ARCHER2, a set of four long runs was carried out. This page contains the four example cases that have been validated with these longer runs. They vary in the numbers of cores or threads used; the PE layouts used in the validation runs are included here and can be used as a guide for other runs. While only these four compsets and grids have been validated, CESM2 is not bound to just these cases. Links to the UCAR/NCAR pages on configurations, compsets and grids are in the useful links section of the CESM2.1.3 on ARCHER2 page, which can be used to find many of the defined compsets for CESM2.1.3.

"},{"location":"research-software/cesm-further-examples/#atmosphere-only-f2000climo","title":"Atmosphere-only / F2000climo","text":"

This compset uses the F09 grid which is roughly equivalent to a 1 degree resolution. On ARCHER2 with four nodes this configuration should give a throughput of around 7.8 simulated years per (wallclock) day (SYPD). The commands to set up and run the case are as follows:

${CIMEROOT}/scripts/create_newcase --case [case name] --compset F2000climo --res f09_f09_mg17 --walltime [enough time] --project [project code]\ncd [case directory]\n./xmlchange NTASKS=512,NTASKS_ESP=1\n[Any other changes e.g. run length or resubmissions]\n./case.setup\n./case.build\n./case.submit\n
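As a rough guide to how many nodes a layout will occupy, divide the total number of cores (tasks × threads) by the 128 cores on an ARCHER2 compute node and round up; for the case above, 512 tasks with one thread gives the four nodes mentioned. The helper below is a hypothetical sketch of that arithmetic. It ignores the single ESP task and any overlap between components set through ROOTPE, so treat the result as an estimate and confirm the layout with ./pelayout.

```shell
# Hypothetical estimate of compute nodes used by a CESM PE layout:
# ceil(tasks * threads / cores_per_node), with 128 cores per node by default.
# Usage: estimate_nodes NTASKS [NTHRDS] [CORES_PER_NODE]
estimate_nodes() {
  local tasks="$1" threads="${2:-1}" cores="${3:-128}"
  echo $(( (tasks * threads + cores - 1) / cores ))
}
```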
"},{"location":"research-software/cesm-further-examples/#slab-ocean-etest","title":"Slab Ocean / ETEST","text":"

The slab ocean case is similar to the atmosphere-only case in terms of resources needed, as the slab ocean is inexpensive to simulate in comparison to the atmosphere. The setup detailed below uses two OpenMP threads and more tasks than the F2000climo case, so a throughput of around 20 SYPD can be expected. Unlike F2000climo, but like most compsets, this compset is unsupported (meaning it has not been scientifically verified by NCAR personnel), so an extra argument (--run-unsupported) is required when creating the case. The ROOTPE settings guard against poor processor layouts being chosen automatically.

${CIMEROOT}/scripts/create_newcase --case [case name] --compset ETEST --res f09_g17 --walltime [enough time] --project [project code] --run-unsupported\ncd [case directory]\n./xmlchange NTASKS=1024,NTASKS_ESP=1\n./xmlchange NTHRDS=2\n./xmlchange ROOTPE_ICE=0,ROOTPE_OCN=0\n[Any other changes e.g. run length or resubmissions]\n./case.setup\n./case.build\n./case.submit\n
"},{"location":"research-software/cesm-further-examples/#coupled-ocean-b1850","title":"Coupled Ocean / B1850","text":"

Compsets with the B prefix are fully coupled and actively simulate all components. As such, this case is more expensive to run, especially the ocean component. This case can be set up to run components on dedicated nodes by changing the $ROOTPE variables (run the ./pelayout command to check that the layout is as you wish). This should give a throughput of just over 10 SYPD.

${CIMEROOT}/scripts/create_newcase --case [case name] --compset B1850 --res f09_g17 --walltime [enough time] --project [project name]\ncd [case directory]\n./xmlchange NTASKS_CPL=1024,NTASKS_ICE=256,NTASKS_LND=256,NTASKS_GLC=128,NTASKS_ROF=128,NTASKS_WAV=256,NTASKS_OCN=512,NTASKS_ATM=1024\n./xmlchange ROOTPE_CPL=0,ROOTPE_ICE=0,ROOTPE_LND=256,ROOTPE_GLC=512,ROOTPE_ROF=640,ROOTPE_WAV=768,ROOTPE_OCN=1024,ROOTPE_ATM=0\n[Any other changes e.g. run length or resubmissions]\n./case.setup\n./case.build\n./case.submit\n

You can also define the PE layout in terms of full nodes by using negative values. For example, with $MAX_MPITASKS_PER_NODE=128 and $MAX_TASKS_PER_NODE=128, the following is equivalent to the setup above:

${CIMEROOT}/scripts/create_newcase --case [case name] --compset B1850 --res f09_g17 --walltime [enough time] --project [project name]\ncd [case directory]\n./xmlchange NTASKS_CPL=-8,NTASKS_ICE=-2,NTASKS_LND=-2,NTASKS_GLC=-1,NTASKS_ROF=-1,NTASKS_WAV=-2,NTASKS_OCN=-4,NTASKS_ATM=-8\n./xmlchange ROOTPE_CPL=0,ROOTPE_ICE=0,ROOTPE_LND=-2,ROOTPE_GLC=-4,ROOTPE_ROF=-5,ROOTPE_WAV=-6,ROOTPE_OCN=-8,ROOTPE_ATM=0\n[Any other changes e.g. run length or resubmissions]\n./case.setup\n./case.build\n./case.submit\n
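The two B1850 layouts can be cross-checked by hand with the small shell sketch below. The function pe_value_to_tasks turns a negative (whole-node) value into MPI tasks. The function layout_extent finds the highest PE used by any component, because each component occupies PEs ROOTPE to ROOTPE+NTASKS-1. Both are hypothetical helpers, not part of CIME, and they assume NTHRDS=1, which is not changed in these examples.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of CIME) for checking a PE layout by hand.

# Convert a layout value to MPI tasks: -N means N whole nodes of
# tasks_per_node tasks; non-negative values are already task counts.
pe_value_to_tasks() {
  local value=$1 tasks_per_node=$2
  if (( value < 0 )); then
    echo $(( -value * tasks_per_node ))
  else
    echo "$value"
  fi
}

# Given "ROOTPE:NTASKS" pairs (in tasks or negative node counts), print the
# number of MPI tasks the layout spans and the nodes needed for it.
layout_extent() {
  local tasks_per_node=$1; shift
  local max_end=0 pair root ntasks end
  for pair in "$@"; do
    root=$(pe_value_to_tasks "${pair%%:*}" "$tasks_per_node")
    ntasks=$(pe_value_to_tasks "${pair##*:}" "$tasks_per_node")
    end=$(( root + ntasks ))   # one past the last PE of this component
    if (( end > max_end )); then
      max_end=$end
    fi
  done
  echo "$max_end tasks on $(( (max_end + tasks_per_node - 1) / tasks_per_node )) nodes"
}

# ROOTPE:NTASKS for CPL ICE LND GLC ROF WAV OCN ATM, first in tasks, then in nodes
layout_extent 128 0:1024 0:256 256:256 512:128 640:128 768:256 1024:512 0:1024
layout_extent 128 0:-8 0:-2 -2:-2 -4:-1 -5:-1 -6:-2 -8:-4 0:-8
```

Both calls report 1536 tasks on 12 nodes. The ocean (PEs 1024 to 1535) is the only component above PE 1023, so it runs on its own 4 nodes. The other components share the first 8 nodes.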
"},{"location":"research-software/cesm-further-examples/#waccm-x-fxhist","title":"WACCM-X / FXHIST","text":"

The WACCM-X case needs care during setup and running for two reasons. Firstly, as mentioned in the known issues section on archiving errors, the short-term archiver can sometimes move too many files and thus cause problems with resubmissions. Secondly, it can pick up other files in the cesm_inputdata directory, causing issues when running. WACCM-X is also comparatively expensive, with an expected throughput of only a little over 1.5 SYPD, even on a coarser grid than the cases above. The setup for running a WACCM-X case at approximately 2 degree resolution with no short-term archiving is:

${CIMEROOT}/scripts/create_newcase --case [case name] --compset FXHIST --res f19_f19_mg16 --walltime [enough time] --project [project name] --run-unsupported\ncd [case directory]\n./xmlchange NTASKS=512,NTASKS_ESP=1\n./xmlchange NTHRDS=2\n./xmlchange DOUT_S=FALSE\n[Any other changes e.g. run length or resubmissions]\n./case.setup\n./case.build\n./case.submit\n
"},{"location":"research-software/cesm/","title":"Community Earth System Model (CESM2)","text":"

CESM2 is a fully coupled, community, global climate model that provides state-of-the-art computer simulations of the Earth's past, present, and future climate states. It has seven components: atmosphere, ocean, land, sea ice, land ice, river runoff and waves.

Important

CESM is not part of the officially supported software on ARCHER2. While the ARCHER2 service desk is able to provide support for basic use of this software (e.g. access to software, writing job submission scripts) it does not generally provide detailed technical support for the software and you may be directed to seek support from other places if the service desk cannot answer the questions.

"},{"location":"research-software/cesm/#cesm-213","title":"CESM 2.1.3","text":"

At the time of writing, CESM 2.1.3 is the latest scientifically verified version of the model.

"},{"location":"research-software/cesm/#setting-up-cesm-213-on-archer2","title":"Setting up CESM 2.1.3 on ARCHER2","text":"

Due to the nature of CESM2, there is not a centrally installed version of the program available on ARCHER2. Instead, users download their own copy of the program and make use of ARCHER2-specific configurations that have been rigorously tested.

The setup process has been streamlined on ARCHER2 and can be carried out by following the instructions on the ARCHER2 CESM2.1.3 setup page.

"},{"location":"research-software/cesm/#using-cesm-213-on-archer2","title":"Using CESM 2.1.3 on ARCHER2","text":"

A quickstart guide for running a simple coupled case of CESM 2.1.3 on ARCHER2 can be found here. It should be noted that this is only a quickstart guide with a focus on the way that CESM 2.1.3 should be run specifically on ARCHER2, and is not intended to replace the larger CESM or CIME documentation linked to below.

"},{"location":"research-software/cesm/#useful-links","title":"Useful Links","text":""},{"location":"research-software/cesm/#documentation","title":"Documentation","text":"

If this is your first time running CESM2, it is highly recommended that you consult both the CIME documentation and the NCAR CESM pages for the version used in CESM 2.1.3. Pay particular attention to the pages on Basic Usage of CIME, which give a detailed description of the basic commands needed to get a model running.

"},{"location":"research-software/cesm/#compsets-and-configurations","title":"Compsets and Configurations","text":"

CESM2 allows simulations to be carried out using a very wide range of configurations. If you are new to CESM2 it is highly recommended that, unless you are running a case you are already familiar with, you consult the CESM2.1 Configurations page. You can also see a list of the defined compsets already available on the component set definitions page. More information about configurations, grids and compsets can be found on the CESM2 Configurations and Grids page, which includes links to the configuration settings of the different components.

"},{"location":"research-software/cesm213_run/","title":"Quick Start: CESM Model Workflow (CESM 2.1.3)","text":"

This is the procedure for quickly setting up and running a simple CESM2 case on ARCHER2. This document is based on the general quickstart guide for CESM 2.1, with modifications to give instructions specific to ARCHER2. For more expansive instructions on running CESM 2.1, please consult the NCAR CESM pages.

Before following these instructions, ensure you have completed the setup procedure (see Setting up CESM2 on ARCHER2).

For your target case, the first step is to select a component set, and a resolution for your case. For the purposes of this guide, we will be looking at a simple coupled case using the B1850 compset and the f19_g17 resolution.

The current configuration of CESM 2.1.3 on ARCHER2 has been validated with the F2000 (atmosphere only), ETEST (slab ocean), B1850 (fully coupled) and FX2000 (WACCM-X) compsets. Instructions for these are here: CESM2.1.3 further examples.

Details of available component sets and resolutions are available from the query_config tool located in the my_cesm_sandbox/cime/scripts directory:

cd my_cesm_sandbox/cime/scripts\n./query_config --help\n

See the supported component sets, supported model resolutions and supported machines for a complete list of CESM2 supported component sets, grids and computational platforms.

Note: Variables presented as $VAR in this guide typically refer to variables in XML files in a CESM case. From within a case directory, you can determine the value of such a variable with ./xmlquery VAR. In some instances, $VAR refers to a shell variable or some other variable; we try to make these exceptions clear.

"},{"location":"research-software/cesm213_run/#preparing-a-case","title":"Preparing a case","text":"

There are three stages to preparing the case: create, setup and build. Each of these steps is described below.

"},{"location":"research-software/cesm213_run/#1-create-a-case","title":"1. Create a case","text":"

The create_newcase command creates a case directory containing the scripts and XML files to configure a case (see below) for the requested resolution, component set, and machine. create_newcase has three required arguments: --case, --compset and --res (invoke create_newcase --help for help).

On machines where a project or account code is needed (including ARCHER2), you must either specify the --project argument to create_newcase or set the $PROJECT variable in your shell environment.

If running on a supported machine, that machine will normally be recognized automatically, so it is not necessary to specify the --machine argument to create_newcase. For CESM 2.1.3, ARCHER2 is classed as an unsupported machine. However, the ARCHER2 configurations are included in the version of cime downloaded during the setup process, so the --machine flag should not be necessary.

Invoke create_newcase as follows:

./create_newcase --case CASENAME --compset COMPSET --res GRID --project PROJECT\n

where:

Here is an example on ARCHER2 with the CESM2 module loaded:

$CIMEROOT/scripts/create_newcase --case $CESM_ROOT/runs/b.e20.B1850.f19_g17.test --compset B1850 --res f19_g17 --project n02\n
"},{"location":"research-software/cesm213_run/#2-setting-up-the-case-run-script","title":"2. Setting up the case run script","text":"

Issuing the case.setup command creates scripts needed to run the model along with namelist user_nl_xxx files, where xxx denotes the set of components for the given case configuration. Before invoking case.setup, modify the env_mach_pes.xml file in the case directory using the xmlchange command as needed for the experiment.

cd to the case directory. Following the example from above:

cd $CESM_ROOT/runs/b.e20.B1850.f19_g17.test\n

Invoke the case.setup command.

./case.setup\n

If any changes are made to the case, case.setup can be re-run using

./case.setup --reset\n
"},{"location":"research-software/cesm213_run/#3-build-the-executable-using-the-casebuild-command","title":"3. Build the executable using the case.build command","text":"

Run the build script.

./case.build\n

This build may take a while to run, and have periods where the build process doesn't seem to be doing anything. You should only cancel the build if there has been no activity by the build script after 15 minutes.

The CESM executable will appear in the directory given by the XML variable $EXEROOT, which can be queried using:

./xmlquery EXEROOT\n

By default, this will be the bld directory in your case directory.

If any changes are made to xml parameters that would necessitate rebuilding (see the Making Changes section below), then you can apply these by running

./case.setup --reset\n./case.build --clean-all\n./case.build\n
"},{"location":"research-software/cesm213_run/#input-data","title":"Input Data","text":"

Each case of CESM will require input data, which is downloaded from UCAR servers. Input data from similar compsets is often reused, so running two similar cases may not require downloading any additional input data for the second case.

You can check to see if the required input data is already in your input data directory using

./check_input_data\n

If it is not present you can download the input data for the case prior to running the case using

./check_input_data --download\n

This can be useful for cases where a large amount of data is needed, as you can write a simple slurm script to run this download on the serial queue. Information on creating job submission scripts can be found on the ARCHER2 page on Running Jobs.
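Below is a sketch of a job script for the serial-queue approach mentioned above, which runs the download on a single core. Several parts are assumptions. The budget code and case directory are placeholders in the same style as elsewhere on this page, and the two-hour time limit is arbitrary. The module commands are taken from the manual setup instructions. Check the Running Jobs page for the current serial partition and QoS names before relying on it.

```bash
#!/bin/bash
#SBATCH --job-name=cesm_inputdata
#SBATCH --time=02:00:00
#SBATCH --ntasks=1
#SBATCH --account=[budget code]
#SBATCH --partition=serial
#SBATCH --qos=serial

# Sketch only: replace the placeholders in square brackets before use.
# Make the CESM2 module visible and load it (it sets $CESM_ROOT and related variables).
module use /work/n02/shared/CESM2/module
module load CESM2/2.1.3

# Run from the case directory so that check_input_data finds the case files.
cd [case directory]
./check_input_data --download
```

Submit the script with sbatch from a login node. Afterwards, running ./check_input_data again without --download should confirm that nothing is missing.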

Downloading the case input data at this stage is optional. If you skip it, the data will be downloaded from the login node when you run the case.submit script, which may then take a long time to complete.

An important thing to note is that your input data will be stored in your /work area, and will contribute to your storage allocation. These input files can sometimes take up a large amount of space, and so it is recommended that you do not keep any input data that is no longer needed.

"},{"location":"research-software/cesm213_run/#making-changes-to-a-case","title":"Making changes to a case","text":"

After creating a new case, the CIME functions can be used to make changes to the case setup, such as changing the wallclock time, number of cores etc.

You can query settings using the xmlquery script from your case directory:

./xmlquery <name_of_setting>\n

Adding the -p flag allows you to look up partial names, for example

$ ./xmlquery -p JOB\n\nOutput:\nResults in group case.run\n        JOB_QUEUE: standard\n        JOB_WALLCLOCK_TIME: 01:30:00\n\nResults in group case.st_archive\n        JOB_QUEUE: short\n        JOB_WALLCLOCK_TIME: 0:20:00\n

Here all parameters that match the JOB pattern are returned. It is worth noting that the parameters JOB_QUEUE and JOB_WALLCLOCK_TIME are present for both the case.run job and the case.st_archive job. To view just one of these, you can use the --subgroup flag:

$ ./xmlquery -p JOB --subgroup case.run\n\nOutput:\nResults in group case.run\n        JOB_QUEUE: standard\n        JOB_WALLCLOCK_TIME: 01:30:00\n

When you know which setting you want to change, you can do so using the xmlchange command

./xmlchange <name_of_setting>=<new_value>\n

For example, to change the wallclock time for the case.run job to 30 minutes without knowing the exact name of the setting, you could do:

$ ./xmlquery -p WALLCLOCK\n\nOutput:\nResults in group case.run\n        JOB_WALLCLOCK_TIME: 24:00:00\n\nResults in group case.st_archive\n        JOB_WALLCLOCK_TIME: 0:20:00\n\n$ ./xmlchange JOB_WALLCLOCK_TIME=00:30:00 --subgroup case.run\n\n$ ./xmlquery JOB_WALLCLOCK_TIME\n\nOutput:\nResults in group case.run\n        JOB_WALLCLOCK_TIME: 00:30:00\n\nResults in group case.st_archive\n        JOB_WALLCLOCK_TIME: 0:20:00\n

Note: If you try to set a parameter equal to a value that is not known to the program, it might suggest using a --force flag. This may be useful, for example, in the case of using a queue that has not been configured yet, but use with care!

Some changes to the case must be done before calling ./case.setup or ./case.build, otherwise the case will need to be reset or cleaned, using ./case.setup --reset and ./case.build --clean-all. These are as follows:

Many of the namelist variables can be changed just before calling ./case.submit.

"},{"location":"research-software/cesm213_run/#run-the-case","title":"Run the case","text":"

Modify runtime settings in env_run.xml (optional). At this point you may want to change the running parameters of your case, such as run length. By default, the model is set to run for 5 days based on the $STOP_N and $STOP_OPTION variables:

./xmlquery STOP_OPTION,STOP_N\n

These default settings can be useful in troubleshooting runtime problems before submitting for a longer time, but will not allow the model to run long enough to produce monthly history climatology files. In order to produce history files, increase the run length to a month or longer:

./xmlchange STOP_OPTION=nmonths,STOP_N=1\n

If you want a longer run, for example 30 years, this cannot be done in a single job, as the wallclock time required would be considerably longer than the maximum allowed by the ARCHER2 queue system. Instead, split the simulation into appropriate chunks, such as 6 chunks of 5 years. This assumes a throughput of more than 5 simulated years per day (SYPD); some values of SYPD on ARCHER2 are given in the further examples page. Set the $RESUBMIT xml variable, together with $STOP_OPTION and $STOP_N, to chain these chunks together:

./xmlchange RESUBMIT=5,STOP_OPTION=nyears,STOP_N=5\n

The first job runs the first 5-year chunk. The case is then resubmitted 5 more times, each new job picking up where the previous job stopped, giving 6 chunks in total. For more information about this, see the user guide page on running a case.
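The chunk arithmetic can be sketched in shell as follows. RESUBMIT counts resubmissions after the first job, so it is one less than the number of chunks. The per-chunk wallclock estimate is simply chunk length divided by throughput, rounded up to whole hours. The function plan_chunks is a hypothetical helper, and it assumes an integer SYPD value, such as one taken from the further examples page.

```bash
#!/usr/bin/env bash
# Hypothetical helper: plan a long run as equal-length chunks.
# Arguments: total simulated years, years per chunk, throughput in SYPD (integer).
plan_chunks() {
  local total_years=$1 chunk_years=$2 sypd=$3
  # Number of chunks, rounded up (the last chunk may overshoot the total).
  local chunks=$(( (total_years + chunk_years - 1) / chunk_years ))
  local resubmit=$(( chunks - 1 ))   # the first job is not a resubmission
  # Hours per chunk = chunk_years / SYPD * 24, rounded up.
  local hours=$(( (chunk_years * 24 + sypd - 1) / sypd ))
  echo "RESUBMIT=$resubmit STOP_N=$chunk_years wallclock_hours=$hours"
}

plan_chunks 30 5 10
```

For 30 years in 5-year chunks at an assumed 10 SYPD, this prints RESUBMIT=5 STOP_N=5 wallclock_hours=12. In practice, add a safety margin to the estimated wallclock time before setting JOB_WALLCLOCK_TIME.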

Once you have set your job to run for the correct length of time, it is a good idea to check the correct amount of resource is available for the job. You can quickly check the job submission parameters by running

./preview_run\n

which will show you at a glance the wallclock times, job queues and the list of jobs to be submitted, as well as other parameters such as the number of MPI tasks and the number of OpenMP threads.

Submit the job to the batch queue using the case.submit command.

./case.submit\n

The case.submit script will submit a job called .case.run, and if $DOUT_S is set to TRUE it will also submit a short-term archiving job. By default, the queue these jobs are submitted to is the standard queue. For information on the resources available on each queue, see the QOS guide.

Note: There is a small possibility that your job may initially fail with the error message ERROR: Undefined env var 'CESM_ROOT'. This could have two causes:

  1. You do not have the CESM2/2.1.3 module loaded. This module needs to be loaded when running the case as well as when building the case. Try running again after running module load CESM2/2.1.3.
  2. A known issue with ARCHER2 where adding the SBATCH directive --export=ALL to a Slurm script does not work (see the ARCHER2 known issues entry on the subject). The ARCHER2 configuration included in the version of cime downloaded during setup should apply a workaround for this, so you should not normally see this error, although it may still occur in some corner cases. To avoid it, ensure that the environment from which you submit your case has the CESM2/2.1.3 module loaded, and run the case.submit script with the following command:

./case.submit -a=--export=ALL\n

When the job is complete, most output will not be under the case directory but in other directories. Review the following directories and files, whose locations can be found with xmlquery (note: xmlquery accepts a comma-separated list of names with no spaces):

./xmlquery RUNDIR,CASE,CASEROOT,DOUT_S,DOUT_S_ROOT\n
"},{"location":"research-software/cesm213_run/#monitoring-jobs","title":"Monitoring Jobs","text":"

As CESM jobs are submitted to the ARCHER2 batch system, they can be monitored in the same way as other jobs, using the command

squeue -u $USER\n

You can get more details about the batch scheduler by consulting the ARCHER2 scheduling guide.

"},{"location":"research-software/cesm213_run/#archiving","title":"Archiving","text":"

The CIME framework allows for short-term and long-term archiving of model output. This is particularly useful when the model is configured to output to a small storage space and large files may need to be moved during larger simulations. On ARCHER2, the model is configured to use short-term archiving, but not yet configured for long-term archiving.

Short-term archiving is on by default for compsets and can be toggled on and off using the DOUT_S parameter set to True or False using the xmlchange script:

./xmlchange DOUT_S=FALSE\n

When DOUT_S=TRUE, calling ./case.submit will automatically submit an st_archive job to the batch system, which will be held in the queue until the main job is complete. This job can be configured in the same way as the main job, for example with a different queue or wallclock time. One advisable change is the queue your st_archive job is submitted to, as archiving does not require a large amount of resources and the short and serial queues on ARCHER2 do not use your project allowance. This is done with the xmlchange script in much the same way as for the case.run job. Note that the main job and the archiving job share some parameter names, such as JOB_QUEUE. Use the --subgroup flag to specify which job you want to change, as below:

./xmlchange JOB_QUEUE=short --subgroup case.st_archive\n

If the --subgroup flag is not used, then the JOB_QUEUE value for both the case.run and case.st_archive jobs will be changed. You can verify that they are different by running

./xmlquery JOB_QUEUE\n

which will show the value of this parameter for both jobs.

The archiver is set up to move .nc files and logs from $CESM_ROOT/runs/$CASE to $CESM_ROOT/archive/$CASE. Your /work storage quota is therefore used whether archiving is switched on or off. We recommend that you move data you wish to retain to another service, such as a group workspace on JASMIN. See the Data Management and Transfer guide for more information on archiving data from ARCHER2. If you want the archiver to write your files to a different location than the default, set the $DOUT_S_ROOT parameter.

"},{"location":"research-software/cesm213_run/#troubleshooting","title":"Troubleshooting","text":"

If a run fails, the first place to check is the run submission output file, usually located at

$CASEROOT/run.$CASE\n

so, for the example job run in this guide, the output file will be at

$CESM_ROOT/runs/b.e20.B1850.f19_g17.test/run.b.e20.B1850.f19_g17.test\n

If any errors have occurred, the location of the relevant log in which you can examine this error will be printed towards the end of this output file. The log will usually be located at

$CASEROOT/run/cesm.log.*\n

so in this case, the path would be

$CESM_ROOT/runs/b.e20.B1850.f19_g17.test/run/cesm.log.*\n
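When a case has been run several times, the run directory can hold several cesm.log.* files. The shell function below picks the most recently modified one and prints the lines mentioning an error or a signal. It is a hypothetical helper, not part of CIME, and it searches with a simple case-insensitive grep. It assumes the log has not been compressed; CIME may gzip logs at the end of a successful run, in which case zgrep would be needed instead.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the newest cesm.log.* in a run directory,
# followed by the lines in it that mention an error or a signal.
show_latest_cesm_log() {
  local rundir=$1
  local latest
  # ls -t sorts by modification time, newest first.
  latest=$(ls -t "$rundir"/cesm.log.* 2>/dev/null | head -n 1)
  if [[ -z $latest ]]; then
    echo "no cesm.log.* found in $rundir" >&2
    return 1
  fi
  echo "$latest"
  # -n shows line numbers; "|| true" keeps a clean log from counting as a failure.
  grep -n -i -E 'error|sigfpe|sigsegv' "$latest" || true
}
```

For the example case in this guide, you would call it as show_latest_cesm_log "$CESM_ROOT/runs/b.e20.B1850.f19_g17.test/run".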
"},{"location":"research-software/cesm213_run/#known-issues-and-common-problems","title":"Known Issues and Common Problems","text":""},{"location":"research-software/cesm213_run/#input-data-errors","title":"Input data errors","text":"

Occasionally, the input data for a case is not downloaded correctly. Unfortunately, in these cases the checksum test run by the check_input_data script will not catch the corrupted fields in the file. The error message displayed can vary somewhat, but a common error message is

ERROR timeaddmonths(): MM out of range\n

You can often spot these errors by examining the log as described above, as the error will occur shortly after a file has been read. If this happens, delete the file in question from your cesm_inputdata directory and rerun

./check_input_data --download\n
to ensure that the data is downloaded correctly.

"},{"location":"research-software/cesm213_run/#sigfpe-errors","title":"SIGFPE errors","text":"

If running a case with the DEBUG flag enabled, you may see some SIGFPE errors. In this case, the traceback shown in the logs will show the error as originating in one of three places:

This problem is caused by 'short-circuit' logic in the affected files, where there may be a conditional of the form

if (A .and. B) then....\n
where B cannot be properly evaluated if A fails, for example

if ( x /= 0 .and. y/x > c ) then....\n
which would result in a divide-by-zero error if the second condition was evaluated after the first condition had already failed.

In standard simulations, the second condition would be skipped in these cases. However, if the user has set

./xmlchange DEBUG=TRUE\n

then the second condition will not be skipped and a SIGFPE error will occur.

If encountering these errors, a user can do one of two things. The simplest solution is to turn off the DEBUG flag with

./xmlchange DEBUG=FALSE\n
If this is not possible and your simulation must be run in DEBUG mode, the conditional can be modified in the program code. This is done at your own risk! The fix that has been applied for the WW3 component can be seen here. If you make any changes to the code for this reason, we recommend reverting them once you no longer need to run your case in DEBUG mode.

"},{"location":"research-software/cesm213_run/#sigsegv-errors","title":"SIGSEGV errors","text":"

Sometimes an error will occur where a run is ended prematurely and gives an error of the form

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.\n

This can often be solved by increasing the amount of available memory per task, either by changing the maximum number of MPI tasks per node by using

./xmlchange MAX_TASKS_PER_NODE=64\n

or by increasing the number of threads used by using

./xmlchange NTHRDS=2\n

Either change doubles the amount of memory available to each MPI task.
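The arithmetic behind this advice can be sketched as follows. The sketch assumes that memory is shared evenly between the MPI tasks on a node. It takes 256 GB as the memory of a standard ARCHER2 compute node, which is an assumption to check against the hardware documentation. The function mem_per_task_gb is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: memory per MPI task, assuming node memory is shared
# evenly between the MPI tasks placed on that node.
# Arguments: node memory in GB, MAX_TASKS_PER_NODE, NTHRDS.
mem_per_task_gb() {
  local node_mem_gb=$1 max_tasks_per_node=$2 nthrds=$3
  # Each MPI task uses NTHRDS of the node's MAX_TASKS_PER_NODE slots.
  local mpi_tasks_per_node=$(( max_tasks_per_node / nthrds ))
  echo $(( node_mem_gb / mpi_tasks_per_node ))
}

mem_per_task_gb 256 128 1   # default layout: 128 MPI tasks per node
mem_per_task_gb 256 64 1    # MAX_TASKS_PER_NODE=64
mem_per_task_gb 256 128 2   # NTHRDS=2
```

Both changes give 4 GB per MPI task instead of 2 GB. The trade-off is that the same number of MPI tasks is spread over twice as many nodes, which increases the resources charged for the job.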

"},{"location":"research-software/cesm213_run/#archiving-errors","title":"Archiving Errors","text":"

When running WACCM-X cases (compsets starting FX*), there can sometimes be problems when running restart jobs. This is caused by the short-term archiving job mistakenly moving files needed for restarts to the archive. To ensure this does not happen, it can be a good idea when running WACCM-X simulations to turn off the short-term archiver using

./xmlchange DOUT_S=FALSE\n

While this behaviour has so far only been observed for WACCM-X jobs, it may also occur with other compsets.

"},{"location":"research-software/cesm213_run/#job-failing-instantly-with-undefined-environment-variable","title":"Job Failing instantly with undefined environment variable","text":"

There is a small possibility that your job may initially fail with the error message

ERROR: Undefined env var 'CESM_ROOT'\n
This could have two causes:

  1. You do not have the CESM2/2.1.3 module loaded. This module needs to be loaded when running the case as well as when building the case. Try running again after running module load CESM2/2.1.3.
  2. A known issue with ARCHER2 where adding the SBATCH directive --export=ALL to a Slurm script does not work (see the ARCHER2 known issues entry on the subject). The ARCHER2 configuration included in the version of cime downloaded during setup should apply a workaround for this, so you should not normally see this error, although it may still occur in some corner cases. To avoid it, ensure that the environment from which you submit your case has the CESM2/2.1.3 module loaded, and run the case.submit script with the following command:
./case.submit -a=--export=ALL\n

"},{"location":"research-software/cesm213_setup/","title":"First-Time setup of CESM 2.1.3","text":"

Important

These instructions are intended for users of the n02 project. Downloads may be incomplete if you are not a member of n02.

Due to the nature of the CESM program, a centrally installed version of the code is not provided on ARCHER2. Instead, a user needs to download and set up the program themselves in their /work area. The installation is done in three steps:

  1. Download the code and set up the directory structure
  2. Link and Download Components
  3. Build CPRNC

After setup, CESM is ready to run a simple case.

"},{"location":"research-software/cesm213_setup/#downloading-cesm-213-and-setting-up-the-directory-structure","title":"Downloading CESM 2.1.3 And Setting Up The Directory Structure","text":"

For ease of use, a setup script has been created which downloads CESM 2.1.3, creates the directory structure needed for running CESM2 cases and creates a hidden file in your home directory containing environment variables needed by CESM.

To execute this script, run the following in an ARCHER2 terminal:

module load cray-python\nsource /work/n02/shared/CESM2/setup_cesm213.sh\n

This script will create a directory, defaulting to /work/$GROUP/$GROUP/$USER/cesm/CESM2.1.3, where $GROUP is your default group (for example n02), and populate it with the following subdirectories:

  * archive - short-term archiving for completed runs
  * ccsm_baselines - baseline files
  * cesm_inputdata - input data downloaded and used when running cases
  * runs - location of the case files used when running a case
  * cesm directory - location of the CESM source code and the various components; defaults to my_cesm_sandbox

The default locations for the CESM root directory and the CESM location can be overridden during installation either by entering new paths at runtime when prompted or by providing them as command line arguments, for example

source /work/n02/shared/CESM2/setup_cesm213.sh -p /work/n03/n03/$USER/CESM213 -l cesm_prog\n
"},{"location":"research-software/cesm213_setup/#manual-setup-instructions","title":"Manual setup instructions","text":"

If you have trouble with running the setup script, you can install manually by running the following commands:

PREFIX=\"path/to/your/desired/cesm/root/location\"\nCESM_DIR_LOC=\"name_of_install_directory_for_cesm\"\n\nmkdir -p $PREFIX\ncd $PREFIX\nmkdir -p archive\nmkdir -p ccsm_baselines\nmkdir -p cesm_inputdata\nmkdir -p runs\n\nCESM_LOC=$PREFIX/$CESM_DIR_LOC\n\ngit clone -b release-cesm2.1.3  https://github.com/ESCOMP/CESM.git $CESM_LOC\ncd $CESM_LOC\ngit checkout release-cesm2.1.3\n\ntee ${HOME}/.cesm213 <<EOF > /dev/null\n### CESM 2.1.3 on ARCHER2 Path File\n### Do Not Edit This File Unless You Know What You Are Doing\nCIME_MODEL=cesm\nCESM_ROOT=$PREFIX\nCESM_LOC=$PREFIX/$CESM_DIR_LOC\nCIMEROOT=$PREFIX/$CESM_DIR_LOC/cime\nEOF\n\necho \"module use /work/n02/shared/CESM2/module\" >> ~/.bashrc\nmodule use /work/n02/shared/CESM2/module\nmodule load CESM2/2.1.3\n
"},{"location":"research-software/cesm213_setup/#linking-and-downloading-components","title":"Linking And Downloading Components","text":"

CESM utilises multiple components, including CAM (atmosphere), CICE (sea ice), CISM (ice sheets), CTSM (land), MOSART (adaptive river transport), POP2 (ocean), RTM (river transport) and WW3 (waves), all of which are connected using the Common Infrastructure for Modelling the Earth (CIME). These components are hosted on github, and during the setup process they are downloaded.

Before downloading the external components, you must first modify the file $CESM_LOC/Externals.cfg. This will change the version of CIME from the default cime 5.6.32 to the maintained cime 5.6 branch. This is done by modifying the file so that the cime section goes from

[cime]\ntag = cime5.6.32\nprotocol = git\nrepo_url = https://github.com/ESMCI/cime\nlocal_path = cime\nrequired = True\n

to

[cime]\nbranch = maint-5.6\nprotocol = git\nrepo_url = https://github.com/ESMCI/cime\nlocal_path = cime\nexternals = Externals_cime.cfg\nrequired = True\n

In the same $CESM_LOC/Externals.cfg file, also update the version of CAM:

[cam]\ntag = cam_cesm2_1_rel_41\nprotocol = git\nrepo_url = https://github.com/ESCOMP/CAM\nlocal_path = components/cam\nexternals = Externals_CAM.cfg\nrequired = True\n

to

[cam]\ntag = cam_cesm2_1_rel\nprotocol = git\nrepo_url = https://github.com/ESCOMP/CAM\nlocal_path = components/cam\nexternals = Externals_CAM.cfg\nrequired = True\n

These changes bring in the ARCHER2 configurations along with some bug fixes.
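If you prefer to script the two Externals.cfg edits rather than make them in a text editor, the GNU sed sketch below is one way to do it. The function patch_externals_cfg is a hypothetical helper. It assumes the relevant lines appear exactly as in the excerpts above, and it keeps a backup copy as Externals.cfg.bak. It is not idempotent: running it twice would add a second externals line. Run it once and review the result, for example with git -C "$CESM_LOC" diff Externals.cfg.

```bash
#!/usr/bin/env bash
# Hypothetical helper: apply the cime and CAM edits to Externals.cfg with
# GNU sed, keeping a backup as <file>.bak. Run it only once.
patch_externals_cfg() {
  local cfg=${1:-$CESM_LOC/Externals.cfg}
  sed -i.bak \
    -e 's/^tag = cime5\.6\.32$/branch = maint-5.6/' \
    -e '/^local_path = cime$/a externals = Externals_cime.cfg' \
    -e 's/^tag = cam_cesm2_1_rel_41$/tag = cam_cesm2_1_rel/' \
    "$cfg"
}
```

Calling patch_externals_cfg with no argument edits $CESM_LOC/Externals.cfg. The cime section then matches the "to" version shown above, with the externals line added directly after local_path.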

Once this has been done you are free to download the external components by executing the commands

cd $CESM_LOC\n./manage_externals/checkout_externals\n

The first time you run the checkout_externals script, you may be asked to accept a certificate, and you may also get an error of the form

    svn: E120108: Error running context: The server unexpectedly closed the connection.\n
If this happens, rerun the checkout_externals script and it should download the external components correctly.

"},{"location":"research-software/cesm213_setup/#building-cprnc","title":"Building cprnc","text":"

cprnc is a generic tool for analyzing a netCDF file or comparing two netCDF files. It is used in various places by CESM, and its source is included with cime.

To build it, execute the following commands:

module load CESM2/2.1.3\ncd $CIMEROOT/tools/cprnc\n../configure --macros-format=Makefile --mpilib=mpi-serial\nsed -i '/}}/d' .env_mach_specific.sh\nsource ./.env_mach_specific.sh \nmake\n

It is likely you will see a warning message of the form

The following dependent module(s) are not currently loaded: cray-hdf5-parallel (required by: CESM2/2.1.3), cray-netcdf-hdf5parallel (required by: CESM2/2.1.3), cray-parallel-netcdf (required by: CESM2/2.1.3)\n

This is due to serial netCDF and hdf5 libraries being loaded as a result of the --mpilib=mpi-serial flag. This warning message is safe to ignore.

In a small number of cases you may also see a warning of the form

-bash: export: '}}': not a valid identifier\n

This warning should also be safe to ignore, but can be solved by opening the file ./.env_mach_specific.sh in a text editor and commenting out or deleting the line

export OMP_NUM_THREADS={{ thread_count }}\n

Then rerunning the command

source ./.env_mach_specific.sh && make\n

Once this step has been completed, you are ready to run a simple test case.

"},{"location":"research-software/chemshell/","title":"ChemShell","text":"

ChemShell is a script-based chemistry code focusing on hybrid QM/MM calculations with support for standard quantum chemical or force field calculations. There are two versions: an older Tcl-based version Tcl-ChemShell and a more recent python-based version Py-ChemShell.

The advice from https://www.chemshell.org/licence on the difference is:

We consider Py-ChemShell 23.0 to be suitable for production calculations on both materials systems and biomolecules, and recommend that new ChemShell users should use the Python-based version.

We continue to maintain the original Tcl-based version of ChemShell and distribute it on request. Tcl-ChemShell currently contains some features that are not yet available in Py-ChemShell (but will be soon!) including a QM/MM MD driver and multiple electronic state calculations. At the present time if you need this functionality you will need to obtain a licence for Tcl-Chemshell.

"},{"location":"research-software/chemshell/#useful-links","title":"Useful Links","text":""},{"location":"research-software/chemshell/#using-py-chemshell-on-archer2","title":"Using Py-ChemShell on ARCHER2","text":"

The Python-based version of ChemShell is open source and freely available to all users on ARCHER2. The version of Py-ChemShell pre-installed on ARCHER2 is compiled with NWChem and GULP as libraries.

Warning

Py-ChemShell on ARCHER2 is compiled with GULP 6.0. GULP is licensed software that is free for academic use. If you are not an academic user (or if you are using Py-ChemShell for non-academic work), please ensure that you have the correct GULP licence before using GULP functionality in Py-ChemShell, or make sure that your code does not use any GULP functionality (i.e., do not set theory=GULP in your calculations).

"},{"location":"research-software/chemshell/#running-parallel-py-chemshell-jobs","title":"Running parallel Py-ChemShell jobs","text":"

Unlike most other ARCHER2 software packages, the Py-ChemShell module lets you create and submit jobs to the compute nodes by running the chemsh script from the login node, rather than by writing and submitting a Slurm submission script yourself. Below is an example command for submitting a pure MPI Py-ChemShell job running on 8 nodes (128x8 cores) with the chemsh command:

# Run this from the login node\nmodule load py-chemshell\n\n# Replace [budget code] below with your project code (e.g. t01)\nchemsh --submit               \\\n       --jobname pychmsh      \\\n       --account [budget code] \\\n       --partition standard   \\\n       --qos standard         \\\n       --walltime 0:10:0      \\\n       --nnodes 8             \\\n       --nprocs 1024          \\\n       py-chemshell-job.py\n
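The --nprocs value above is simply --nnodes multiplied by the 128 cores on each standard compute node. If you vary the node count, a small wrapper can keep the two values consistent. The following minimal sketch prints the command rather than running it. chemsh_submit_cmd is a hypothetical helper, and t01 is a placeholder budget code.

```bash
#!/bin/bash
# Hypothetical helper: print (but do not run) a chemsh --submit command
# in which --nprocs is derived from --nnodes, assuming 128 cores per
# standard compute node and one MPI process per core.
chemsh_submit_cmd() {
    local nnodes=$1 budget=$2 input=$3
    local cores_per_node=128
    local nprocs=$(( nnodes * cores_per_node ))
    echo "chemsh --submit --jobname pychmsh --account ${budget}" \
         "--partition standard --qos standard --walltime 0:10:0" \
         "--nnodes ${nnodes} --nprocs ${nprocs} ${input}"
}

# Print the 8-node command used above
chemsh_submit_cmd 8 t01 py-chemshell-job.py
```

Once the printed command looks right, you can run it directly from the login node after loading the py-chemshell module.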
"},{"location":"research-software/chemshell/#using-tcl-chemshell-on-archer2","title":"Using Tcl-ChemShell on ARCHER2","text":"

The older Tcl-based version of ChemShell requires a licence. Users with a valid licence should request access via the ARCHER2 SAFE.

"},{"location":"research-software/chemshell/#running-parallel-tcl-chemshell-jobs","title":"Running parallel Tcl-ChemShell jobs","text":"

The following script will run a pure MPI Tcl-based ChemShell job using 8 nodes (128x8 cores).

#!/bin/bash\n\n#SBATCH --job-name=chemshell_test\n#SBATCH --nodes=8\n#SBATCH --ntasks-per-node=128\n#SBATCH --cpus-per-task=1\n#SBATCH --time=00:20:00\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\nmodule load tcl-chemshell/3.7.1\n\n# Ensure the cpus-per-task option is propagated to srun commands\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\nsrun --distribution=block:block --hint=nomultithread chemsh.x input.chm\n
"},{"location":"research-software/code-saturne/","title":"Code_Saturne","text":"

Code_Saturne solves the Navier-Stokes equations for 2D, 2D-axisymmetric and 3D flows, steady or unsteady, laminar or turbulent, incompressible or weakly dilatable, isothermal or not, with scalar transport if required. Several turbulence models are available, from Reynolds-averaged models to large-eddy simulation (LES) models. In addition, a number of specific physical models are also available as \"modules\": gas, coal and heavy-fuel oil combustion, semi-transparent radiative transfer, particle-tracking with Lagrangian modeling, Joule effect, electric arcs, weakly compressible flows, atmospheric flows, and rotor/stator interaction for hydraulic machines.

"},{"location":"research-software/code-saturne/#useful-links","title":"Useful Links","text":""},{"location":"research-software/code-saturne/#using-code_saturne-on-archer2","title":"Using Code_Saturne on ARCHER2","text":"

Code_Saturne is released under the GNU General Public Licence v2 and so is freely available to all users on ARCHER2.

You can load the default GCC build of Code_Saturne by running the following command:

module load code_saturne\n

This will load the default code_saturne/7.0.1-gcc11 module. A build using the CCE compilers, code_saturne/7.0.1-cce12, is also available on the full ARCHER2 system, because testing indicates that it may perform better than the GCC build.

"},{"location":"research-software/code-saturne/#running-parallel-code_saturne-jobs","title":"Running parallel Code_Saturne jobs","text":"

After setting up a case, initialise it by running the following command from the case directory, where setup.xml is the input file:

code_saturne run --initialize --param setup.xml\n

This will create a directory named after the current date and time (e.g. 20201019-1636) inside the RESU directory. This new directory will contain a script named run_solver. You can either modify this script to resemble the script below, or simply replace it with a new one containing the contents shown.

If you alter the existing run_solver script, you will need to add all the #SBATCH options shown to set the job name, size and so on. You should also add the module commands, and modify the line executing ./cs_solver so that it starts with srun --distribution=block:block --hint=nomultithread and includes the --mpi option. This ensures that the solver runs in parallel on the compute nodes. The export LD_LIBRARY_PATH=... and cd commands are redundant and may be retained or removed.

This script will run an MPI-only Code_Saturne job using the default GCC build and UCX over 4 nodes (128 x 4 = 512 cores) for a maximum of 20 minutes.

#!/bin/bash\n#SBATCH --export=none\n#SBATCH --job-name=CSExample\n#SBATCH --time=0:20:0\n#SBATCH --nodes=4\n#SBATCH --ntasks-per-node=128\n#SBATCH --cpus-per-task=1\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Load the GCC build of Code_Saturne 7.0.1\nmodule load cpe/21.09\nmodule load PrgEnv-gnu\nmodule load code_saturne\n\n# Switch to mpich-ucx implementation (see info note below)\nmodule swap craype-network-ofi craype-network-ucx\nmodule swap cray-mpich cray-mpich-ucx\n\n# Prevent threading.\nexport OMP_NUM_THREADS=1\n\n# Ensure the cpus-per-task option is propagated to srun commands\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\n# Run solver, passing on any script arguments unchanged.\nsrun --distribution=block:block --hint=nomultithread ./cs_solver --mpi \"$@\"\n

The script can then be submitted to the batch system with sbatch.

Info

There is a known issue with the default MPI collectives that causes poor performance in Code_Saturne. The suggested workaround is to switch to the mpich-ucx implementation. For this to link correctly on the full system, the cpe/21.09 and PrgEnv-gnu modules must also be loaded explicitly.

"},{"location":"research-software/code-saturne/#compiling-code_saturne","title":"Compiling Code_Saturne","text":"

The latest instructions for building Code_Saturne on ARCHER2 may be found in the GitHub repository of build instructions:

"},{"location":"research-software/cp2k/","title":"CP2K","text":"

CP2K is a quantum chemistry and solid state physics software package that can perform atomistic simulations of solid state, liquid, molecular, periodic, material, crystal, and biological systems. CP2K provides a general framework for different modelling methods such as DFT using the mixed Gaussian and plane waves approaches GPW and GAPW. Supported theory levels include DFTB, LDA, GGA, MP2, RPA, semi-empirical methods (AM1, PM3, PM6, RM1, MNDO), and classical force fields (AMBER, CHARMM). CP2K can do simulations of molecular dynamics, metadynamics, Monte Carlo, Ehrenfest dynamics, vibrational analysis, core level spectroscopy, energy minimisation, and transition state optimisation using NEB or dimer method.

"},{"location":"research-software/cp2k/#useful-links","title":"Useful links","text":""},{"location":"research-software/cp2k/#using-cp2k-on-archer2","title":"Using CP2K on ARCHER2","text":"

CP2K is available through the cp2k module. Both the MPI-only cp2k.popt and the hybrid MPI/OpenMP cp2k.psmp binaries are available.

For ARCHER2, CP2K has been compiled with the following optional features: FFTW for fast Fourier transforms, libint to enable methods including Hartree-Fock exchange, libxc to provide a wider choice of exchange-correlation functionals, ELPA for improved performance of matrix diagonalisation, PLUMED to allow enhanced sampling methods.

See CP2K compile instructions for a full list of optional features.

If you need an optional feature that is not available, please contact the Service Desk. Experts may also wish to compile their own versions of the code (see below for instructions).

"},{"location":"research-software/cp2k/#running-parallel-cp2k-jobs","title":"Running parallel CP2K jobs","text":""},{"location":"research-software/cp2k/#mpi-only-jobs","title":"MPI only jobs","text":"

To run CP2K using MPI only, load the cp2k module and use the cp2k.psmp executable with OpenMP threading disabled (OMP_NUM_THREADS=1).

For example, the following script will run a CP2K job using 4 nodes (128x4 cores):

#!/bin/bash\n\n# Request 4 nodes using 128 cores per node for 128 MPI tasks per node.\n\n#SBATCH --job-name=CP2K_test\n#SBATCH --nodes=4\n#SBATCH --ntasks-per-node=128\n#SBATCH --cpus-per-task=1\n#SBATCH --time=00:20:00\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Load the relevant CP2K module\nmodule load cp2k\n\nexport OMP_NUM_THREADS=1\n\n# Ensure the cpus-per-task option is propagated to srun commands\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\nsrun --hint=nomultithread --distribution=block:block cp2k.psmp -i MYINPUT.inp\n
"},{"location":"research-software/cp2k/#mpiopenmp-hybrid-jobs","title":"MPI/OpenMP hybrid jobs","text":"

To run CP2K using MPI and OpenMP, load the cp2k module and use the cp2k.psmp executable.

#!/bin/bash\n\n# Request 4 nodes with 16 MPI tasks per node each using 8 threads;\n# note this means 64 MPI tasks in total, using all 128 cores on each node.\n# Remember to replace [budget code] below with your account code,\n# e.g. '--account=t01'.\n\n#SBATCH --job-name=CP2K_test\n#SBATCH --nodes=4\n#SBATCH --ntasks-per-node=16\n#SBATCH --cpus-per-task=8\n#SBATCH --time=00:20:00\n\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Load the relevant CP2K module\nmodule load cp2k\n\n# Ensure OMP_NUM_THREADS is consistent with cpus-per-task above\nexport OMP_NUM_THREADS=8\nexport OMP_PLACES=cores\n\n# Ensure the cpus-per-task option is propagated to srun commands\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\nsrun --hint=nomultithread --distribution=block:block cp2k.psmp -i MYINPUT.inp\n
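The totals in the script comments follow from simple arithmetic. The total number of MPI tasks is nodes × ntasks-per-node, and the number of cores used per node is ntasks-per-node × cpus-per-task. The following minimal sketch checks a layout, assuming 128 cores per ARCHER2 standard compute node. check_layout is a hypothetical helper.

```bash
#!/bin/bash
# Hypothetical helper: summarise an MPI/OpenMP job layout and check that
# it fills each node, assuming 128 physical cores per compute node.
check_layout() {
    local nodes=$1 tasks_per_node=$2 cpus_per_task=$3
    local cores_per_node=128
    local used=$(( tasks_per_node * cpus_per_task ))
    echo "MPI tasks: $(( nodes * tasks_per_node )), cores used per node: ${used}/${cores_per_node}"
    # Return non-zero if the node is not exactly fully populated
    [ "$used" -eq "$cores_per_node" ]
}

check_layout 4 16 8   # the hybrid CP2K job above
```

A non-zero return only means the nodes are not fully populated. Underpopulating nodes can be deliberate, so treat it as a prompt to double-check rather than as an error.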
"},{"location":"research-software/cp2k/#compiling-cp2k","title":"Compiling CP2K","text":"

The latest instructions for building CP2K on ARCHER2 may be found in the GitHub repository of build instructions:

"},{"location":"research-software/crystal/","title":"CRYSTAL","text":"

CRYSTAL is a general-purpose program for the study of crystalline solids. The CRYSTAL program computes the electronic structure of periodic systems within Hartree Fock, density functional or various hybrid approximations (global, range-separated and double-hybrids). The Bloch functions of the periodic systems are expanded as linear combinations of atom centred Gaussian functions. Powerful screening techniques are used to exploit real space locality. Restricted (Closed Shell) and Unrestricted (Spin-polarized) calculations can be performed with all-electron and valence-only basis sets with effective core pseudo-potentials. The current release is CRYSTAL23.

Important

CRYSTAL is not part of the officially supported software on ARCHER2. While the ARCHER2 service desk is able to provide support for basic use of this software (e.g. access to software, writing job submission scripts) it does not generally provide detailed technical support for the software and you may be directed to seek support from other places if the service desk cannot answer the questions.

"},{"location":"research-software/crystal/#useful-links","title":"Useful Links","text":""},{"location":"research-software/crystal/#using-crystal-on-archer2","title":"Using CRYSTAL on ARCHER2","text":"

CRYSTAL is only available to users who have a valid CRYSTAL licence. You can request access through SAFE:

Please have your license details to hand.

"},{"location":"research-software/crystal/#running-parallel-crystal-jobs","title":"Running parallel CRYSTAL jobs","text":"

The following script will run CRYSTAL using pure MPI parallelisation with 256 MPI processes, 1 per core across 2 nodes. It assumes that the input file is tio2.d12.

#!/bin/bash\n#SBATCH --nodes=2\n#SBATCH --time=0:20:00\n#SBATCH --ntasks-per-node=128\n#SBATCH --cpus-per-task=1\n\n# Replace [budget code] below with your project code (e.g. e05)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\nmodule load other-software\nmodule load crystal/23-1.0.1-2\n\n# Ensure the cpus-per-task option is propagated to srun commands\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\n# Change this to the name of your input file\ncp tio2.d12 INPUT\n\nsrun --hint=nomultithread --distribution=block:block MPPcrystal\n

An equivalent 2-node job using MPI+OpenMP parallelism, with 64 MPI processes and 4 OpenMP threads per MPI process (1 thread per core across the 2 nodes), would be:

#!/bin/bash\n#SBATCH --nodes=2\n#SBATCH --time=0:20:00\n#SBATCH --ntasks-per-node=32\n#SBATCH --cpus-per-task=4\n\n# Replace [budget code] below with your project code (e.g. e05)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\nmodule load other-software\nmodule load crystal/23-1.0.1-2\n\n# Ensure the cpus-per-task option is propagated to srun commands\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\n# Change this to the name of your input file\ncp tio2.d12 INPUT\n\nexport OMP_NUM_THREADS=4\nexport OMP_PLACES=cores\nexport OMP_STACKSIZE=16M\n\nsrun --hint=nomultithread --distribution=block:block MPPcrystalOMP\n
"},{"location":"research-software/crystal/#tips-and-known-issues","title":"Tips and known issues","text":""},{"location":"research-software/crystal/#cpu-frequency","title":"CPU frequency","text":"

You should run some short jobs (1 or 2 SCF cycles) to test how your calculation scales, so that you can decide on the balance between the cost to your budget and the time it takes to get a result. You should now also include a few tests at different CPU clock rates as part of this process.

Based on a few simple tests we have run, jobs dominated by building the Kohn-Sham matrix (SHELLX+MONMO3+NUMDFT in the output) are likely to see minimal energy savings and better performance at 2.25 GHz. Jobs dominated by ScaLAPACK calls (MPP_DIAG in the output) may show useful energy savings at 2.0 GHz.
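The following minimal sketch sets up such a frequency scan. It assumes a hypothetical job script named crystal_test.slurm that runs only a couple of SCF cycles, and that the standard Slurm --cpu-freq option (values in kHz, so 2000000 is 2.0 GHz and 2250000 is 2.25 GHz) is honoured. See the Energy use section of the User Guide for the frequencies supported on ARCHER2. The helper prints one sbatch command per frequency rather than submitting anything.

```bash
#!/bin/bash
# Hypothetical helper: print one sbatch command per CPU frequency (kHz).
# Pipe the output to bash to actually submit the jobs.
freq_scan() {
    local script=$1; shift
    local freq
    for freq in "$@"; do
        echo "sbatch --cpu-freq=${freq} --job-name=crystal_${freq} ${script}"
    done
}

freq_scan crystal_test.slurm 2000000 2250000
```

Giving each job a distinct name makes it easier to compare run times and energy use afterwards.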

"},{"location":"research-software/crystal/#out-of-memory-errors","title":"Out-of-memory errors","text":"

Long-running jobs may encounter unexpected errors of the form

slurmstepd: error: Detected 1 oom-kill event(s) in step 411502.0 cgroup.\n
These are related to a memory leak in the underlying libfabric communication layer, which will be fixed in a future release. In the meantime, it should be possible to work around the problem by adding
export FI_MR_CACHE_MAX_COUNT=0 \n
to your Slurm submission script.
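To check whether a finished job was affected, you can search its output for the oom-kill message shown above. The following is a minimal sketch, assuming Slurm's default output file name slurm-<jobid>.out. had_oom_kill is a hypothetical helper, and 12345 is a placeholder job ID.

```bash
#!/bin/bash
# Hypothetical helper: succeed if a Slurm output file contains an
# oom-kill message of the form shown above.
had_oom_kill() {
    local outfile=$1
    grep -q 'oom-kill event' "$outfile"
}

# Example usage:
#   if had_oom_kill slurm-12345.out; then
#       echo "oom-kill detected: consider export FI_MR_CACHE_MAX_COUNT=0"
#   fi
```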

"},{"location":"research-software/fhi-aims/","title":"FHI-aims","text":"

FHI-aims is an all-electron electronic structure code based on numeric atom-centered orbitals. It enables first-principles simulations with very high numerical accuracy for production calculations, with excellent scalability up to very large system sizes (thousands of atoms) and up to very large, massively parallel supercomputers (ten thousand CPU cores).

"},{"location":"research-software/fhi-aims/#useful-links","title":"Useful Links","text":""},{"location":"research-software/fhi-aims/#using-fhi-aims-on-archer2","title":"Using FHI-aims on ARCHER2","text":"

FHI-aims is only available to users who have a valid FHI-aims licence.

If you have a FHI-aims licence and wish to have access to FHI-aims on ARCHER2, please make a request via the SAFE, see:

Please have your license details to hand.

"},{"location":"research-software/fhi-aims/#running-parallel-fhi-aims-jobs","title":"Running parallel FHI-aims jobs","text":"

The following script will run an FHI-aims job using 8 nodes (1024 cores). The script assumes that the input files have the default names control.in and geometry.in.

#!/bin/bash\n\n# Request 8 nodes with 128 MPI tasks per node for 20 minutes\n#SBATCH --job-name=FHI-aims\n#SBATCH --nodes=8\n#SBATCH --ntasks-per-node=128\n#SBATCH --cpus-per-task=1\n#SBATCH --time=00:20:00\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Load the FHI-aims module, avoid any unintentional OpenMP threading by\n# setting OMP_NUM_THREADS, and launch the code.\nmodule load fhiaims\n\n# Ensure the cpus-per-task option is propagated to srun commands\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\nexport OMP_NUM_THREADS=1\nsrun --distribution=block:block --hint=nomultithread aims.mpi.x\n
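Because the script relies on the default input file names, a quick check before submission can catch a missing file before the job starts. The following is a minimal sketch. check_aims_inputs is a hypothetical helper, and aims_job.slurm stands in for your job script.

```bash
#!/bin/bash
# Hypothetical pre-submission check: confirm that a directory contains
# the default FHI-aims input files control.in and geometry.in.
check_aims_inputs() {
    local dir=${1:-.}
    local f missing=0
    for f in control.in geometry.in; do
        if [ ! -f "${dir}/${f}" ]; then
            echo "Missing ${dir}/${f}" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example usage, from the directory you submit from:
#   check_aims_inputs . && sbatch aims_job.slurm
```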
"},{"location":"research-software/fhi-aims/#compiling-fhi-aims","title":"Compiling FHI-aims","text":"

The latest instructions for building FHI-aims on ARCHER2 may be found in the GitHub repository of build instructions:

"},{"location":"research-software/gromacs/","title":"GROMACS","text":"

GROMACS is a versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles. It is primarily designed for biochemical molecules like proteins, lipids and nucleic acids that have a lot of complicated bonded interactions, but since GROMACS is extremely fast at calculating the nonbonded interactions (that usually dominate simulations) many groups are also using it for research on non-biological systems, e.g. polymers.

"},{"location":"research-software/gromacs/#useful-links","title":"Useful Links","text":""},{"location":"research-software/gromacs/#using-gromacs-on-archer2","title":"Using GROMACS on ARCHER2","text":"

GROMACS is Open Source software and is freely available to all users. Three executable versions are available on the normal (CPU-only) modules:

We also provide a GPU version of GROMACS, gromacs/2022.4-GPU, which runs on the MI210 GPU nodes. It can be loaded with

module load gromacs/2022.4-GPU\n

Important

The gromacs modules reset the CPU frequency to the highest possible value (2.25 GHz) as this generally achieves the best balance of performance to energy use. You can change this setting by following the instructions in the Energy use section of the User Guide.

"},{"location":"research-software/gromacs/#running-parallel-gromacs-jobs","title":"Running parallel GROMACS jobs","text":""},{"location":"research-software/gromacs/#running-mpi-only-jobs","title":"Running MPI only jobs","text":"

The following script will run a GROMACS MD job using 4 nodes (128x4 cores) with pure MPI.

#!/bin/bash\n\n#SBATCH --job-name=mdrun_test\n#SBATCH --nodes=4\n#SBATCH --ntasks-per-node=128\n#SBATCH --cpus-per-task=1\n#SBATCH --time=00:20:00\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Setup the environment\nmodule load gromacs\n\n# Ensure the cpus-per-task option is propagated to srun commands\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\nexport OMP_NUM_THREADS=1 \nsrun --distribution=block:block --hint=nomultithread gmx_mpi mdrun -s test_calc.tpr\n
"},{"location":"research-software/gromacs/#running-hybrid-mpiopenmp-jobs","title":"Running hybrid MPI/OpenMP jobs","text":"

The following script will run a GROMACS MD job using 4 nodes (128x4 cores) with 16 MPI processes per node (64 MPI processes in total) and 8 OpenMP threads per MPI process.

#!/bin/bash\n#SBATCH --job-name=mdrun_test\n#SBATCH --nodes=4\n#SBATCH --ntasks-per-node=16\n#SBATCH --cpus-per-task=8\n#SBATCH --time=00:20:00\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Setup the environment\nmodule load gromacs\n\n# Ensure the cpus-per-task option is propagated to srun commands\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\nexport OMP_NUM_THREADS=8\nsrun --distribution=block:block --hint=nomultithread gmx_mpi mdrun -s test_calc.tpr\n
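One way to keep the thread count consistent with --cpus-per-task is to derive it from SLURM_CPUS_PER_TASK, which Slurm sets in the job environment when --cpus-per-task is specified (the scripts above already rely on this variable for SRUN_CPUS_PER_TASK). The following is a minimal sketch. threads_from_slurm is a hypothetical helper, and it falls back to 1 thread when the variable is unset.

```bash
#!/bin/bash
# Print the OpenMP thread count implied by Slurm's --cpus-per-task,
# falling back to 1 when SLURM_CPUS_PER_TASK is not set.
threads_from_slurm() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# In the job script, in place of a hard-coded "export OMP_NUM_THREADS=8":
OMP_NUM_THREADS=$(threads_from_slurm)
export OMP_NUM_THREADS
```

With this approach, changing --cpus-per-task in the #SBATCH header is enough to change the number of threads.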
"},{"location":"research-software/gromacs/#running-gromacs-on-the-amd-mi210-gpus","title":"Running GROMACS on the AMD MI210 GPUs","text":"

The following script will run a GROMACS MD job using 1 GPU with 1 MPI process and 8 OpenMP threads per MPI process.

#!/bin/bash\n#SBATCH --job-name=mdrun_gpu\n#SBATCH --gpus=1\n#SBATCH --time=00:20:00\n#SBATCH --hint=nomultithread\n#SBATCH --distribution=block:block\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=gpu\n#SBATCH --qos=gpu-shd  # or gpu-exc\n\n# Setup the environment\nmodule load gromacs/2022.4-GPU\n\nexport OMP_NUM_THREADS=8\nsrun --ntasks=1 --cpus-per-task=8 gmx_mpi mdrun -ntomp 8 --noconfout -s calc.tpr\n
"},{"location":"research-software/gromacs/#compiling-gromacs","title":"Compiling Gromacs","text":"

The latest instructions for building GROMACS on ARCHER2 may be found in the GitHub repository of build instructions:

"},{"location":"research-software/lammps/","title":"LAMMPS","text":"

LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator) is a classical molecular dynamics code. LAMMPS has potentials for solid-state materials (metals, semiconductors) and soft matter (biomolecules, polymers), and coarse-grained or mesoscopic systems. It can be used to model atoms or, more generically, as a parallel particle simulator at the atomic, mesoscopic, or continuum scale.

"},{"location":"research-software/lammps/#useful-links","title":"Useful Links","text":""},{"location":"research-software/lammps/#using-lammps-on-archer2","title":"Using LAMMPS on ARCHER2","text":"

LAMMPS is freely available to all ARCHER2 users.

The centrally installed version of LAMMPS is compiled with all the standard packages included: ASPHERE, BODY, CLASS2, COLLOID, COMPRESS, CORESHELL, DIPOLE, GRANULAR, KSPACE, MANYBODY, MC, MISC, MOLECULE, OPT, PERI, QEQ, REPLICA, RIGID, SHOCK, SNAP, SRD.

We do not install any USER packages. If you are interested in a USER package, we would encourage you to try to compile your own version and we can help out if necessary (see below).

Important

The lammps modules reset the CPU frequency to the highest possible value (2.25 GHz) as this generally achieves the best balance of performance to energy use. You can change this setting by following the instructions in the Energy use section of the User Guide.

"},{"location":"research-software/lammps/#running-parallel-lammps-jobs","title":"Running parallel LAMMPS jobs","text":"

LAMMPS can exploit multiple nodes on ARCHER2 and will generally be run in exclusive mode using more than one node.

For example, the following script will run a LAMMPS MD job using 4 nodes (128x4 cores) with MPI only.

#!/bin/bash\n\n#SBATCH --job-name=lammps_test\n#SBATCH --nodes=4\n#SBATCH --ntasks-per-node=128\n#SBATCH --cpus-per-task=1\n#SBATCH --time=00:20:00\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code] \n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\nmodule load lammps\n\n# Ensure the cpus-per-task option is propagated to srun commands\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\nsrun --distribution=block:block --hint=nomultithread lmp -i in.test -l out.test\n
"},{"location":"research-software/lammps/#compiling-lammps","title":"Compiling LAMMPS","text":"

The large range of optional packages available for LAMMPS, and opportunity for extensibility, may mean that it is convenient for users to compile their own copy. In practice, LAMMPS is relatively easy to compile, so we encourage users to have a go.

Compilation instructions for LAMMPS on ARCHER2 can be found on GitHub:

"},{"location":"research-software/mitgcm/","title":"MITgcm","text":"

The Massachusetts Institute of Technology General Circulation Model (MITgcm) is a numerical model designed for study of the atmosphere, ocean, and climate. MITgcm's flexible non-hydrostatic formulation enables it to simulate fluid phenomena over a wide range of scales; its adjoint capabilities enable it to be applied to sensitivity questions and to parameter and state estimation problems. By employing fluid equation isomorphisms, a single dynamical kernel can be used to simulate flow of both the atmosphere and ocean.

"},{"location":"research-software/mitgcm/#useful-links","title":"Useful Links","text":""},{"location":"research-software/mitgcm/#building-mitgcm-on-archer2","title":"Building MITgcm on ARCHER2","text":"

MITgcm is not available via a module on ARCHER2 as users will build their own executables specific to the problem they are working on.

You can obtain the MITgcm source code from the developers by cloning from the GitHub repository with the command

git clone https://github.com/MITgcm/MITgcm.git\n

You should then copy the ARCHER2 optfile into the MITgcm directories.

Warning

A current ARCHER2 optfile is not available at the present time. Please contact support@archer2.ac.uk for help.

You should also set the following environment variables. MITGCM_ROOTDIR is used to locate the source code and should point to the top-level MITgcm directory. Optionally, adding the MITgcm tools directory to your PATH environment variable makes it easier to use tools such as genmake2, and the MITGCM_OPT environment variable makes it easier to pass the optfile to genmake2.

export MITGCM_ROOTDIR=/path/to/MITgcm\nexport PATH=$MITGCM_ROOTDIR/tools:$PATH\nexport MITGCM_OPT=$MITGCM_ROOTDIR/tools/build_options/dev_linux_amd64_cray_archer2\n
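Because a current ARCHER2 optfile is not yet available, a quick check that both variables point at real files can save a confusing genmake2 failure later. The following is a minimal sketch. check_mitgcm_env is a hypothetical helper.

```bash
#!/bin/bash
# Hypothetical sanity check for the MITgcm build environment variables:
# genmake2 should exist under $MITGCM_ROOTDIR/tools, and the optfile
# named by $MITGCM_OPT should exist.
check_mitgcm_env() {
    if [ ! -x "${MITGCM_ROOTDIR}/tools/genmake2" ]; then
        echo "genmake2 not found under MITGCM_ROOTDIR=${MITGCM_ROOTDIR}" >&2
        return 1
    fi
    if [ ! -f "${MITGCM_OPT}" ]; then
        echo "optfile not found: ${MITGCM_OPT}" >&2
        return 1
    fi
    echo "MITgcm environment looks OK"
}
```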

When using genmake2 to create the Makefile, you will need to specify the optfile to use. Other commonly used options are -mods to include extra source code, -mpi to enable MPI, and -omp to enable OpenMP. You might then run a command that resembles the following:

genmake2 -mods /path/to/additional/source -mpi -optfile $MITGCM_OPT\n

You can read about the full set of options available to genmake2 by running

genmake2 -help\n

Finally, you may then build your executable by running

make depend\nmake\n
"},{"location":"research-software/mitgcm/#running-mitgcm-on-archer2","title":"Running MITgcm on ARCHER2","text":""},{"location":"research-software/mitgcm/#pure-mpi","title":"Pure MPI","text":"

Once you have built your executable you can write a script like the following which will allow it to run on the ARCHER2 compute nodes. This example would run a pure MPI MITgcm simulation over 2 nodes of 128 cores each for up to one hour.

#!/bin/bash\n\n# Slurm job options (job-name, compute nodes, job time)\n#SBATCH --job-name=MITgcm-simulation\n#SBATCH --time=1:0:0\n#SBATCH --nodes=2\n#SBATCH --ntasks-per-node=128\n#SBATCH --cpus-per-task=1\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Set the number of threads to 1\n#   This prevents any threaded system libraries from automatically\n#   using threading.\nexport OMP_NUM_THREADS=1\n\n# Ensure the cpus-per-task option is propagated to srun commands\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\n# Launch the parallel job\n#   Using 256 MPI processes and 128 MPI processes per node\n#   srun picks up the node and task counts from the sbatch options\nsrun --distribution=block:block --hint=nomultithread ./mitgcmuv\n
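Before submitting, it can be worth confirming that the process decomposition compiled into your executable matches the job. In MITgcm, the number of MPI processes is set in SIZE.h as the product nPx*nPy. The following minimal sketch assumes that SIZE.h contains the usual "nPx = <n>," and "nPy = <n>," entries. check_size_h is a hypothetical helper, not part of MITgcm.

```bash
#!/bin/bash
# Hypothetical pre-submission check: compare nPx*nPy in SIZE.h with the
# number of MPI processes the job will start (nodes * ntasks-per-node).
check_size_h() {
    local sizefile=$1 nprocs=$2
    local npx npy
    # Take the first "nPx = <n>" / "nPy = <n>" assignment in the file
    npx=$(grep -oE 'nPx *= *[0-9]+' "$sizefile" | head -n 1 | grep -oE '[0-9]+$')
    npy=$(grep -oE 'nPy *= *[0-9]+' "$sizefile" | head -n 1 | grep -oE '[0-9]+$')
    if [ -z "$npx" ] || [ -z "$npy" ]; then
        echo "Could not find nPx/nPy in ${sizefile}" >&2
        return 2
    fi
    echo "SIZE.h: nPx*nPy = $(( npx * npy )); job: ${nprocs} MPI processes"
    [ $(( npx * npy )) -eq "$nprocs" ]
}

# Example for the 2-node pure MPI job above (path is illustrative):
#   check_size_h code/SIZE.h $(( 2 * 128 ))
```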
"},{"location":"research-software/mitgcm/#hybrid-openmp-mpi","title":"Hybrid OpenMP & MPI","text":"

Warning

Running the model in hybrid mode may decrease performance as well as increase it. Profile your code both as a pure MPI application and as a hybrid OpenMP-MPI application to make sure you are using resources efficiently. Be sure to read both the ARCHER2 advice on OpenMP and the MITgcm documentation first.

Note

Early versions of the ARCHER2 MITgcm optfile do not contain an OMPFLAG. Please ensure you have an up to date copy of the optfile before attempting to compile OpenMP enabled codes.

Depending upon your model setup, you may wish to run the MITgcm code as a hybrid OpenMP-MPI application. To compile the model for this, pass the -omp flag to genmake2 and update your SIZE.h file to have multiple tiles per process.

The model can be run using a Slurm job submission script similar to the one shown below. This example runs MITgcm across 2 nodes, with each node using 16 MPI processes and each process using 4 threads. Note that this underpopulates the nodes: only 128 of the 256 available cores are used. Underpopulating nodes can sometimes improve performance.

#!/bin/bash\n\n# Slurm job options (job-name, compute nodes, job time)\n#SBATCH --job-name=MITgcm-hybrid-simulation\n#SBATCH --time=1:0:0\n#SBATCH --nodes=2\n#SBATCH --ntasks-per-node=16\n#SBATCH --cpus-per-task=4\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Set the OpenMP options\n#   OMP_NUM_THREADS should match --cpus-per-task above.\nexport OMP_NUM_THREADS=4  # Set to number of threads per process\nexport OMP_PLACES=\"cores(128)\"  # Set to total number of threads\nexport OMP_PROC_BIND=true  # Required if we want to underpopulate nodes\n\n# Ensure the cpus-per-task option is propagated to srun commands\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\n# Launch the parallel job\n#   Using 32 MPI processes and 16 MPI processes per node\n#   srun picks up the node and task counts from the sbatch options\nsrun --distribution=block:block --hint=nomultithread ./mitgcmuv\n

Finally, remember to update the eedata file in the model's run directory so that the number of threads requested there matches the number requested in the job submission script.
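In MITgcm's eedata file, the number of threads is set by nTx and nTy (threads in the x and y directions), so their product should equal OMP_NUM_THREADS. The following minimal sketch checks this, assuming eedata entries of the form "nTx=2,". check_eedata_threads is a hypothetical helper, and a missing entry is treated as 1 thread in that direction.

```bash
#!/bin/bash
# Hypothetical check: compare the thread count set in eedata (nTx*nTy)
# with the OMP_NUM_THREADS value used in the job script.
check_eedata_threads() {
    local eedata=$1 omp_threads=${2:-${OMP_NUM_THREADS:-1}}
    local ntx nty
    # Fortran namelist names are case-insensitive, hence grep -i
    ntx=$(grep -oiE 'nTx *= *[0-9]+' "$eedata" | head -n 1 | grep -oE '[0-9]+$')
    nty=$(grep -oiE 'nTy *= *[0-9]+' "$eedata" | head -n 1 | grep -oE '[0-9]+$')
    # Treat a missing entry as 1 thread in that direction
    ntx=${ntx:-1}
    nty=${nty:-1}
    echo "eedata: nTx*nTy = $(( ntx * nty )); OMP_NUM_THREADS = ${omp_threads}"
    [ $(( ntx * nty )) -eq "$omp_threads" ]
}

# Example, from the run directory of the hybrid job above:
#   check_eedata_threads ./eedata 4
```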

"},{"location":"research-software/mitgcm/#reproducing-the-ecco-version-4-release-4-state-estimate-on-archer2","title":"Reproducing the ECCO version 4 (release 4) state estimate on ARCHER2","text":"

The ECCO version 4 state estimate (ECCOv4-r4) is an observationally-constrained numerical solution produced by the ECCO group at JPL. If you would like to reproduce the state estimate on ARCHER2 in order to create customised runs and experiments, follow the instructions below, which are the JPL instructions slightly modified for ARCHER2.

For more information, see the ECCOv4-r4 website https://ecco-group.org/products-ECCO-V4r4.htm

"},{"location":"research-software/mitgcm/#get-the-eccov4-r4-source-code","title":"Get the ECCOv4-r4 source code","text":"

First, navigate to your directory on the /work filesystem, so that your files are accessible from the compute nodes. Next, create a working directory, for example MYECCO, and navigate into it:

mkdir MYECCO\ncd MYECCO\n

In order to reproduce ECCOv4-r4, we need a specific checkpoint of the MITgcm source code.

git clone https://github.com/MITgcm/MITgcm.git -b checkpoint66g\n

Next, get the ECCOv4-r4 specific code from GitHub:

cd MITgcm\nmkdir -p ECCOV4/release4\ncd ECCOV4/release4\ngit clone https://github.com/ECCO-GROUP/ECCO-v4-Configurations.git\nmv ECCO-v4-Configurations/ECCOv4\\ Release\\ 4/code .\nrm -rf ECCO-v4-Configurations\n
"},{"location":"research-software/mitgcm/#get-the-eccov4-r4-forcing-files","title":"Get the ECCOv4-r4 forcing files","text":"

The surface forcing and other input files that are too large to be stored on GitHub are available via NASA data servers. In total, these files are about 200 GB in size. You must register for an Earthdata account and connect to a WebDAV server in order to access these files. For more detailed instructions, read the help page https://ecco.jpl.nasa.gov/drive/help.

First, apply for an Earthdata account: https://urs.earthdata.nasa.gov/users/new

Next, acquire your WebDAV credentials: https://ecco.jpl.nasa.gov/drive (second box from the top)

Now, you can use wget to download the required forcing and input files:

wget -r --no-parent --user YOURUSERNAME --ask-password https://ecco.jpl.nasa.gov/drive/files/Version4/Release4/input_forcing\nwget -r --no-parent --user YOURUSERNAME --ask-password https://ecco.jpl.nasa.gov/drive/files/Version4/Release4/input_init\nwget -r --no-parent --user YOURUSERNAME --ask-password https://ecco.jpl.nasa.gov/drive/files/Version4/Release4/input_ecco\n

After using wget, you will notice that the input* directories are, by default, several levels deep in the directory structure. Use the mv command to move the input* directories to the directory where you executed the wget command. Specifically,

mv ecco.jpl.nasa.gov/drive/files/Version4/Release4/input_forcing/ .\nmv ecco.jpl.nasa.gov/drive/files/Version4/Release4/input_init/ .\nmv ecco.jpl.nasa.gov/drive/files/Version4/Release4/input_ecco/ .\nrm -rf ecco.jpl.nasa.gov\n
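You can also avoid the extra mv steps by having wget strip the leading directories itself. The sketch below uses wget's -nH (no host directory) and --cut-dirs=4 (drop drive/files/Version4/Release4) options, which places each input_* directory directly in the current directory. It also adds a trailing slash to each URL, which wget's manual recommends when using --no-parent over HTTP(S). The helper names are hypothetical, and YOURUSERNAME remains your WebDAV username.

```shell
#!/bin/bash
# Sketch: download the three ECCOv4-r4 input directories straight into the
# current directory (e.g. MITgcm/ECCOV4/release4), without the
# ecco.jpl.nasa.gov/drive/files/... prefix that a plain "wget -r" creates.
ECCO_BASE=https://ecco.jpl.nasa.gov/drive/files/Version4/Release4
ECCO_DIRS=(input_forcing input_init input_ecco)

# URL of one input directory (trailing slash so --no-parent stays inside it).
ecco_url() {
  echo "${ECCO_BASE}/$1/"
}

# Download every input directory; wget prompts for the WebDAV password.
download_ecco_inputs() {
  local user=$1 d
  for d in "${ECCO_DIRS[@]}"; do
    wget -r --no-parent -nH --cut-dirs=4 \
         --user "$user" --ask-password "$(ecco_url "$d")"
  done
}

# Usage:
#   download_ecco_inputs YOURUSERNAME
```

This approach prompts for your password once per directory, because each directory is fetched by a separate wget call.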
"},{"location":"research-software/mitgcm/#compiling-and-running-eccov4-r4","title":"Compiling and running ECCOv4-r4","text":"

The steps for building the ECCOv4-r4 instance of MITgcm are very similar to those for other build cases. First, you will need to create a build directory:

cd MITgcm/ECCOV4/release4\nmkdir build\ncd build\n

Load the NetCDF modules:

module load cray-hdf5\nmodule load cray-netcdf\n

If you haven't already, set your environment variables:

export MITGCM_ROOTDIR=../../../../MITgcm\nexport PATH=$MITGCM_ROOTDIR/tools:$PATH\nexport MITGCM_OPT=$MITGCM_ROOTDIR/tools/build_options/dev_linux_amd64_cray_archer2\n
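MITGCM_ROOTDIR above is a relative path, so it is only correct when you work from the build directory. The sketch below shows a quick way to catch a mistyped path before running genmake2. It checks that MITGCM_ROOTDIR contains a tools directory, that genmake2 is on your PATH and that the optfile exists. check_mitgcm_env is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: sanity-check the MITgcm build environment before running genmake2.
# Run from the build directory, because MITGCM_ROOTDIR is set as a relative path.
check_mitgcm_env() {
  local status=0
  if [ ! -d "${MITGCM_ROOTDIR:-}/tools" ]; then
    echo "MITGCM_ROOTDIR does not point at an MITgcm checkout: ${MITGCM_ROOTDIR:-unset}" >&2
    status=1
  fi
  if ! command -v genmake2 >/dev/null 2>&1; then
    echo "genmake2 is not on PATH" >&2
    status=1
  fi
  if [ ! -f "${MITGCM_OPT:-}" ]; then
    echo "Optfile not found: ${MITGCM_OPT:-unset}" >&2
    status=1
  fi
  return $status
}
```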

Next, compile the executable:

genmake2 -mods ../code -mpi -optfile $MITGCM_OPT\nmake depend\nmake\n

Once you have compiled the model, you will have the mitgcmuv executable for ECCOv4-r4.

"},{"location":"research-software/mitgcm/#create-run-directory-and-link-files","title":"Create run directory and link files","text":"

In order to run the model, you need to create a run directory and link/copy the appropriate files. First, navigate to your directory on the work filesystem. From the MITgcm/ECCOV4/release4 directory:

mkdir run\ncd run\n\n# link the data files\nln -s ../input_init/NAMELIST/* .\nln -s ../input_init/error_weight/ctrl_weight/* .\nln -s ../input_init/error_weight/data_error/* .\nln -s ../input_init/* .\nln -s ../input_init/tools/* .\nln -s ../input_ecco/*/* .\nln -s ../input_forcing/eccov4r4* .\n\npython mkdir_subdir_diags.py\n\n# manually copy the mitgcmuv executable\ncp -p ../build/mitgcmuv .\n
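Because most files in the run directory are symbolic links into the input_* directories, a wrong relative path produces broken links rather than an immediate error. The sketch below lists any dangling links. It uses GNU find's -xtype l test, which matches links whose target does not exist. The function names are hypothetical.

```shell
#!/bin/bash
# Sketch: list broken symbolic links in a run directory.
# GNU find's "-xtype l" matches symlinks whose target does not exist.
broken_links() {
  find "${1:-.}" -maxdepth 1 -xtype l -printf '%f\n' | sort
}

# Report broken links and return non-zero if any are found.
check_run_links() {
  local broken
  broken=$(broken_links "${1:-.}")
  if [ -n "$broken" ]; then
    echo "Broken links (check the ../input_* paths):" >&2
    echo "$broken" >&2
    return 1
  fi
  echo "All links resolve"
}
```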

For a short test run, edit the nTimeSteps variable in the file data. Comment out the default value and uncomment the line reading nTimeSteps=8. This is a useful test to make sure that the model can at least start up.
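You can also make this edit non-interactively with sed, as in the sketch below. It assumes that each nTimeSteps setting sits on its own line, with the default active and the test value present as a commented line such as `# nTimeSteps=8,`. Check the result against your copy of data, because its exact layout may differ. A backup is kept as data.bak. use_test_timesteps is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: switch the MITgcm 'data' namelist to the 8-step test run.
# 1) Comment out every active nTimeSteps line.
# 2) Uncomment the line that sets nTimeSteps=8.
# The original file is kept as <file>.bak.
use_test_timesteps() {
  local file=${1:-data}
  sed -i.bak \
    -e 's/^\([[:space:]]*\)\(nTimeSteps[[:space:]]*=\)/#\1\2/' \
    -e 's/^#[[:space:]]*\(nTimeSteps[[:space:]]*=[[:space:]]*8[[:space:]]*,\)/ \1/' \
    "$file"
}
```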

To run on ARCHER2, submit a batch script to the Slurm scheduler. Here is an example submission script:

#!/bin/bash\n\n# Slurm job options (job-name, compute nodes, job time)\n#SBATCH --job-name=ECCOv4r4-test\n#SBATCH --time=1:0:0\n#SBATCH --nodes=8\n#SBATCH --ntasks-per-node=12\n#SBATCH --cpus-per-task=1\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# For adjoint runs the default cpu-freq is a lot slower\n#SBATCH --cpu-freq=2250000\n\n# Set the number of threads to 1\n#   This prevents any threaded system libraries from automatically\n#   using threading.\nexport OMP_NUM_THREADS=1\n\n# Ensure the cpus-per-task option is propagated to srun commands\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\n# Launch the parallel job\n#   Using 96 MPI processes and 12 MPI processes per node\n#   srun picks up the distribution from the sbatch options\nsrun --distribution=block:block --hint=nomultithread ./mitgcmuv\n

This configuration uses 96 MPI processes at 12 MPI processes per node. Once the run has finished, in order to check that the run has successfully completed, check the end of one of the standard output files.

tail STDOUT.0000\n

It should read

PROGRAM MAIN: Execution ended Normally\n

The files named STDOUT.* contain diagnostic information that you can use to check your results. As a first pass, check the printed statistics for any clear signs of trouble (e.g. NaN values, extremely large values).
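The two first-pass checks above can be scripted as in the sketch below. It looks for the "Execution ended Normally" message in STDOUT.0000, then lists any STDOUT.* file that mentions NaN. The NaN test is a simple case-insensitive whole-word grep, so it will not catch every kind of problem. For example, it will not detect extremely large values. check_mitgcm_run is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: first-pass check of an MITgcm run directory.
# Returns 1 if STDOUT.0000 lacks the normal-termination message,
# 2 if any STDOUT.* file mentions NaN, and 0 otherwise.
check_mitgcm_run() {
  local dir=${1:-.} bad
  if ! grep -q "Execution ended Normally" "$dir/STDOUT.0000" 2>/dev/null; then
    echo "Run did not end normally (see $dir/STDOUT.0000)" >&2
    return 1
  fi
  echo "Run completed normally"
  bad=$(grep -liw nan "$dir"/STDOUT.* || true)
  if [ -n "$bad" ]; then
    echo "NaN values reported in:" >&2
    echo "$bad" >&2
    return 2
  fi
}
```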

"},{"location":"research-software/mitgcm/#eccov4-r4-in-adjoint-mode","title":"ECCOv4-r4 in adjoint mode","text":"

If you have access to the commercial TAF software produced by http://FastOpt.de, then you can compile and run the ECCOv4-r4 instance of MITgcm in adjoint mode. This mode is useful for comprehensive sensitivity studies and for constructing state estimates. From the MITgcm/ECCOV4/release4 directory, create a new code directory and a new build directory:

mkdir code_ad\ncd code_ad\nln -s ../code/* .\ncd ..\nmkdir build_ad\ncd build_ad\n

In this instance, the code_ad and code directories are identical, although this does not have to be the case. Make sure that you have the staf script in your path or in the build_ad directory itself. To make sure that you have the most up-to-date script, run:

./staf -get staf\n

To test your connection to the FastOpt servers, try:

./staf -test\n

You should receive the following message:

Your access to the TAF server is enabled.\n

The compilation commands are similar to those used to build the forward case.

# load relevant modules\nmodule load cray-netcdf-hdf5parallel\nmodule load cray-hdf5-parallel\n\n# compile adjoint model\n../../../MITgcm/tools/genmake2 -ieee -mpi -mods=../code_ad -of=(PATH_TO_OPTFILE)\nmake depend\nmake adtaf\nmake adall\n

The source code will be packaged and forwarded to the FastOpt servers, where it will undergo source-to-source translation via the TAF algorithmic differentiation software. If the compilation is successful, you will have an executable named mitgcmuv_ad. This will run the ECCOv4-r4 configuration of MITgcm in adjoint mode. As before, create a run directory and copy in the relevant files. The procedure is the same as for the forward model, with the following modifications:

cd ..\nmkdir run_ad\ncd run_ad\n# manually copy the mitgcmuv_ad executable\ncp -p ../build_ad/mitgcmuv_ad .\n

To run the model, change the name of the executable in the Slurm submission script; everything else should be the same as in the forward case. As above, at the end of the run you should have a set of STDOUT.* files that you can examine for any obvious problems.
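One way to derive the adjoint job script from the forward one is to use sed, as in the sketch below. It only rewrites an occurrence of "./mitgcmuv" at the end of a line, as on the srun line of the example script, so an existing "./mitgcmuv_ad" is left untouched. The script file names in the usage line are hypothetical.

```shell
#!/bin/bash
# Sketch: create an adjoint job script from the forward one by replacing a
# trailing "./mitgcmuv" with "./mitgcmuv_ad"; all other lines are copied as-is.
make_adjoint_script() {
  local forward=$1 adjoint=$2
  sed 's#\./mitgcmuv\([[:space:]]*\)$#./mitgcmuv_ad\1#' "$forward" > "$adjoint"
}

# Usage (hypothetical file names):
#   make_adjoint_script ecco_forward.slurm ecco_adjoint.slurm
```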

"},{"location":"research-software/mitgcm/#compile-time-errors","title":"Compile time errors","text":"

If TAF compilation fails with an error like failed to convert GOTPCREL relocation; relink with --no-relax then add the following line to the FFLAGS options: -Wl,--no-relax.
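One way to apply this without editing the shared optfile is to work on a copy, as in the sketch below. It assumes that, as is usual for genmake2 optfiles, your optfile sets FFLAGS as a shell variable. The appended line runs after that setting and adds the flag to its final value. add_no_relax is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: copy a genmake2 optfile and append -Wl,--no-relax to FFLAGS.
# The single quotes write the line literally, so $FFLAGS is expanded later,
# when genmake2 reads the optfile.
add_no_relax() {
  local src=$1 dst=$2
  cp "$src" "$dst"
  printf '%s\n' 'FFLAGS="$FFLAGS -Wl,--no-relax"' >> "$dst"
}

# Usage: pass the copy to genmake2 in place of the original optfile, e.g.
#   add_no_relax (PATH_TO_OPTFILE) ./optfile_norelax
```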

"},{"location":"research-software/mitgcm/#checkpointing-for-adjoint-runs","title":"Checkpointing for adjoint runs","text":"

In an adjoint run, there is a balance between storage (i.e. saving the model state to disk) and recomputation (i.e. integrating the model forward from a stored state). Changing the nchklev parameters in the tamc.h file at compile time is how you control the relative balance between storage and recomputation.

A suggested strategy that has been used on a variety of HPC platforms is as follows:

1. Set nchklev_1 as large as possible, up to the size allowed by memory on your machine. (Use the size command to estimate the memory per process. This should be just a little bit less than the maximum allowed on the machine. On ARCHER2 this is 2 GB per core on standard nodes and 4 GB per core on high-memory nodes.)
2. Next, set nchklev_2 and nchklev_3 to be large enough to accommodate the entire run. A common strategy is to set nchklev_2 = nchklev_3 = sqrt(numsteps/nchklev_1) + 1.
3. If the nchklev_2 files get too big, then you may have to add a fourth level (i.e. nchklev_4), but this is unlikely.
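The sketch below computes the rule of thumb in step 2. It uses awk because bash arithmetic has no square root. In the usual three-level scheme, the product nchklev_1 × nchklev_2 × nchklev_3 must be at least the number of time steps, and rounding the square root down and adding 1 always satisfies this. The function names are hypothetical, and the 8760-step figure in the usage note is purely illustrative.

```shell
#!/bin/bash
# Sketch: suggested nchklev_2 = nchklev_3 = int(sqrt(numsteps / nchklev_1)) + 1.
nchklev_outer() {
  local numsteps=$1 nchklev_1=$2
  awk -v n="$numsteps" -v c="$nchklev_1" 'BEGIN { print int(sqrt(n / c)) + 1 }'
}

# Succeed if nchklev_1 * nchklev_2 * nchklev_3 covers every time step.
covers_run() {
  local numsteps=$1 l1=$2 l2=$3 l3=$4
  [ $(( l1 * l2 * l3 )) -ge "$numsteps" ]
}
```

For example, with 8760 time steps and nchklev_1 = 36, `nchklev_outer 8760 36` gives 16, and 36 × 16 × 16 = 9216 covers the run.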

This strategy allows you to keep as much in memory as possible, minimising the I/O requirements for the disk. This is useful, as I/O is often the bottleneck for MITgcm runs on HPC.

Another way to adjust performance is to adjust how tapelevel I/O is handled. This strategy performs well for most configurations:

C o tape settings\n#define ALLOW_AUTODIFF_WHTAPEIO\n#define AUTODIFF_USE_OLDSTORE_2D\n#define AUTODIFF_USE_OLDSTORE_3D\n#define EXCLUDE_WHIO_GLOBUFF_2D\n#define ALLOW_INIT_WHTAPEIO\n

"},{"location":"research-software/mo-unified-model/","title":"Met Office Unified Model","text":"

The Met Office Unified Model (\"the UM\") is a numerical model of the atmosphere used for both weather and climate applications. It is often coupled to the NEMO ocean model using the OASIS coupling framework to provide a full Earth system model.

"},{"location":"research-software/mo-unified-model/#useful-links","title":"Useful Links","text":""},{"location":"research-software/mo-unified-model/#using-the-um","title":"Using the UM","text":"

Information on using the UM is provided by the NCAS Computational Modelling Service (CMS).

"},{"location":"research-software/namd/","title":"NAMD","text":"

NAMD is an award-winning parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. Based on Charm++ parallel objects, NAMD scales to hundreds of cores for typical simulations and beyond 500,000 cores for the largest simulations. NAMD uses the popular molecular graphics program VMD for simulation setup and trajectory analysis, but is also file-compatible with AMBER, CHARMM, and X-PLOR.

"},{"location":"research-software/namd/#useful-links","title":"Useful Links","text":""},{"location":"research-software/namd/#using-namd-on-archer2","title":"Using NAMD on ARCHER2","text":"

NAMD is freely available to all ARCHER2 users.

ARCHER2 has two versions of NAMD available: no-SMP (namd/2.14-nosmp) or SMP (namd/2.14). The SMP (Shared Memory Parallelism) build of NAMD introduces threaded parallelism to address memory limitations. The no-SMP build will typically provide the best performance but most users will require SMP in order to cope with high memory requirements.

Important

The namd modules reset the CPU frequency to the highest possible value (2.25 GHz) as this generally achieves the best balance of performance to energy use. You can change this setting by following the instructions in the Energy use section of the User Guide.

"},{"location":"research-software/namd/#running-mpi-only-namd-jobs","title":"Running MPI only NAMD jobs","text":"

Using no-SMP NAMD will run jobs with only MPI processes and will not introduce additional threaded parallelism. This is the simplest approach to running NAMD jobs and is likely to give the best performance unless simulations are limited by high memory requirements.

The following script will run a pure MPI NAMD MD job using 4 nodes (i.e. 128x4 = 512 MPI parallel processes).

#!/bin/bash\n\n# Request four nodes to run a job of 512 MPI tasks with 128 MPI\n# tasks per node, here for maximum time 20 minutes.\n\n#SBATCH --job-name=namd-nosmp\n#SBATCH --nodes=4\n#SBATCH --ntasks-per-node=128\n#SBATCH --cpus-per-task=1\n#SBATCH --time=00:20:00\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\nmodule load namd/2.14-nosmp\n\n# Ensure the cpus-per-task option is propagated to srun commands\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\nsrun --distribution=block:block --hint=nomultithread namd2 input.namd\n
"},{"location":"research-software/namd/#running-smp-namd-jobs","title":"Running SMP NAMD jobs","text":"

If your job runs out of memory, then using the SMP version of NAMD will reduce the memory requirements. This involves launching a combination of MPI processes for communication and worker threads which perform computation.

The following script will run an SMP NAMD MD job using 4 nodes with 32 MPI processes per node and 4 CPU cores per process. Of these 4 cores, 3 run worker threads and 1 is left for the MPI process, so every node is fully occupied and all 512 cores are populated.

#!/bin/bash\n#SBATCH --job-name=namd-smp\n#SBATCH --ntasks-per-node=32\n#SBATCH --cpus-per-task=4\n#SBATCH --nodes=4\n#SBATCH --time=00:20:00\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Load the relevant modules\nmodule load namd\n\n# Set procs per node (PPN) & OMP_NUM_THREADS\nexport PPN=$(($SLURM_CPUS_PER_TASK-1))\nexport OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK\nexport OMP_PLACES=cores\n\n# Record PPN in the output file\necho \"Number of worker threads PPN = $PPN\"\n\n# Run NAMD\nsrun --distribution=block:block --hint=nomultithread namd2 +setcpuaffinity +ppn $PPN input.namd\n

Important

Please do not set SRUN_CPUS_PER_TASK when running the SMP version of NAMD. Otherwise, Charm++ will be unable to pin processes to CPUs, causing NAMD to abort with errors such as Couldn't bind to cpuset 0x00000010,,,0x0: Invalid argument.

How do I choose the optimal numbers of MPI processes and worker threads for my simulations? The optimal choice for the numbers of MPI processes and worker threads per node depends on the data set and the number of compute nodes. Before running large production jobs, it is worth experimenting with these parameters to find the optimal configuration for your simulation.

We recommend that users match the ARCHER2 NUMA architecture to find the optimal balance of thread and process parallelism. The NUMA levels on ARCHER2 compute nodes are: 4 cores per CCX, 8 cores per CCD, 16 cores per memory controller, 64 cores per socket. For example, the above submission script specifies 32 MPI communication processes per node and 4 worker threads per communication process which places 1 MPI process per CCX on each node.

Note

To ensure fully occupied nodes with the SMP build of NAMD and match the NUMA layout, the optimal values of (tasks-per-node, cpus-per-task) are likely to be (32,4), (16,8) or (8,16).

How do I choose a value for the +ppn flag? The number of workers per communication process is specified by the +ppn argument to NAMD, which is set here to equal cpus-per-task - 1, to leave a CPU-core free for the associated MPI process.

We recommend that users reserve one thread per process in this way. On a many-cores-per-node architecture like ARCHER2, reserving this thread reduces the communication between threads and improves scalability.
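The sketch below applies the rules above to the three suggested layouts on a 128-core node. For each layout, it shows the +ppn value (cpus-per-task - 1) and how many worker threads result per node. namd_layouts is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: list the fully-populated SMP NAMD layouts suggested above.
# tasks-per-node * cpus-per-task = cores per node; +ppn = cpus-per-task - 1,
# leaving one core per MPI process for communication.
namd_layouts() {
  local cores_per_node=${1:-128} cpus tasks ppn
  for cpus in 4 8 16; do
    tasks=$(( cores_per_node / cpus ))
    ppn=$(( cpus - 1 ))
    echo "tasks-per-node=$tasks cpus-per-task=$cpus +ppn=$ppn workers-per-node=$(( tasks * ppn ))"
  done
}
```

The (32,4) layout used in the example script gives +ppn 3 and 96 worker threads per node. The (8,16) layout gives more workers per node (120) but fewer MPI processes, so the best choice still depends on your system.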

"},{"location":"research-software/namd/#compiling-namd","title":"Compiling NAMD","text":"

The latest instructions for building NAMD on ARCHER2 may be found in the GitHub repository of build instructions.

ARCHER2 Full System

"},{"location":"research-software/nektarplusplus/","title":"Nektar++","text":"

Nektar++ is a tensor product based finite element package designed to allow one to construct efficient classical low polynomial order h-type solvers (where h is the size of the finite element) as well as higher p-order piecewise polynomial order solvers.

The Nektar++ framework comes with a number of solvers and also allows one to construct a variety of new solvers. Users can therefore use Nektar++ just to run simulations, or to extend and/or develop new functionality.

"},{"location":"research-software/nektarplusplus/#useful-links","title":"Useful Links","text":""},{"location":"research-software/nektarplusplus/#using-nektar-on-archer2","title":"Using Nektar++ on ARCHER2","text":"

Nektar++ is released under an MIT license and is available to all users on the ARCHER2 full system.

"},{"location":"research-software/nektarplusplus/#where-can-i-get-help","title":"Where can I get help?","text":"

Specific issues with Nektar++ itself might be submitted to the issue tracker at the Nektar++ gitlab repository (see link above). More general questions might also be directed to the Nektar-users mailing list. Issues specific to the use or behaviour of Nektar++ on ARCHER2 should be sent to the Service Desk.

"},{"location":"research-software/nektarplusplus/#running-parallel-nektar-jobs","title":"Running parallel Nektar++ jobs","text":"

Below is the submission script for running the Taylor-Green Vortex, one of the Nektar++ tutorials, see https://doc.nektar.info/tutorials/latest/incns/taylor-green-vortex/incns-taylor-green-vortex.html#incns-taylor-green-vortexch4.html .

You first need to download the archive linked on the tutorial page.

cd /path/to/work/dir\nwget https://doc.nektar.info/tutorials/latest/incns/taylor-green-vortex/incns-taylor-green-vortex.tar.gz\ntar -xvzf incns-taylor-green-vortex.tar.gz\n
#!/bin/bash\n#SBATCH --job-name=nektar\n#SBATCH --nodes=1\n#SBATCH --ntasks-per-node=32\n#SBATCH --cpus-per-task=1\n#SBATCH --time=02:00:00\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code] \n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\nmodule load nektar\n\nexport OMP_NUM_THREADS=1\n\n# Ensure the cpus-per-task option is propagated to srun commands\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\nNEK_INPUT_PATH=/path/to/work/dir/incns-taylor-green-vortex/completed/solver64\n\nsrun --distribution=block:cyclic --hint=nomultithread \\\n    ${NEK_DIR}/bin/IncNavierStokesSolver \\\n        ${NEK_INPUT_PATH}/TGV64_mesh.xml \\\n        ${NEK_INPUT_PATH}/TGV64_conditions.xml\n
"},{"location":"research-software/nektarplusplus/#compiling-nektar","title":"Compiling Nektar++","text":"

Instructions for building Nektar++ on ARCHER2 may be found in the GitHub repository of build instructions:

"},{"location":"research-software/nektarplusplus/#more-information","title":"More information","text":"

The Nektar++ team have themselves also provided detailed instructions on the build process, updated following the mid-2023 system update, on the Nektar++ website:

This page also provides instructions on how to run jobs using your local installation.

"},{"location":"research-software/nemo/","title":"NEMO","text":"

NEMO (Nucleus for European Modelling of the Ocean) is a state-of-the-art framework for research activities and forecasting services in ocean and climate sciences, developed in a sustainable way by a European consortium.

"},{"location":"research-software/nemo/#useful-links","title":"Useful Links","text":"

NEMO is released under a CeCILL license and is freely available to all users on ARCHER2.

"},{"location":"research-software/nemo/#compiling-nemo","title":"Compiling NEMO","text":"

A central install of NEMO is not appropriate for most users of ARCHER2 since many configurations will want to add bespoke code changes.

The latest instructions for building NEMO on ARCHER2 are found in the Github repository of build instructions:

"},{"location":"research-software/nemo/#using-nemo-on-archer2","title":"Using NEMO on ARCHER2","text":"

Typical NEMO production runs perform significant I/O management to handle the very large volumes of data associated with ocean modelling. To address this, NEMO ocean clients are interfaced with XIOS I/O servers. XIOS is a library which manages NetCDF outputs for climate models. NEMO uses XIOS to simplify the I/O management and introduce dedicated processors to manage large volumes of data.

Users can choose to run NEMO in attached or detached mode:
- In attached mode each processor acts as an ocean client and I/O-server process.
- In detached mode ocean clients and external XIOS I/O-server processors are separately defined.

Running NEMO in attached mode can be done with a simple submission script specifying both the NEMO and XIOS executable to srun. However, typical production runs of NEMO will perform significant I/O management and will be unable to run in attached mode.

Detached mode introduces external XIOS I/O-servers to help manage the large volumes of data. This requires users to place clients and servers on different cores throughout the node, using the --cpu-bind=map_cpu:<cpu map> srun option to define a CPU map or mask. It is tedious to construct these maps by hand, so Andrew Coward provides a tool to help users construct submission scripts:

/work/n01/shared/nemo/mkslurm_hetjob\n/work/n01/shared/nemo/mkslurm_hetjob_Gnu\n

Usage of the script:

usage: mkslurm_hetjob [-h] [-S S] [-s S] [-m M] [-C C] [-g G] [-N N] [-t T]\n                      [-a A] [-j J] [-v]\n\nPython version of mkslurm_alt by Andrew Coward using HetJob. Server placement\nand spacing remains as mkslurm but clients are always tightly packed with a\ngap left every \"NC_GAP\" cores where NC_GAP can be given by the -g argument.\nvalues of 4, 8 or 16 are recommended.\n\noptional arguments:\n  -h, --help  show this help message and exit\n  -S S        num_servers (default: 4)\n  -s S        server_spacing (default: 8)\n  -m M        max_servers_per_node (default: 2)\n  -C C        num_clients (default: 28)\n  -g G        client_gap_interval (default: 4)\n  -N N        ncores_per_node (default: 128)\n  -t T        time_limit (default: 00:10:00)\n  -a A        account (default: n01)\n  -j J        job_name (default: nemo_test)\n  -v          show human readable hetjobs (default: False)\n

Note

We recommend that you keep your own copy of this script, as it is not directly provided by the ARCHER2 CSE team and is subject to change. Once you have a copy, you can set your own defaults for options in the script.

For example, to run with 4 XIOS I/O-servers (a maximum of 2 per node), each with sole occupancy of a 16-core NUMA region, and 96 ocean cores, spaced with an idle core in between each, use:

./mkslurm_hetjob -S 4 -s 16 -m 2 -C 96 -g 2 > myscript.slurm\n\nINFO:root:Running mkslurm_hetjob -S 4 -s 16 -m 2 -C 96 -g 2 -N 128 -t 00:10:00 -a n01 -j nemo_test -v False\nINFO:root:nodes needed= 2 (256)\nINFO:root:cores to be used= 100 (256)\n

This has reported that 2 nodes are needed with 100 active cores spread over 256 cores. This will also have produced a submission script \"myscript.slurm\":

#!/bin/bash\n#SBATCH --job-name=nemo_test\n#SBATCH --time=00:10:00\n#SBATCH --account=n01\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n#SBATCH --nodes=2\n#SBATCH --ntasks-per-core=1\n\n# Created by: mkslurm_hetjob -S 4 -s 16 -m 2 -C 96 -g 2 -N 128 -t 00:10:00 -a n01 -j nemo_test -v False\nmodule swap craype-network-ofi craype-network-ucx\nmodule swap cray-mpich cray-mpich-ucx\nmodule load cray-hdf5-parallel/1.12.0.7\nmodule load cray-netcdf-hdf5parallel/4.7.4.7\nexport OMP_NUM_THREADS=1\n\ncat > myscript_wrapper.sh << EOFB\n#!/bin/ksh\n#\nset -A map ./xios_server.exe ./nemo\nexec_map=( 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 )\n#\nexec \\${map[\\${exec_map[\\$SLURM_PROCID]}]}\n##\nEOFB\nchmod u+x ./myscript_wrapper.sh\n\nsrun --mem-bind=local \\\n--ntasks=100 --ntasks-per-node=50 --cpu-bind=v,mask_cpu:0x1,0x10000,0x100000000,0x400000000,0x1000000000,0x4000000000,0x10000000000,0x40000000000,0x100000000000,0x400000000000,0x1000000000000,0x4000000000000,0x10000000000000,0x40000000000000,0x100000000000000,0x400000000000000,0x1000000000000000,0x4000000000000000,0x10000000000000000,0x40000000000000000,0x100000000000000000,0x400000000000000000,0x1000000000000000000,0x4000000000000000000,0x10000000000000000000,0x40000000000000000000,0x100000000000000000000,0x400000000000000000000,0x1000000000000000000000,0x4000000000000000000000,0x10000000000000000000000,0x40000000000000000000000,0x100000000000000000000000,0x400000000000000000000000,0x1000000000000000000000000,0x4000000000000000000000000,0x10000000000000000000000000,0x40000000000000000000000000,0x100000000000000000000000000,0x400000000000000000000000000,0x1000000000000000000000000000,0x4000000000000000000000000000,0x10000000000000000000000000000,0x40000000000000000000000000000,0x100000000000000000000000000000,0x400000000000000000000000000000,0x1000000000000000000000000000000,0x4000000000000000000000000000000,0x10000000000000000000000000000000,0x40000000000000000000000000000000 ./myscript_wrapper.sh\n

Submitting this script in a directory with the nemo and xios_server.exe executables will run the desired MPMD job. The exec_map array shows the position of each executable in the rank list (0 = xios_server.exe, 1 = nemo). For larger core counts the cpu_map can be limited to a single node map which will be cycled through as many times as necessary.
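Each mask_cpu entry is a hexadecimal bitmask with a single bit set at the index of the core that the rank is bound to. Because core indices go up to 127, the masks exceed 64 bits. The sketch below therefore builds them as strings instead of using bash arithmetic. Its output should match the per-node mask list in the generated script above, with servers on cores 0 and 16 and ocean ranks on every second core from 32 to 126. The function names are hypothetical helpers, not part of mkslurm_hetjob.

```shell
#!/bin/bash
# Sketch: build srun --cpu-bind=mask_cpu entries, one bit per core.
# Core c sets bit (c % 4) of hex digit (c / 4), counting from the right.
core_mask() {
  local core=$1 zeros="" i
  for (( i = 0; i < core / 4; i++ )); do zeros+="0"; done
  printf '0x%x%s\n' $(( 1 << (core % 4) )) "$zeros"
}

# Comma-separated mask list for the given cores, in rank order.
masks_for_cores() {
  local out="" c
  for c in "$@"; do
    out+="${out:+,}$(core_mask "$c")"
  done
  echo "$out"
}

# Example: 2 servers (cores 0 and 16) + 48 ocean ranks on even cores 32..126
#   masks_for_cores 0 16 $(seq 32 2 126)
```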

"},{"location":"research-software/nemo/#how-to-optimise-the-performance-of-nemo","title":"How to optimise the performance of NEMO","text":"

Note

Our optimisation advice is based on the ARCHER2 4-cabinet preview system with the same node architecture as the current ARCHER2 service but a total of 1,024 compute nodes. During these investigations we used NEMO-4.0.6 and XIOS-2.5.

Through testing with idealised test cases to optimise the computational performance (i.e. without the demanding I/O management that is typical of NEMO production runs), we have found that drastically under-populating the nodes does not affect the performance of the computation. This indicates that users can reserve large portions of the nodes without a performance penalty. For larger simulations, up to 75% of each node can be reserved for I/O management (i.e. XIOS I/O-servers).

XIOS I/O-servers can be more lightly packed than ocean clients and should be evenly distributed amongst the nodes i.e. not concentrated on a specific node. We found that placing 1 XIOS I/O-server per node with 4, 8, and 16 dedicated cores did not affect the performance. However, the performance was affected when allocating dedicated I/O-server cores outside of a 16-core NUMA region. Thus, users should confine XIOS I/O-servers to NUMA regions to improve performance and benefit from the memory hierarchy.

"},{"location":"research-software/nemo/#a-performance-investigation","title":"A performance investigation","text":"

Note

These results were collated by Andrew Coward during early user testing of the ARCHER2 service and are subject to change.

This table shows some preliminary results of a repeated 60 day simulation of the ORCA2_ICE_PISCES, SETTE configuration using various core counts and packing strategies:

Note

These results used the mkslurm script, now hosted in /work/n01/shared/nemo/old_scripts/mkslurm

It is clear from the previous results that fully populating an ARCHER2 node is unlikely to provide the optimal performance for any codes with moderate memory bandwidth requirements. The explored regular packing strategy does not allow experimentation with less wasteful packing strategies than half-population though.

There may be a case, for example, for just leaving every 1 in 4 cores idle, or every 1 in 8, or even fewer idle cores per node. The mkslurm_alt script (/work/n01/shared/nemo/old_scripts/mkslurm_alt) provided a method of generating cpu-bind maps for exploring these strategies. The script assumed no change in the packing strategy for the servers but the core spacing argument (-c) for the ocean cores is replaced by a -g option representing the frequency of a gap in the, otherwise tightly-packed, ocean cores.

Preliminary tests have been conducted with the ORCA2_ICE_PISCES SETTE test case. This is a relatively small test case that will fit onto a single node. It is also small enough to perform well in attached mode. First some baseline tests in attached mode.

Previous tests used 4 I/O servers, each occupying a single NUMA region. For a model of this size, 2 servers each occupying half a NUMA region will suffice. That leaves 112 cores with which to try different packing strategies. Is it possible to match or improve on this elapsed time on a single node, including external I/O servers? Yes, but not with an obvious gap frequency:

And activating land suppression can reduce times further:

The optimal two-node solution is also shown (this is quicker but the one node solution is cheaper).

This leads us to the current iteration of the mkslurm script, mkslurm_hetjob. Note that a tightly-packed placement with no gaps amongst the ocean processes can be generated by using a client gap interval greater than the number of clients. This script has been used to explore the different placement strategies with a larger configuration based on eORCA025. In all cases, 8 XIOS servers were used, each with sole occupancy of a 16-core NUMA region and a maximum of 2 servers per node. The rest of the initial 4 nodes (and any subsequent ocean-core-only nodes) were filled with ocean cores at various packing densities (from tightly packed to half-populated). A summary of the results is shown below.

The limit of scalability for this problem size lies around 1500 cores. One interesting aspect is that the cost, in terms of node hours, remains fairly flat up to a thousand processes and the choice of gap placement makes much less difference as the individual domains shrink. It looks as if, so long as you avoid inappropriately high numbers of processors, choosing the wrong placement won't waste your allocation but may waste your time.

"},{"location":"research-software/nwchem/","title":"NWChem","text":"

NWChem aims to provide its users with computational chemistry tools that are scalable both in their ability to treat large scientific computational chemistry problems efficiently, and in their use of available parallel computing resources from high-performance parallel supercomputers to conventional workstation clusters. The NWChem software can handle: biomolecules, nanostructures, and solid-state systems; from quantum to classical, and all combinations; Gaussian basis functions or plane-waves; scaling from one to thousands of processors; properties and relativity.

"},{"location":"research-software/nwchem/#useful-links","title":"Useful Links","text":""},{"location":"research-software/nwchem/#using-nwchem-on-archer2","title":"Using NWChem on ARCHER2","text":"

NWChem is released under an Educational Community License (ECL 2.0) and is freely available to all users on ARCHER2.

"},{"location":"research-software/nwchem/#where-can-i-get-help","title":"Where can I get help?","text":"

If you have problems accessing or running NWChem on ARCHER2, please contact the Service Desk. General questions on the use of NWChem might also be directed to the NWChem forum. More experienced users with detailed technical issues on NWChem should consider submitting them to the NWChem GitHub issue tracker.

"},{"location":"research-software/nwchem/#running-nwchem-jobs","title":"Running NWChem jobs","text":"

The following script will run a NWChem job using 2 nodes (256 cores) in the standard partition. It assumes that the input file is called test_calc.nw.

#!/bin/bash\n\n# Request 2 nodes with 128 MPI tasks per node for 20 minutes\n\n#SBATCH --job-name=NWChem_test\n#SBATCH --nodes=2\n#SBATCH --ntasks-per-node=128\n#SBATCH --cpus-per-task=1\n#SBATCH --time=00:20:00\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code] \n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Load the NWChem module, avoid any unintentional OpenMP threading by\n# setting OMP_NUM_THREADS, and launch the code.\nmodule load nwchem\nexport OMP_NUM_THREADS=1\n\n# Ensure the cpus-per-task option is propagated to srun commands\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\nsrun --distribution=block:block --hint=nomultithread nwchem test_calc\n
"},{"location":"research-software/nwchem/#compiling-nwchem","title":"Compiling NWChem","text":"

The latest instructions for building NWChem on ARCHER2 may be found in the GitHub repository of build instructions:

"},{"location":"research-software/onetep/","title":"ONETEP","text":"

ONETEP (Order-N Electronic Total Energy Package) is a linear-scaling code for quantum-mechanical calculations based on density-functional theory.

"},{"location":"research-software/onetep/#useful-links","title":"Useful Links","text":""},{"location":"research-software/onetep/#using-onetep-on-archer2","title":"Using ONETEP on ARCHER2","text":"

ONETEP is only available to users who have a valid ONETEP licence.

If you have a ONETEP licence and wish to have access to ONETEP on ARCHER2, please make a request via the SAFE, see:

Please have your licence details to hand.

"},{"location":"research-software/onetep/#running-parallel-onetep-jobs","title":"Running parallel ONETEP jobs","text":"

The following script, supplied by the ONETEP developers, will run a ONETEP job using 2 nodes (256 cores) with 16 MPI processes per node and 8 OpenMP threads per MPI process. It assumes that there is a single calculation options file with the .dat extension in the working directory.

#!/bin/bash\n\n# --------------------------------------------------------------------------\n# A SLURM submission script for ONETEP on ARCHER2 (full 23-cabinet system).\n# Central install, Cray compiler version.\n# Supports hybrid (MPI/OMP) parallelism.\n#\n# 2022.06 Jacek Dziedzic, J.Dziedzic@soton.ac.uk\n#                         University of Southampton\n#         Lennart Gundelach, L.Gundelach@soton.ac.uk\n#                            University of Southampton\n#         Tom Demeyere, T.Demeyere@soton.ac.uk\n#                       University of Southampton\n# --------------------------------------------------------------------------\n\n# v1.00 (2022.06.04) jd: Adapted from the user-compiled Cray compiler version.\n\n# ==========================================================================================================\n# Edit the following lines to your liking.\n#\n#SBATCH --job-name=mine               # Name of the job.\n#SBATCH --nodes=2                     # Number of nodes in job.\n#SBATCH --ntasks-per-node=16          # Number of MPI processes per node.\n#SBATCH --cpus-per-task=8             # Number of OMP threads spawned from each MPI process.\n#SBATCH --time=5:00:00                # Max time for your job (hh:mm:ss).\n#SBATCH --partition=standard          # Partition: standard memory CPU nodes with AMD EPYC 7742 64-core processor\n#SBATCH --account=t01                 # Replace 't01' with your budget code.\n#SBATCH --qos=standard                # Requested Quality of Service (QoS), See ARCHER2 documentation\n\nexport OMP_NUM_THREADS=8              # Repeat the value from 'cpus-per-task' here.\n\n# Ensure the cpus-per-task option is propagated to srun commands\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\n# Set up the job environment, loading the ONETEP module.\n# The module automatically sets OMP_PLACES, OMP_PROC_BIND and FI_MR_CACHE_MAX_COUNT.\n# To use a different binary, replace this line with either (drop the leading '#')\n# module load onetep/6.1.9.0-GCC-LibSci\n# to use the GCC-libsci binary, or with\n# module load onetep/6.1.9.0-GCC-MKL\n# to use the GCC-MKL binary.\n\nmodule load onetep/6.1.9.0-CCE-LibSci\n\n# ==========================================================================================================\n# !!! You should not need to modify anything below this line.\n# ==========================================================================================================\n\nworkdir=`pwd`\necho \"--- This is the submission script, the time is `date`.\"\n\n# Figure out ONETEP executable\nonetep_exe=`which onetep.archer2`\necho \"--- ONETEP executable is $onetep_exe.\"\n\nonetep_launcher=`echo $onetep_exe | sed -r \"s/onetep.archer2/onetep_launcher/\"`\n\necho \"--- workdir is '$workdir'.\"\necho \"--- onetep_launcher is '$onetep_launcher'.\"\n\n# Ensure exactly 1 .dat file in there.\nndats=`ls -l *dat | wc -l`\n\nif [ \"$ndats\" == \"0\" ]; then\n  echo \"!!! There is no .dat file in the current directory. Aborting.\" >&2\n  touch \"%NO_DAT_FILE\"\n  exit 2\nfi\n\nif [ \"$ndats\" == \"1\" ]; then\n  true\nelse\n  echo \"!!! More than one .dat file in the current directory, that's too many. Aborting.\" >&2\n  touch \"%MORE_THAN_ONE_DAT_FILE\"\n  exit 3\nfi\n\nrootname=`echo *.dat | sed -r \"s/\\.dat\\$//\"`\nrootname_dat=$rootname\".dat\"\nrootname_out=$rootname\".out\"\nrootname_err=$rootname\".err\"\n\necho \"--- The input file is $rootname_dat, the output goes to $rootname_out and errors go to $rootname_err.\"\n\n# Ensure ONETEP executable is there and is indeed executable.\nif [ ! -x \"$onetep_exe\" ]; then\n  echo \"!!! $onetep_exe does not exist or is not executable. Aborting!\" >&2\n  touch \"%ONETEP_EXE_MISSING\"\n  exit 4\nfi\n\n# Ensure onetep_launcher is there and is indeed executable.\nif [ ! -x \"$onetep_launcher\" ]; then\n  echo \"!!! $onetep_launcher does not exist or is not executable. Aborting!\" >&2\n  touch \"%ONETEP_LAUNCHER_MISSING\"\n  exit 5\nfi\n\n# Dump the module list to a file.\nmodule list >\\$modules_loaded 2>&1\n\nldd $onetep_exe >\\$ldd\n\n# Report details\necho \"--- Number of nodes as reported by SLURM: $SLURM_JOB_NUM_NODES.\"\necho \"--- Number of tasks as reported by SLURM: $SLURM_NTASKS.\"\necho \"--- Using this srun executable: \"`which srun`\necho \"--- Executing ONETEP via $onetep_launcher.\"\n\n\n# Actually run ONETEP\n# Additional srun options to pin one thread per physical core\n########################################################################################################################################################\nsrun --hint=nomultithread --distribution=block:block -N $SLURM_JOB_NUM_NODES -n $SLURM_NTASKS $onetep_launcher -e $onetep_exe -t $OMP_NUM_THREADS $rootname_dat >$rootname_out 2>$rootname_err\n# Capture the srun exit code immediately, before any other command overwrites $?\nresult=$?\n########################################################################################################################################################\n\necho \"--- srun finished at `date`.\"\n\n# Check for error conditions\nif [ $result -ne 0 ]; then\n  echo \"!!! srun reported a non-zero exit code $result. Aborting!\" >&2\n  touch \"%SRUN_ERROR\"\n  exit 6\nfi\n\nif [ -r $rootname.error_message ]; then\n  echo \"!!! ONETEP left an error message file. Aborting!\" >&2\n  touch \"%ONETEP_ERROR_DETECTED\"\n  exit 7\nfi\n\ntail $rootname.out | grep completed >/dev/null 2>/dev/null\nresult=$?\nif [ $result -ne 0 ]; then\n  echo \"!!! ONETEP calculation likely did not complete. Aborting!\" >&2\n  touch \"%ONETEP_DID_NOT_COMPLETE\"\n  exit 8\nfi\n\necho \"--- Looks like everything went fine. Praise be.\"\ntouch \"%DONE\"\n\necho \"--- Finished successfully at `date`.\"\n
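The script above counts .dat files by piping ls -l into wc -l. The sketch below shows the same check written with a bash glob, which avoids parsing ls output. It is not part of the script supplied by the ONETEP developers. It keeps the original exit codes: 2 for no .dat file and 3 for more than one. The function name find_single_dat is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper (not part of the ONETEP-supplied script): print the
# root name of the single .dat file in the current directory.
# Returns 2 if there is no .dat file and 3 if there is more than one,
# mirroring the exit codes used by the script above.
find_single_dat() {
  local had_nullglob=0
  shopt -q nullglob && had_nullglob=1
  shopt -s nullglob                  # an unmatched *.dat expands to nothing
  local dats=( *.dat )
  (( had_nullglob )) || shopt -u nullglob
  if (( ${#dats[@]} == 0 )); then
    echo no .dat file in the current directory >&2
    return 2
  fi
  if (( ${#dats[@]} > 1 )); then
    echo more than one .dat file in the current directory >&2
    return 3
  fi
  echo ${dats[0]%.dat}               # e.g. mycalc for mycalc.dat
}

# Possible use in place of the ndats/rootname lines:
# rootname=$(find_single_dat) || exit $?
```

With this version, a missing or duplicated input file is caught before any module or srun work is done.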
"},{"location":"research-software/onetep/#hints-and-tips","title":"Hints and Tips","text":"

See the information in the ONETEP documentation.

"},{"location":"research-software/onetep/#compiling-onetep","title":"Compiling ONETEP","text":"

The latest instructions for building ONETEP on ARCHER2 may be found in the GitHub repository of build instructions:

"},{"location":"research-software/openfoam/","title":"OpenFOAM","text":"

OpenFOAM is an open-source toolbox for computational fluid dynamics. OpenFOAM consists of generic tools to simulate complex physics for a variety of fields of interest, from fluid flows involving chemical reactions, turbulence and heat transfer, to solid dynamics, electromagnetism and the pricing of financial options.

The core technology of OpenFOAM is a flexible set of modules written in C++. These are used to build solvers and utilities to perform pre-processing and post-processing tasks ranging from simple data manipulation to visualisation and mesh processing.

There are a number of different flavours of the OpenFOAM package with slightly different histories, and slightly different features. The two most common are distributed by openfoam.org and openfoam.com.

"},{"location":"research-software/openfoam/#useful-links","title":"Useful Links","text":""},{"location":"research-software/openfoam/#using-openfoam-on-archer2","title":"Using OpenFOAM on ARCHER2","text":"

OpenFOAM is released under a GPL v3 license and is freely available to all users on ARCHER2.

Upgrade 2023:
auser@ln01> module avail openfoam\n--------------- /work/y07/shared/archer2-lmod/apps/core -----------------\nopenfoam/com/v2106        openfoam/org/v9.20210903\nopenfoam/com/v2212 (D)    openfoam/org/v10.20230119 (D)\n

Note: the older versions were recompiled under PE22.12 in April 2023.

Full system:
auser@ln01> module avail openfoam\n--------------- /work/y07/shared/archer2-lmod/apps/core -----------------\nopenfoam/com/v2106          openfoam/org/v9.20210903 (D)\nopenfoam/org/v8.20200901\n

Versions from openfoam.org have version numbers such as v8.0, and there is typically one release per year (in June, with a patch release in September). Versions from openfoam.com are named by date, e.g. v2106 (read as 2021 June), and there are typically two releases a year (one in June and one in December).
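The openfoam.com naming scheme described above can be decoded mechanically. The bash function below is a hypothetical illustration and is not provided by the OpenFOAM modules. It turns a vYYMM tag into a year and month. It only handles the four-digit openfoam.com style, not openfoam.org version numbers.

```bash
#!/bin/bash
# Hypothetical helper: decode an openfoam.com version tag such as v2106
# (YYMM) into a year and a month name.
decode_com_version() {
  local tag=${1#v}                  # strip the leading v: v2106 -> 2106
  if [[ ! $tag =~ ^[0-9]{4}$ ]]; then
    echo unrecognised openfoam.com version tag: $1 >&2
    return 1
  fi
  local yy=${tag:0:2} mm=${tag:2:2}
  local months=(January February March April May June July August September October November December)
  # 10# forces base 10 so that months such as 08 are not read as octal
  echo 20$yy ${months[10#$mm - 1]}
}

decode_com_version v2106    # prints: 2021 June
decode_com_version v2212    # prints: 2022 December
```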

To use OpenFOAM on ARCHER2 you should first load an OpenFOAM module, e.g.

user@ln01:> module load PrgEnv-gnu\nuser@ln01:> module load openfoam/com/v2106\n

(Note that the openfoam module will automatically load PrgEnv-gnu if it is not already active.) The module defines only the base installation directory via the environment variable FOAM_INSTALL_DIR. After loading the module you need to source the etc/bashrc file provided by OpenFOAM, e.g.

source ${FOAM_INSTALL_DIR}/etc/bashrc\n

You should then be able to use OpenFOAM. The above commands will also need to be added to any job/batch submission scripts you want to use to run OpenFOAM. Note that all the centrally installed versions of OpenFOAM are compiled under PrgEnv-gnu.

Note that, although the module system marks some versions as defaults (D), it is recommended to use a fully qualified module name (with the exact version, as in the example above) so that your scripts are not affected if the default changes.
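The module only sets FOAM_INSTALL_DIR. A job script that omits the module load line will therefore fail in a confusing way when it tries to source etc/bashrc. The following minimal guard checks both the variable and the file before sourcing. It is a hypothetical sketch, and the function name setup_openfoam is our own.

```bash
#!/bin/bash
# Hypothetical helper: source the OpenFOAM environment only if the module
# has set FOAM_INSTALL_DIR and the etc/bashrc file actually exists.
setup_openfoam() {
  if [[ -z ${FOAM_INSTALL_DIR:-} ]]; then
    echo FOAM_INSTALL_DIR is not set: load an openfoam module first >&2
    return 1
  fi
  if [[ ! -r ${FOAM_INSTALL_DIR}/etc/bashrc ]]; then
    echo cannot read ${FOAM_INSTALL_DIR}/etc/bashrc >&2
    return 1
  fi
  source ${FOAM_INSTALL_DIR}/etc/bashrc
}

# Usage in a job script:
# module load openfoam/com/v2106
# setup_openfoam || exit 1
```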

"},{"location":"research-software/openfoam/#running-parallel-openfoam-jobs","title":"Running parallel OpenFOAM jobs","text":"

While it is possible to run limited OpenFOAM pre-processing and post-processing activities on the front end, we request that all significant work be submitted to the queue system. Please remember that the front end is a shared resource.

A typical SLURM job submission script for OpenFOAM is given here. This would request 4 nodes to run with 128 MPI tasks per node (a total of 512 MPI tasks). Each MPI task is allocated one core (--cpus-per-task=1).

Upgrade 2023:
#!/bin/bash\n\n#SBATCH --nodes=4\n#SBATCH --ntasks-per-node=128\n#SBATCH --cpus-per-task=1\n#SBATCH --distribution=block:block\n#SBATCH --hint=nomultithread\n#SBATCH --time=00:10:00\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Ensure the cpus-per-task option is propagated to srun commands\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\n# Load the appropriate module and source the OpenFOAM bashrc file\n\nmodule load openfoam/org/v10.20230119\n\nsource ${FOAM_INSTALL_DIR}/etc/bashrc\n\n# Run OpenFOAM work, e.g.,\n\nsrun interFoam -parallel\n
Full system:
#!/bin/bash\n\n#SBATCH --nodes=4\n#SBATCH --ntasks-per-node=128\n#SBATCH --cpus-per-task=1\n#SBATCH --distribution=block:block\n#SBATCH --hint=nomultithread\n#SBATCH --time=00:10:00\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Load the appropriate module and source the OpenFOAM bashrc file\n\nmodule load openfoam/org/v8.20200901\n\nsource ${FOAM_INSTALL_DIR}/etc/bashrc\n\n# Run OpenFOAM work, e.g.,\n\nsrun interFoam -parallel\n
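Running with -parallel also requires the case to have been decomposed beforehand, typically with the OpenFOAM decomposePar utility. The number of subdomains must equal the number of MPI tasks, which is 512 in the examples above.

The sketch below is a hypothetical pre-run check and is not part of the ARCHER2 modules. It assumes that numberOfSubdomains appears on its own line in system/decomposeParDict. This is the usual layout but is not guaranteed.

```bash
#!/bin/bash
# Hypothetical pre-run check: compare numberOfSubdomains in the case's
# decomposeParDict with the number of MPI tasks that will be started.
# Assumes a simple one-line entry such as: numberOfSubdomains 512;
check_decomposition() {
  local dict=${1:-system/decomposeParDict}
  local ntasks=${2:-${SLURM_NTASKS:-0}}
  local nsub
  nsub=$(awk '/^[ ]*numberOfSubdomains/ {print $2}' $dict | tr -d ';')
  if [[ -z $nsub ]]; then
    echo numberOfSubdomains not found in $dict >&2
    return 1
  fi
  if (( nsub != ntasks )); then
    echo mismatch: $nsub subdomains but $ntasks MPI tasks >&2
    return 1
  fi
  echo ok: $nsub subdomains for $ntasks MPI tasks
}

# In the job script, before srun interFoam -parallel:
# check_decomposition system/decomposeParDict $SLURM_NTASKS || exit 1
```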
"},{"location":"research-software/openfoam/#compiling-openfoam","title":"Compiling OpenFOAM","text":"

If you want to compile your own version of OpenFOAM, instructions are available for ARCHER2 at:

"},{"location":"research-software/openfoam/#extensions-to-openfoam","title":"Extensions to OpenFOAM","text":"

Many packages extend the central OpenFOAM functionality in some way. However, there is no completely standardised way in which this works. Some packages assume they have write access to the main OpenFOAM installation. If this is the case, you must install your own version before continuing. This can be done on an individual basis, or a per-project basis using the project shared directories.

"},{"location":"research-software/openfoam/#module-version-history","title":"Module version history","text":"

The following centrally installed versions are available.

"},{"location":"research-software/openfoam/#upgrade-2023","title":"Upgrade 2023","text":""},{"location":"research-software/openfoam/#full-system","title":"Full system","text":""},{"location":"research-software/orca/","title":"ORCA","text":"

ORCA is an ab initio quantum chemistry program package that contains modern electronic structure methods including density functional theory, many-body perturbation, coupled cluster, multireference methods, and semi-empirical quantum chemistry methods. Its main field of application is larger molecules, transition metal complexes, and their spectroscopic properties. ORCA is developed in the research group of Frank Neese. The free version is available only for academic use at academic institutions.

Important

ORCA is not part of the officially supported software on ARCHER2. While the ARCHER2 service desk is able to provide support for basic use of this software (e.g. access to software, writing job submission scripts) it does not generally provide detailed technical support for the software and you may be directed to seek support from other places if the service desk cannot answer the questions.

"},{"location":"research-software/orca/#useful-links","title":"Useful Links","text":""},{"location":"research-software/orca/#using-orca-on-archer2","title":"Using ORCA on ARCHER2","text":"

ORCA is available on ARCHER2 for academic use only. If you wish to use ORCA for commercial applications, you must contact the ORCA developers.

"},{"location":"research-software/orca/#running-parallel-orca-jobs","title":"Running parallel ORCA jobs","text":"

The following script will run an ORCA job on ARCHER2 using 256 MPI processes across 2 nodes, with each MPI process placed on a separate physical core. It assumes that the input file is called my_calc.inp.

#!/bin/bash\n#SBATCH --nodes=2\n#SBATCH --ntasks-per-node=128\n#SBATCH --cpus-per-task=1\n#SBATCH --time=0:20:00\n\n# Replace [budget code] below with your project code (e.g. e05)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\nmodule load other-software\nmodule load orca\n\n# Launch the ORCA calculation\n#   * You must use \"$ORCADIR/orca\" so the application has the full executable path\n#   * Do not use \"srun\" to launch parallel ORCA jobs as they use OpenMPI rather than Cray MPICH\n#   * Remember to change the name of the input file to match your file name\n$ORCADIR/orca my_calc.inp\n
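The script above launches ORCA directly rather than through srun. The number of parallel processes ORCA uses is therefore set in the input file, conventionally with a %pal block.

The hypothetical helper below prints such a block from the Slurm task count, so that the input and the job request stay consistent. It is our own sketch and is not part of the orca module. It assumes your input does not already set the process count in another way, for example with a PAL keyword.

```bash
#!/bin/bash
# Hypothetical helper: print an ORCA %pal block whose process count matches
# the MPI task count Slurm allocated (2 nodes x 128 = 256 in the script above).
orca_pal_block() {
  local nprocs=${1:-${SLURM_NTASKS:-1}}
  echo %pal nprocs $nprocs end
}

# Possible use in the job script, with a hypothetical template file:
# { orca_pal_block $SLURM_NTASKS; cat my_calc_template.inp; } > my_calc.inp
orca_pal_block 256    # prints: %pal nprocs 256 end
```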
"},{"location":"research-software/qchem/","title":"QChem","text":"

QChem is an ab initio quantum chemistry software package for fast and accurate simulations of molecular systems, including electronic and molecular structure, reactivities, properties, and spectra.

Important

QChem is not part of the officially supported software on ARCHER2. While the ARCHER2 service desk is able to provide support for basic use of this software (e.g. access to software, writing job submission scripts) it does not generally provide detailed technical support for the software and you may be directed to seek support from other places if the service desk cannot answer the questions.

"},{"location":"research-software/qchem/#useful-links","title":"Useful Links","text":""},{"location":"research-software/qchem/#using-qchem-on-archer2","title":"Using QChem on ARCHER2","text":"

ARCHER2 has a site licence for QChem.

"},{"location":"research-software/qchem/#running-parallel-qchem-jobs","title":"Running parallel QChem jobs","text":"

Important

QChem parallelisation is only available on ARCHER2 by using multiple threads within a single compute node. Multi-process and multi-node parallelisation will not work on ARCHER2.

The following script will run QChem with 16 OpenMP threads, using the input file hf3c.in.

#!/bin/bash\n#SBATCH --nodes=1\n#SBATCH --time=1:0:0\n#SBATCH --ntasks-per-node=1\n#SBATCH --cpus-per-task=16\n\n# Replace [budget code] below with your project code (e.g. e05)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\nmodule load other-software\nmodule load qchem\n\nexport OMP_PLACES=cores\nexport OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\nexport SLURM_HINT=\"nomultithread\"\nexport SLURM_DISTRIBUTION=\"block:block\"\n\nqchem -slurm -nt $OMP_NUM_THREADS hf3c.in hf3c.out\n
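QChem on ARCHER2 can only use threads within a single node. A simple pre-run check can therefore catch job scripts that request more than one node or task. The function below is a hypothetical sketch. It assumes 128 cores per standard compute node, which is consistent with the 2 nodes = 256 cores figures used elsewhere on this page.

```bash
#!/bin/bash
# Hypothetical pre-run check for QChem: one node, one task, and no more
# threads than the (assumed) 128 cores of a standard compute node.
check_qchem_layout() {
  local nodes=$1 tasks=$2 threads=$3
  if (( nodes != 1 || tasks != 1 )); then
    echo QChem needs exactly 1 node and 1 task, got $nodes and $tasks >&2
    return 1
  fi
  if (( threads < 1 || threads > 128 )); then
    echo thread count $threads is outside the range 1 to 128 >&2
    return 1
  fi
  echo ok: $threads threads on 1 node
}

# In the job script:
# check_qchem_layout $SLURM_JOB_NUM_NODES $SLURM_NTASKS $SLURM_CPUS_PER_TASK || exit 1
check_qchem_layout 1 1 16    # prints: ok: 16 threads on 1 node
```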
"},{"location":"research-software/qe/","title":"Quantum Espresso","text":"

Quantum Espresso (QE) is an integrated suite of open-source computer codes for electronic-structure calculations and materials modeling at the nanoscale. It is based on density-functional theory, plane waves, and pseudopotentials.

"},{"location":"research-software/qe/#useful-links","title":"Useful Links","text":""},{"location":"research-software/qe/#using-qe-on-archer2","title":"Using QE on ARCHER2","text":"

QE is released under a GPL v2 license and is freely available to all ARCHER2 users.

"},{"location":"research-software/qe/#running-parallel-qe-jobs","title":"Running parallel QE jobs","text":"

For example, the following script will run a QE pw.x job using 4 nodes (512 cores, 128 per node).

#!/bin/bash\n\n# Request 4 nodes to run a 512 MPI task job with 128 MPI tasks per node.\n# The maximum walltime limit is set to be 20 minutes.\n\n#SBATCH --job-name=qe_test\n#SBATCH --nodes=4\n#SBATCH --ntasks-per-node=128\n#SBATCH --cpus-per-task=1\n#SBATCH --time=00:20:00\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code] \n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Load the relevant Quantum Espresso module\nmodule load quantum_espresso\n\n# Set number of OpenMP threads to 1 to prevent multithreading by libraries\nexport OMP_NUM_THREADS=1\n\n# Ensure the cpus-per-task option is propagated to srun commands\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\nsrun --hint=nomultithread --distribution=block:block pw.x < test_calc.in\n
"},{"location":"research-software/qe/#hints-and-tips","title":"Hints and tips","text":"

The QE module is set to load the default QE-provided pseudopotentials. If you wish to use non-default pseudopotentials, you will need to set the ESPRESSO_PSEUDO variable to point to the directory containing them. This can be done by adding the following line after the module is loaded:

export ESPRESSO_PSEUDO=/path/to/pseudo_potentials\n
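A typo in the path would otherwise only show up when pw.x fails to find a pseudopotential. A small guard can check the directory before exporting the variable. The function use_pseudo_dir below is a hypothetical sketch and is not part of the quantum_espresso module.

```bash
#!/bin/bash
# Hypothetical helper: point Quantum Espresso at a custom pseudopotential
# directory, refusing to do so if the directory does not exist.
use_pseudo_dir() {
  local dir=$1
  if [[ ! -d $dir ]]; then
    echo pseudopotential directory $dir not found >&2
    return 1
  fi
  export ESPRESSO_PSEUDO=$dir
}

# In the job script, after module load quantum_espresso:
# use_pseudo_dir /path/to/pseudo_potentials || exit 1
```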
"},{"location":"research-software/qe/#compiling-qe","title":"Compiling QE","text":"

The latest instructions for building QE on ARCHER2 can be found in the GitHub repository of build instructions:

"},{"location":"research-software/vasp/","title":"VASP","text":"

The Vienna Ab initio Simulation Package (VASP) is a computer program for atomic scale materials modelling, e.g. electronic structure calculations and quantum-mechanical molecular dynamics, from first principles.

VASP computes an approximate solution to the many-body Schr\u00f6dinger equation, either within density functional theory (DFT), solving the Kohn-Sham equations, or within the Hartree-Fock (HF) approximation, solving the Roothaan equations. Hybrid functionals that mix the Hartree-Fock approach with density functional theory are implemented as well. Furthermore, Green's functions methods (GW quasiparticles, and ACFDT-RPA) and many-body perturbation theory (2nd-order M\u00f8ller-Plesset) are available in VASP.

In VASP, central quantities, like the one-electron orbitals, the electronic charge density, and the local potential are expressed in plane wave basis sets. The interactions between the electrons and ions are described using norm-conserving or ultrasoft pseudopotentials, or the projector-augmented-wave method.

To determine the electronic ground state, VASP makes use of efficient iterative matrix diagonalisation techniques, like the residual minimisation method with direct inversion of the iterative subspace (RMM-DIIS) or blocked Davidson algorithms. These are coupled to highly efficient Broyden and Pulay density mixing schemes to speed up the self-consistency cycle.

"},{"location":"research-software/vasp/#useful-links","title":"Useful Links","text":""},{"location":"research-software/vasp/#using-vasp-on-archer2","title":"Using VASP on ARCHER2","text":"

VASP is only available to users who have a valid VASP licence.

If you have a VASP 5 or 6 licence and wish to have access to VASP on ARCHER2, please make a request via the SAFE, see:

Please have your licence details to hand.

Note

Both VASP 5 and VASP 6 are available on ARCHER2. You generally need a different licence for each of these versions.

"},{"location":"research-software/vasp/#running-parallel-vasp-jobs","title":"Running parallel VASP jobs","text":"

To access VASP you should load the appropriate vasp module in your job submission scripts.

To load the default version of VASP, you would use:

module load vasp\n

Tip

VASP 6.4.3 and above have all been compiled to include Wannier90 functionality. Older versions of VASP on ARCHER2 do not include Wannier90.

Once loaded, the executables are called:

Once the module has been loaded, you can access the LDA and PBE pseudopotentials for VASP on ARCHER2 at:

$VASP_PSPOT_DIR\n

Tip

VASP 6 can make use of OpenMP threads in addition to running with pure MPI. We will add notes on performance and use of threading in VASP as information becomes available.

Example VASP submission script

#!/bin/bash\n\n# Request 16 nodes (2048 MPI tasks at 128 tasks per node) for 20 minutes.   \n\n#SBATCH --job-name=VASP_test\n#SBATCH --nodes=16\n#SBATCH --ntasks-per-node=128\n#SBATCH --cpus-per-task=1\n#SBATCH --time=00:20:00\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code] \n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Load the VASP module\nmodule load vasp/6\n\n# Avoid any unintentional OpenMP threading by setting OMP_NUM_THREADS\nexport OMP_NUM_THREADS=1\n\n# Ensure the cpus-per-task option is propagated to srun commands\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\n# Launch the code - the distribution and hint options are important for performance\nsrun --distribution=block:block --hint=nomultithread vasp_std\n
"},{"location":"research-software/vasp/#vasp-transition-state-tools-vtst","title":"VASP Transition State Tools (VTST)","text":"

As well as the standard VASP modules, we provide versions of VASP with the VASP Transition State Tools (VTST) from the University of Texas added. The VTST versions add various functionality to VASP and provide additional scripts to use with VASP. Additional functionality includes:

Full details of these methods and the provided scripts can be found on the VTST website.

On ARCHER2, the VTST versions of VASP can be accessed by loading the modules with vtst in the module name, for example:

module load vasp/6/6.4.1-vtst\n
"},{"location":"research-software/vasp/#compiling-vasp-on-archer2","title":"Compiling VASP on ARCHER2","text":"

If you wish to compile your own version of VASP on ARCHER2 (either VASP 5 or VASP 6) you can find information on how we compiled the central versions in the build instructions GitHub repository. See:

"},{"location":"research-software/vasp/#tips-for-using-vasp-on-archer2","title":"Tips for using VASP on ARCHER2","text":""},{"location":"research-software/vasp/#switching-mpi-transport-protocol-from-openfabrics-to-ucx","title":"Switching MPI transport protocol from OpenFabrics to UCX","text":"

The VASP modules are set up to use the OpenFabrics MPI transport protocol, as testing has shown that this passes all the regression tests and gives the most reliable operation on ARCHER2. However, there may be cases where UCX gives better performance than OpenFabrics.

If you want to try the UCX transport protocol, you can do this by loading additional modules after you have loaded the VASP modules. For example, for VASP 6, you would use:

module load vasp/6\nmodule load craype-network-ucx\nmodule load cray-mpich-ucx\n
"},{"location":"research-software/vasp/#increasing-the-cpu-frequency-and-enabling-turbo-boost","title":"Increasing the CPU frequency and enabling turbo-boost","text":"

The default CPU frequency on ARCHER2 is currently set to 2 GHz. While many VASP calculations are memory or MPI bound, some calculations can be CPU bound. For those cases, you may see a significant difference in performance by increasing the CPU frequency and enabling turbo-boost (though you will almost certainly also be less energy efficient).

You can do this by adding the line:

export SLURM_CPU_FREQ_REQ=2250000\n

in your job submission script, before the srun command.
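Slurm interprets CPU frequency values in kHz. The value 2250000 therefore corresponds to 2.25 GHz, and the 2 GHz default corresponds to 2000000. If you prefer to think in MHz, a one-line helper makes the unit conversion explicit. The helper below is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: convert a CPU frequency in MHz to the kHz value
# expected by SLURM_CPU_FREQ_REQ (e.g. 2250 MHz -> 2250000 kHz = 2.25 GHz).
cpu_freq_khz() {
  local mhz=$1
  echo $(( mhz * 1000 ))
}

# Set before the srun command in the job script:
export SLURM_CPU_FREQ_REQ=$(cpu_freq_khz 2250)
echo SLURM_CPU_FREQ_REQ=$SLURM_CPU_FREQ_REQ    # prints SLURM_CPU_FREQ_REQ=2250000
```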

"},{"location":"research-software/vasp/#performance-tips","title":"Performance tips","text":"

The performance of VASP depends on the version of VASP used, the performance of MPI collective operations, the choice of VASP parallelisation parameters (NCORE/NPAR and KPAR) and how many MPI processes per node are used.

KPAR: You should always use the largest value of KPAR that your calculation allows within the available memory.

NCORE/NPAR: We have found that the optimal values of NCORE (and hence NPAR) depend on both the type of calculation you are performing (e.g. pure DFT, hybrid functional, \u0393-point, non-collinear) and the number of nodes/cores you are using for your calculation. In practice, this means that you should experiment with different values to find the best choice for your calculation. There is information below on the best choices for the benchmarks we have run on ARCHER2 that may serve as a useful starting point. The performance difference from choosing different values can vary by up to 100% so it is worth spending time investigating this.

MPI processes per node: For some of the benchmarks we used, performance improved when using fewer MPI processes per node than the number of cores per node.

OpenMP threads: Using multiple OpenMP threads per MPI process can improve performance. In the tests we have performed, 4 OpenMP threads per MPI process typically gave the best performance.
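The parallelisation parameters interact with each other. Our understanding of how they combine is as follows:

- The MPI ranks are split into KPAR k-point groups.
- Within each group, NCORE cores work together on each band.
- The implied NPAR is therefore the number of ranks per group divided by NCORE.

The function below is a hypothetical sketch of that arithmetic; please check the VASP documentation for the authoritative rules. The function simply rejects combinations that do not divide evenly.

```bash
#!/bin/bash
# Hypothetical helper: derive the NPAR implied by KPAR and NCORE, assuming
# NPAR = (MPI ranks / KPAR) / NCORE within each k-point group.
vasp_npar() {
  local ranks=$1 kpar=$2 ncore=$3
  if (( ranks % kpar != 0 )); then
    echo $ranks MPI ranks are not divisible by KPAR=$kpar >&2
    return 1
  fi
  local per_group=$(( ranks / kpar ))
  if (( per_group % ncore != 0 )); then
    echo $per_group ranks per k-point group are not divisible by NCORE=$ncore >&2
    return 1
  fi
  echo $(( per_group / ncore ))
}

# 16 nodes x 128 ranks with KPAR=2 and NCORE=16 (a benchmark setting below):
vasp_npar 2048 2 16   # 2048/2 = 1024 ranks per group, 1024/16 = 64
```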

"},{"location":"research-software/vasp/#vasp-performance-data-on-archer2","title":"VASP performance data on ARCHER2","text":"

VASP performance data on ARCHER2 is currently available for two different benchmark systems:

"},{"location":"research-software/vasp/#cdte-supercell-hybrid-dft-functional-8-k-points-65-atoms","title":"CdTe Supercell, hybrid DFT functional. 8 k-points, 65 atoms","text":"

Basic information:

Performance summary:

Setup details:
- vasp/6/6.4.2-mkl19 module
- GCC 11.2.0
- MKL 19.5 for BLAS/LAPACK/ScaLAPACK and FFTW
- OFI for MPI transport layer

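| Nodes | MPI processes per node | OpenMP threads per MPI process | Total cores | NCORE | KPAR | LOOP+ time (s) |
|---|---|---|---|---|---|---|
| 1 | 32 | 4 | 128 | 1 | 2 | 5838 |
| 2 | 32 | 4 | 256 | 1 | 2 | 3115 |
| 4 | 32 | 4 | 512 | 1 | 2 | 1682 |
| 8 | 32 | 4 | 1024 | 1 | 2 | 928 |
| 16 | 128 | 1 | 2048 | 16 | 2 | 612 |
| 32 | 128 | 1 | 4096 | 16 | 2 | 459 |
| 64 | 128 | 1 | 8192 | 16 | 2 | 629 |"},{"location":"research-software/castep/castep/","title":"Castep","text":"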

This page has moved

"},{"location":"research-software/chemshell/chemshell/","title":"Chemshell","text":"

This page has moved

"},{"location":"research-software/code-saturne/code-saturne/","title":"Code saturne","text":"

This page has moved

"},{"location":"research-software/cp2k/cp2k/","title":"Cp2k","text":"

This page has moved

"},{"location":"research-software/fhi-aims/fhi-aims/","title":"Fhi aims","text":"

This page has moved

"},{"location":"research-software/gromacs/gromacs/","title":"Gromacs","text":"

This page has moved

"},{"location":"research-software/lammps/lammps/","title":"Lammps","text":"

This page has moved

"},{"location":"research-software/mitgcm/mitgcm/","title":"Mitgcm","text":"

This page has moved

"},{"location":"research-software/mo-unified-model/mo-unified-model/","title":"Mo unified model","text":"

This page has moved

"},{"location":"research-software/namd/namd/","title":"Namd","text":"

This page has moved

"},{"location":"research-software/nektarplusplus/nektarplusplus/","title":"Nektarplusplus","text":"

This page has moved

"},{"location":"research-software/nemo/nemo/","title":"Nemo","text":"

This page has moved

"},{"location":"research-software/nwchem/nwchem/","title":"Nwchem","text":"

This page has moved

"},{"location":"research-software/onetep/onetep/","title":"Onetep","text":"

This page has moved

"},{"location":"research-software/openfoam/openfoam/","title":"Openfoam","text":"

This page has moved

"},{"location":"research-software/qe/qe/","title":"Qe","text":"

This page has moved

"},{"location":"research-software/vasp/vasp/","title":"Vasp","text":"

This page has moved

"},{"location":"software-libraries/","title":"Software Libraries","text":"

This section provides information on centrally-installed software libraries and library-based packages. These provide significant functionality that is of interest to both users and developers of applications.

Libraries are made available via the module system, and fall into a number of distinct groups.

"},{"location":"software-libraries/#libraries-via-modules-cray-","title":"Libraries via modules cray-*","text":"

The following libraries are available as modules prefixed by cray- and may be of direct interest to developers and users. The modules are provided by HPE Cray, are optimised for performance on the ARCHER2 hardware, and should be used where possible. The relevant modules are:

"},{"location":"software-libraries/#integration-with-compiler-environment","title":"Integration with compiler environment","text":"

All libraries provided by modules prefixed cray- integrate with the compiler environment, and so appropriate compiler and link stage options are injected when using the standard compiler wrappers cc, CC and ftn.

"},{"location":"software-libraries/#libraries-supported-by-archer2-cse-team","title":"Libraries supported by ARCHER2 CSE team","text":"

The following libraries are also made available by the ARCHER2 CSE team:

"},{"location":"software-libraries/#integration-with-compiler-environment_1","title":"Integration with compiler environment","text":"

Again, all the libraries listed above are supported by all programming environments via the module system. Additional compile and link time flags should not be required.

"},{"location":"software-libraries/#building-your-own-library-versions","title":"Building your own library versions","text":"

For the libraries listed in this section, a set of build and installation scripts is available in the ARCHER2 GitHub repository.

Follow the instructions to build the relevant package (note this is the cse-develop branch of the repository). See also individual libraries pages in the list above for further details.

The scripts available from this repository should work in all three programming environments.

"},{"location":"software-libraries/adios/","title":"ADIOS","text":"

The Adaptable I/O System (ADIOS) is developed at Oak Ridge National Laboratory and is freely available under a BSD license.

"},{"location":"software-libraries/adios/#version-history","title":"Version history","text":"Upgrade 2023

The central installation of ADIOS (version 1) has been removed as it is no longer actively developed. A central installation of ADIOS (version 2) will be considered as a replacement.

"},{"location":"software-libraries/adios/#compile-your-own-version","title":"Compile your own version","text":"

The ARCHER2 GitHub repository provides a script that can be used to build ADIOS (version 1):

$ git clone https://github.com/ARCHER2-HPC/pe-scripts.git\n$ cd pe-scripts\n$ git checkout modules-2022-12\n$ module load cray-hdf5-parallel\n$ ./sh/adios.sh --prefix=/path/to/install/location\n
where the --prefix option determines the install location. See the ARCHER2 GitHub repository for further details and options.

"},{"location":"software-libraries/adios/#using-adios","title":"Using ADIOS","text":"

Configuration details for ADIOS are obtained via the utility adios_config, which should be available on your PATH once ADIOS is installed. For example, to print the compiler options required to locate the serial C include files, issue:

$ adios_config -s -c\n
Use adios_config --help for a summary of options.

To compile and link an application, these commands can be embedded in a Makefile, e.g.,

ADIOS_INC := $(shell adios_config -s -c)\nADIOS_CLIB := $(shell adios_config -s -l)\n
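Outside a Makefile, the same adios_config queries can be used directly from the shell. The function below is a minimal sketch for a serial C code. It assumes adios_config is on PATH as described above; the function name build_adios_app and its source/executable arguments are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compile and link a serial C program against ADIOS 1.x
# using the flags reported by adios_config (assumed to be on PATH).
build_adios_app() {
  local src=$1 exe=$2 inc lib
  inc=$(adios_config -s -c) || return 1   # serial compile flags
  lib=$(adios_config -s -l) || return 1   # serial link flags
  # $inc and $lib are deliberately unquoted so they split into separate options.
  cc $inc -o "$exe" "$src" $lib
}
```

For example: $ build_adios_app my_app.c my_app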
See the ADIOS user manual for further details and examples.

"},{"location":"software-libraries/adios/#resources","title":"Resources","text":"

The ADIOS home page

ADIOS user manual (v1.10 pdf version)

ADIOS 1.x github repository

"},{"location":"software-libraries/aocl/","title":"AMD Optimizing CPU Libraries (AOCL)","text":"

AMD Optimizing CPU Libraries (AOCL) are a set of numerical libraries optimized for AMD \u201cZen\u201d-based processors, including EPYC, Ryzen Threadripper PRO, and Ryzen.

AOCL comprises eight libraries:

- BLIS (BLAS Library)
- libFLAME (LAPACK)
- AMD-FFTW
- LibM (AMD Core Math Library)
- ScaLAPACK
- AMD Random Number Generator (RNG)
- AMD Secure RNG
- AOCL-Sparse

Tip

AOCL 3.1 and 4.0 are available; 3.1 is the default.

"},{"location":"software-libraries/aocl/#compiling-with-aocl","title":"Compiling with AOCL","text":"

Important

AOCL does not currently support the Cray programming environment, so it is unavailable when PrgEnv-cray is loaded.

Important

The cray-libsci module is loaded by default for all users and this module also contains definitions of BLAS, LAPACK and ScaLAPACK routines that conflict with those in AOCL. The aocl module automatically unloads cray-libsci.

"},{"location":"software-libraries/aocl/#gnu-programming-environment","title":"GNU Programming Environment","text":"

AOCL 3.1 and 4.0 are available for both versions of the GCC compilers: gcc/11.2.0 and gcc/10.3.0.

module load PrgEnv-gnu\nmodule load aocl\n
"},{"location":"software-libraries/aocl/#aocc-programming-environment","title":"AOCC Programming Environment","text":"

AOCL 3.1 and 4.0 are available for the AOCC compiler version aocc/3.2.0.

module load PrgEnv-aocc\nmodule load aocl\n
"},{"location":"software-libraries/aocl/#resources","title":"Resources","text":"

For more information on AOCL, please see: https://developer.amd.com/amd-aocl/#documentation

"},{"location":"software-libraries/aocl/#version-history","title":"Version history","text":"

Current modules:

"},{"location":"software-libraries/arpack/","title":"ARPACK-NG","text":"

The Arnoldi Package (ARPACK) was designed to compute eigenvalues and eigenvectors of large sparse matrices. ARPACK originated at Rice University; an open-source version (ARPACK-NG) is available under a BSD license and is the version provided here.

"},{"location":"software-libraries/arpack/#compiling-and-linking-with-arpack","title":"Compiling and linking with ARPACK","text":"

To compile an application against the ARPACK-NG libraries, load the arpack-ng module and use the compiler wrappers cc, CC, and ftn in the usual way.

The arpack-ng module defines ARPACK_NG_DIR which locates the root of the installation for the current programming environment.

"},{"location":"software-libraries/arpack/#version-history","title":"Version history","text":"Upgrade 2023Full system4-cabinet system "},{"location":"software-libraries/arpack/#compiling-your-own-version","title":"Compiling your own version","text":"

The currently supported version of ARPACK-NG on ARCHER2 can be compiled using a script available from the ARCHER2 GitHub repository:

$ git clone https://github.com/ARCHER2-HPC/pe-scripts.git\n$ cd pe-scripts\n$ git checkout modules-2022-12\n$ ./sh/arpack-ng.sh --prefix=/path/to/install/location\n
where the --prefix option specifies the install location. See the ARCHER2 GitHub repository for further options and details. Note that the build process runs the tests; an salloc allocation is required so that the parallel tests run correctly.

"},{"location":"software-libraries/arpack/#resources","title":"Resources","text":"

ARPACK-NG github site

"},{"location":"software-libraries/boost/","title":"Boost","text":"

Boost provides portable C++ libraries useful in a broad range of contexts. The libraries are freely available under the terms of the Boost Software License.

"},{"location":"software-libraries/boost/#compiling-and-linking","title":"Compiling and linking","text":"

The C++ compiler wrapper CC will introduce the appropriate options to compile an application against the Boost libraries. The other compiler wrappers (cc and ftn) do not introduce these options.

To check exactly which options are introduced, type, e.g.,

$ CC --cray-print-opts\n

The boost module also defines the environment variable BOOST_DIR as the root of the installation for the current programming environment if this information is needed.

"},{"location":"software-libraries/boost/#version-history","title":"Version history","text":"Upgrade 2023Full system4-cabinet system

The following libraries are installed: atomic chrono container context contract coroutine date_time exception fiber filesystem graph_parallel graph iostreams locale log math mpi program_options random regex serialization stacktrace system test thread timer type_erasure wave

"},{"location":"software-libraries/boost/#compiling-boost","title":"Compiling Boost","text":"

The ARCHER2 Github repository contains a recipe for compiling Boost for the different programming environments.

$ git clone https://github.com/ARCHER2-HPC/pe-scripts.git\n$ cd pe-scripts\n$ git checkout cse-develop\n$ ./sh/boost.sh --prefix=/path/to/install/location\n
where the --prefix determines the install location. The list of libraries compiled is specified in the boost.sh script. See the ARCHER2 Github repository for further information.

"},{"location":"software-libraries/boost/#resources","title":"Resources","text":""},{"location":"software-libraries/eigen/","title":"Eigen","text":"

Eigen is a C++ template library for linear algebra: matrices, vectors, numerical solvers, and related algorithms.

"},{"location":"software-libraries/eigen/#compiling-with-eigen","title":"Compiling with Eigen","text":"

To compile an application with the Eigen header files, load the eigen module and use the C++ compiler wrapper CC in the usual way. The relevant header files will be introduced automatically.

The header files are located in /work/y07/shared/libs/core/eigen/3.4.0/, and can be included manually at compilation without loading the module if required.

"},{"location":"software-libraries/eigen/#version-history","title":"Version history","text":""},{"location":"software-libraries/eigen/#compiling-your-own-version","title":"Compiling your own version","text":"

The currently supported version on ARCHER2 can be built using the following commands:

$ wget https://gitlab.com/libeigen/eigen/-/archive/3.4.0/eigen-3.4.0.tar.gz\n$ tar xvf eigen-3.4.0.tar.gz\n$ cmake eigen-3.4.0/ -DCMAKE_INSTALL_PREFIX=/path/to/install/location\n$ make install\n
where the -DCMAKE_INSTALL_PREFIX option determines the install directory. Installing in this way will also build the Eigen documentation and unit-tests.

"},{"location":"software-libraries/eigen/#resources","title":"Resources","text":"

Eigen home page

Getting Started guide

"},{"location":"software-libraries/fftw/","title":"FFTW","text":"

FFTW is a C subroutine library (which includes a Fortran interface) for computing the discrete Fourier transform (DFT) in one or more dimensions, of arbitrary input size, and of both real and complex data (as well as of even/odd data, i.e. the discrete cosine/sine transforms or DCT/DST).

Only the version 3 interface is available on ARCHER2.

"},{"location":"software-libraries/glm/","title":"GLM","text":"

OpenGL Mathematics (GLM) is a header-only C++ library which performs operations typically encountered in graphics applications, but can also be relevant to scientific applications. GLM is freely available under an MIT license.

"},{"location":"software-libraries/glm/#compiling-with-glm","title":"Compiling with GLM","text":"

The compiler wrapper CC will automatically locate the required include directory when the glm module is loaded.

The glm module also defines the environment variable GLM_DIR which carries the root of the installation, if needed.

"},{"location":"software-libraries/glm/#version-history","title":"Version history","text":"Full system4-cabinet system "},{"location":"software-libraries/glm/#install-your-own-version","title":"Install your own version","text":"

You can install your own copy by following the steps used to install the current version on ARCHER2, available from the ARCHER2 Github repository:

$ git clone https://github.com/ARCHER2-HPC/pe-scripts.git\n$ cd pe-scripts\n$ git checkout modules-2021-10\n$ ./sh/glm.sh --prefix=/path/to/install/location\n
where the --prefix option sets the install location. See the ARCHER2 Github repository for further details.

"},{"location":"software-libraries/glm/#resources","title":"Resources","text":"

The GLM Github repository.

"},{"location":"software-libraries/hdf5/","title":"HDF5","text":"

The Hierarchical Data Format HDF5 (and its parallel manifestation HDF5 parallel) is a standard library and data format developed and supported by The HDF Group, and is released under a BSD-like license.

Both serial and parallel versions are available on ARCHER2 as standard modules:

Use module help to locate cray-specific release notes on a particular version.

Known issues:


Some general comments and information on serial and parallel I/O to ARCHER2 are given in the section on I/O and file systems.

"},{"location":"software-libraries/hdf5/#resources","title":"Resources","text":"

Tutorials and an introduction to HDF5 on The HDF Group pages.

General information for developers of HDF5.

"},{"location":"software-libraries/hypre/","title":"HYPRE","text":"

HYPRE is a library of linear solvers for structured and unstructured problems with a particular emphasis on multigrid. It is a product of the Lawrence Livermore National Laboratory and is distributed under either the MIT license or the Apache license.

"},{"location":"software-libraries/hypre/#compiling-and-linking-with-hypre","title":"Compiling and linking with HYPRE","text":"

To compile and link an application with the HYPRE libraries, load the hypre module and use the compiler wrappers cc, CC, or ftn in the usual way. The relevant include files and libraries will be introduced automatically.

Two versions of HYPRE are included: one with, and one without, OpenMP. The OpenMP version is selected if an OpenMP option (e.g., -fopenmp) is included at the compile or link stage.

The hypre module defines the environment variable HYPRE_DIR which will show the root of the installation for the current programming environment if required.

"},{"location":"software-libraries/hypre/#version-history","title":"Version history","text":"Upgrade 2023Full system4-cabinet system "},{"location":"software-libraries/hypre/#compiling-your-own-version","title":"Compiling your own version","text":"

The currently supported version on ARCHER2 can be built using the script from the ARCHER2 GitHub repository:

$ git clone https://github.com/ARCHER2-HPC/pe-scripts.git\n$ cd pe-scripts\n$ git checkout modules-2022-12\n$ ./sh/tpsl/hypre.sh --prefix=/path/to/install/location\n
where the --prefix option determines the install directory. See the ARCHER2 GitHub repository for more information.

"},{"location":"software-libraries/hypre/#resources","title":"Resources","text":"

HYPRE home page

The latest HYPRE user manual (HTML)

An older pdf version

HYPRE github repository

"},{"location":"software-libraries/libsci/","title":"HPE Cray LibSci","text":"

The Cray scientific libraries, available for all compiler choices, provide access to the Fortran BLAS and LAPACK interfaces for basic linear algebra, the corresponding C interfaces CBLAS and LAPACKE, and BLACS and ScaLAPACK for parallel linear algebra. Type man intro_libsci for further details.

Additionally there is GPU support available via the cray-libsci_acc module. More information can be found here.

"},{"location":"software-libraries/matio/","title":"Matio","text":"

Matio is a library which allows reading and writing matrices in MATLAB MAT format. It is an open source development released under a BSD license.

"},{"location":"software-libraries/matio/#compiling-and-linking-against-matio","title":"Compiling and linking against Matio","text":"

Load the matio module and use the standard compiler wrappers cc, CC, or ftn in the usual way. The appropriate header files and libraries will be included automatically via the compiler wrappers.

The matio module sets the PATH variable so that the stand-alone utility matdump can be used. The module also defines MATIO_PATH, which gives the root of the installation if this is needed.

"},{"location":"software-libraries/matio/#version-history","title":"Version history","text":"Upgrade 2023Full system4-cabinet system "},{"location":"software-libraries/matio/#compiling-your-own-version","title":"Compiling your own version","text":"

A version of Matio as currently installed on ARCHER2 can be compiled using the script available from the ARCHER2 GitHub repository:

$ git clone https://github.com/ARCHER2-HPC/pe-scripts.git\n$ cd pe-scripts\n$ git checkout modules-2022-12\n$ ./sh/tpsl/matio.sh --prefix=/path/to/install/location\n
where --prefix defines the location of the installation.

"},{"location":"software-libraries/matio/#resources","title":"Resources","text":"

Matio github repository

"},{"location":"software-libraries/mesa/","title":"Mesa","text":"

Mesa is an open-source implementation of OpenGL, Vulkan, and other graphics API specifications, which it translates to vendor-specific hardware drivers.

"},{"location":"software-libraries/mesa/#compiling-with-mesa","title":"Compiling with Mesa","text":"

To compile an application with the mesa header files, load the mesa module and use the compiler wrappers in the usual way. The relevant header files will be introduced automatically.

The header files are located in /work/y07/shared/libs/core/mesa/21.0.1/, and can be included manually at compilation without loading the module if required.

"},{"location":"software-libraries/mesa/#version-history","title":"Version history","text":""},{"location":"software-libraries/mesa/#compiling-your-own-version","title":"Compiling your own version","text":"

The build recipe for this module can be found in the HPC-UK GitHub repository.

"},{"location":"software-libraries/mesa/#resources","title":"Resources","text":"

Mesa home page

"},{"location":"software-libraries/metis/","title":"Metis and Parmetis","text":"

The University of Minnesota provides a family of libraries for partitioning graphs and meshes, and for computing fill-reducing orderings of sparse matrices. These libraries come broadly under the label of \"Metis\". They are free to use for educational and research purposes.

"},{"location":"software-libraries/metis/#metis","title":"Metis","text":"

Metis is the sequential library for partitioning problems; it also supplies a number of simple stand-alone utility programs to access the Metis API for graph and mesh partitioning, and graph and mesh manipulation. The stand-alone programs typically read a graph or mesh from a file, which must be in \"metis\" format.

"},{"location":"software-libraries/metis/#compiling-and-linking-with-metis","title":"Compiling and linking with Metis","text":"

The Metis library available via module load metis comes both with and without support for OpenMP. When using the compiler wrappers cc, CC, and ftn, the appropriate version will be selected based on the presence or absence of, e.g., -fopenmp in the compile or link invocation.

Use, e.g.,

$ cc --cray-print-opts\n
or
$ cc -fopenmp --cray-print-opts\n
to see exactly what options are being issued by the compiler wrapper when the metis module is loaded.
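Comparing the two outputs shows which options (typically the OpenMP variant of the library) are added when -fopenmp is present. Below is a minimal sketch of that comparison as a shell function. The function name is hypothetical, and it simply splits the wrapper output on whitespace, so it assumes no single option contains embedded spaces.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list options that a compiler wrapper (cc, CC or ftn)
# emits with -fopenmp but not without it. Load the library module (e.g. metis)
# first so that the wrapper includes that library's options.
openmp_only_opts() {
  local wrapper=${1:-cc} serial openmp opt
  # Flatten the serial option list onto one space-separated line.
  serial=$("$wrapper" --cray-print-opts | tr '\n' ' ')
  openmp=$("$wrapper" -fopenmp --cray-print-opts)
  # Split the OpenMP options on whitespace (intentionally unquoted) and print
  # only those that do not also appear in the serial list.
  for opt in $openmp; do
    case " $serial " in
      *" $opt "*) ;;
      *) printf '%s\n' "$opt" ;;
    esac
  done
}
```

For example, with the metis module loaded: $ openmp_only_opts cc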

Metis is currently provided as static libraries, so it should not be necessary to re-load the metis module at run time.

The serial utilities (e.g. gpmetis for graph partitioning) are supplied without OpenMP. These may then be run on the front end for small problems if the metis module is loaded.

The metis module defines the environment variable METIS_DIR which indicates the current location of the Metis installation.

Note that the metis and parmetis libraries (and dependent modules) have been compiled with the default options: 32-bit integer indexing and 4-byte floating point.

"},{"location":"software-libraries/metis/#parmetis","title":"Parmetis","text":"

Parmetis is the distributed memory incarnation of the Metis functionality. As for the metis module, Parmetis is integrated with use of the compiler wrappers cc, CC, and ftn.

Parmetis depends on the metis module, which is loaded automatically by the parmetis module.

The parmetis module defines the environment variable PARMETIS_DIR which holds the current location of the Parmetis installation. This variable may not respond to a change of compiler version within a given programming environment. If you wish to use PARMETIS_DIR in such a context, you may need to (re-)load the parmetis module after the change of compiler version.

"},{"location":"software-libraries/metis/#module-version-history","title":"Module version history","text":"Upgrade 2023Full system4-cabinet system "},{"location":"software-libraries/metis/#compile-your-own-version","title":"Compile your own version","text":"

The build procedure used for the Metis and Parmetis libraries on ARCHER2 is available via GitHub.

"},{"location":"software-libraries/metis/#metis_1","title":"Metis","text":"

The latest ARCHER2 version of Metis can be installed with:

$ git clone https://github.com/ARCHER2-HPC/pe-scripts.git\n$ cd pe-scripts\n$ git checkout modules-2022-12\n$ ./sh/tpsl/metis.sh --prefix=/path/to/install/location\n

where --prefix determines the install location. This will download and install the default version for the current programming environment.

"},{"location":"software-libraries/metis/#parmetis_1","title":"Parmetis","text":"

Parmetis can be installed via the same mechanism as Metis:

$ ./sh/tpsl/parmetis.sh --prefix=/path/to/install/location\n
The Metis package should be installed first (as above) using the same location. See the ARCHER2 repository for further details and options.

"},{"location":"software-libraries/metis/#resources","title":"Resources","text":"

Metis and Parmetis at GitHub

"},{"location":"software-libraries/mkl/","title":"Intel Math Kernel Library (MKL)","text":"

The Intel Math Kernel Library (MKL) contains a variety of optimised numerical libraries including BLAS, LAPACK, ScaLAPACK and FFTW. In general, the exact commands required to build against MKL depend on the details of the compiler, environment, requirements for parallelism, and so on, so the Intel MKL link line advisor should be consulted.

Some examples are given below. Note that loading the mkl module will provide the environment variable MKLROOT which holds the location of the various MKL components.

Warning

The ARCHER2 CSE team have seen that using MKL on ARCHER2 for some software leads to failed regression tests due to numerical differences between reference results and those produced with software using MKL.

We strongly recommend that you use the HPE Cray LibSci and HPE Cray FFTW libraries for software if at all possible rather than MKL. If you do decide to use MKL on ARCHER2, then you should carefully validate results from your software to ensure that it is giving the expected results.

Important

The cray-libsci module is loaded by default for all users and this module also contains definitions of BLAS, LAPACK and ScaLAPACK routines that conflict with those in MKL. The mkl module automatically unloads cray-libsci.

Important

The mkl module needs to be loaded both at compile time and at runtime (usually in your job submission script).

Tip

MKL only supports the GCC programming environment (PrgEnv-gnu). Other programming environments may work but this is untested and unsupported on ARCHER2.
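The link options listed below for serial, threaded and ScaLAPACK MKL with GCC differ only in the interface, threading, ScaLAPACK and BLACS libraries. The following is a minimal sketch of how a build script could select the right line. The function name mkl_link_flags is hypothetical, and it only reproduces the PrgEnv-gnu link lines given on this page; for any other combination, consult the Intel MKL link line advisor.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the MKL link options listed on this page for
# PrgEnv-gnu. LANGUAGE is fortran or c (also used for C++); MODE is
# sequential, threaded or scalapack. Requires MKLROOT (set by the mkl module).
mkl_link_flags() {
  local lang=$1 mode=$2 iface threading omp='' scalapack='' blacs=''
  if [ -z "${MKLROOT:-}" ]; then
    echo "mkl_link_flags: MKLROOT is not set; load the mkl module" >&2
    return 1
  fi
  # Interface layer: gfortran and C/C++ use different MKL interface libraries.
  case $lang in
    fortran) iface=-lmkl_gf_lp64 ;;
    c|c++) iface=-lmkl_intel_lp64 ;;
    *) echo "mkl_link_flags: unknown language '$lang'" >&2; return 1 ;;
  esac
  # Threading layer, plus the extra libraries needed for ScaLAPACK.
  case $mode in
    sequential) threading=-lmkl_sequential ;;
    threaded) threading=-lmkl_gnu_thread; omp=-lgomp ;;
    scalapack)
      threading=-lmkl_gnu_thread; omp=-lgomp
      scalapack=-lmkl_scalapack_lp64; blacs=-lmkl_blacs_intelmpi_lp64 ;;
    *) echo "mkl_link_flags: unknown mode '$mode'" >&2; return 1 ;;
  esac
  # ${var:+...} inserts an optional library (and its trailing space) only when set.
  printf '%s\n' "-L${MKLROOT}/lib/intel64 ${scalapack:+$scalapack }-Wl,--no-as-needed $iface $threading -lmkl_core ${blacs:+$blacs }${omp:+$omp }-lpthread -lm -ldl"
}
```

For example: $ ftn -m64 -I${MKLROOT}/include -o my_app my_app.f90 $(mkl_link_flags fortran threaded)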

"},{"location":"software-libraries/mkl/#serial-mkl-with-gcc","title":"Serial MKL with GCC","text":"

Swap modules:

module load PrgEnv-gnu\nmodule load mkl\n
Fortran
Compile options: -m64 -I\"${MKLROOT}/include\"
Link options: -L${MKLROOT}/lib/intel64 -Wl,--no-as-needed -lmkl_gf_lp64 -lmkl_sequential -lmkl_core -lpthread -lm -ldl

C/C++
Compile options: -m64 -I\"${MKLROOT}/include\"
Link options: -L${MKLROOT}/lib/intel64 -Wl,--no-as-needed -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm -ldl

"},{"location":"software-libraries/mkl/#threaded-mkl-with-gcc","title":"Threaded MKL with GCC","text":"

Swap modules:

module load PrgEnv-gnu\nmodule load mkl\n
Fortran
Compile options: -m64 -I\"${MKLROOT}/include\"
Link options: -L${MKLROOT}/lib/intel64 -Wl,--no-as-needed -lmkl_gf_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -lpthread -lm -ldl

C/C++
Compile options: -m64 -I\"${MKLROOT}/include\"
Link options: -L${MKLROOT}/lib/intel64 -Wl,--no-as-needed -lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -lpthread -lm -ldl

"},{"location":"software-libraries/mkl/#mkl-parallel-scalapack-with-gcc","title":"MKL parallel ScaLAPACK with GCC","text":"

Swap modules:

module load PrgEnv-gnu\nmodule load mkl\n
Fortran
Compile options: -m64 -I\"${MKLROOT}/include\"
Link options: -L${MKLROOT}/lib/intel64 -lmkl_scalapack_lp64 -Wl,--no-as-needed -lmkl_gf_lp64 -lmkl_gnu_thread -lmkl_core -lmkl_blacs_intelmpi_lp64 -lgomp -lpthread -lm -ldl

C/C++
Compile options: -m64 -I\"${MKLROOT}/include\"
Link options: -L${MKLROOT}/lib/intel64 -lmkl_scalapack_lp64 -Wl,--no-as-needed -lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core -lmkl_blacs_intelmpi_lp64 -lgomp -lpthread -lm -ldl

"},{"location":"software-libraries/mumps/","title":"MUMPS","text":"

MUMPS is a parallel solver for large sparse systems that uses a multifrontal method. It is developed largely at CERFACS, ENS Lyon, IRIT Toulouse, INRIA, and the University of Bordeaux, and is provided free of charge, largely under a CeCILL-C license.

"},{"location":"software-libraries/mumps/#compiling-and-linking-with-mumps","title":"Compiling and linking with MUMPS","text":"

To compile an application against the MUMPS libraries, load the mumps module and use the compiler wrappers cc, CC, and ftn in the usual way.

MUMPS is configured to allow Pord, Metis, Parmetis, and Scotch orderings.

Two versions of MUMPS are provided: one with, and one without, OpenMP. The OpenMP version is selected if an OpenMP option (e.g., -fopenmp) is included at the compile stage.

The mumps module defines MUMPS_DIR which locates the root of the installation for the current programming environment.

"},{"location":"software-libraries/mumps/#version-history","title":"Version history","text":"Upgrade 2023Full system4-cabinet system

Note: mumps/5.5.1 uses scotch/7.0.3 while mumps/5.3.5 uses scotch/6.1.0.

Known issues: The OpenMP version in PrgEnv-aocc is not available at the moment.

"},{"location":"software-libraries/mumps/#compiling-your-own-version","title":"Compiling your own version","text":"

The currently supported version of MUMPS on ARCHER2 can be compiled using scripts available from the ARCHER2 GitHub repository:

$ git clone https://github.com/ARCHER2-HPC/pe-scripts.git\n$ cd pe-scripts\n$ git checkout modules-2022-12\n$ ./sh/tpsl/metis.sh --prefix=/path/to/install/location\n$ ./sh/tpsl/parmetis.sh --prefix=/path/to/install/location\n$ ./sh/tpsl/scotchv7.sh --prefix=/path/to/install/location\n$ ./sh/tpsl/mumps.sh --prefix=/path/to/install/location\n
where the --prefix option should be the same for MUMPS and the three dependencies (Metis, Parmetis, and Scotch version 7), which are built first, in the order shown. See the ARCHER2 GitHub repository for further options and details.

"},{"location":"software-libraries/mumps/#resources","title":"Resources","text":"

The MUMPS home page

MUMPS user manual (Version 5.6, pdf)

"},{"location":"software-libraries/netcdf/","title":"NetCDF","text":"

The Network Common Data Form NetCDF (and its parallel manifestation NetCDF parallel) is a standard library and data format developed and supported by UCAR, and is released under a BSD-like license.

Both serial and parallel versions are available on ARCHER2 as standard modules:

Note that the relevant HDF5 module must be loaded first, e.g.,

$ module load cray-hdf5\n$ module load cray-netcdf\n
for the serial version.

Use module spider to locate available versions, and use module help to locate cray-specific release notes on a particular version.

Known issues:


Some general comments and information on serial and parallel I/O to ARCHER2 are given in the section on I/O and file systems.

"},{"location":"software-libraries/netcdf/#resources","title":"Resources","text":"

The NetCDF home page.

"},{"location":"software-libraries/petsc/","title":"PETSc","text":"

PETSc is a suite of parallel tools for solution of partial differential equations. PETSc is developed at Argonne National Laboratory and is freely available under a BSD 2-clause license.

"},{"location":"software-libraries/petsc/#build","title":"Build","text":"

Applications may be linked against PETSc by loading the petsc module and using the compiler wrappers cc, CC, and ftn in the usual way. Details of options introduced by the compiler wrappers can be examined via, e.g.,

$ cc --cray-print-opts\n

PETSc is configured with Metis, Parmetis, and Scotch orderings, and to support HYPRE, MUMPS, SuperLU, and SuperLU-DIST. PETSc is compiled without OpenMP.

The petsc module defines the environment variable PETSC_DIR as the root of the installation if this is required.

"},{"location":"software-libraries/petsc/#version-history","title":"Version history","text":"Upgrade 2023Full system4-cabinet system

Note: PETSc has a number of dependencies; where applicable, the newer version of PETSc depends on the newer module version of each relevant dependency. Check module list to be sure.

Known issues: PETSc is not currently available for PrgEnv-aocc. There is no HYPRE support in this version.

"},{"location":"software-libraries/petsc/#compile-your-own-version","title":"Compile your own version","text":"

It is possible to follow the steps used to build the current version on ARCHER2. These steps are codified at the ARCHER2 GitHub repository and include a number of dependencies that must be built in the correct order:

$ git clone https://github.com/ARCHER2-HPC/pe-scripts.git\n$ cd pe-scripts\n$ git checkout modules-2022-12\n$ ./sh/tpsl/metis.sh --prefix=/path/to/install/location\n$ ./sh/tpsl/parmetis.sh --prefix=/path/to/install/location\n$ ./sh/tpsl/hypre.sh --prefix=/path/to/install/location\n$ ./sh/tpsl/scotchv7.sh --prefix=/path/to/install/location\n$ ./sh/tpsl/mumps.sh --prefix=/path/to/install/location\n$ ./sh/tpsl/superlu.sh --prefix=/path/to/install/location\n$ ./sh/tpsl/superlu-dist.sh --prefix=/path/to/install/location\n\n$ module load cray-hdf5\n$ ./sh/petsc.sh --prefix=/path/to/install/location\n
The --prefix option indicating the install directory should be the same in all cases. See the ARCHER2 GitHub repository for further details (and options). This will compile version 3.18.5 against the latest module versions of each dependency.

"},{"location":"software-libraries/petsc/#resources","title":"Resources","text":"

PETSc home page

Current PETSc documentation (HTML)

"},{"location":"software-libraries/scotch/","title":"Scotch and PT-Scotch","text":"

Scotch and its parallel version PT-Scotch are provided by LaBRI at the University of Bordeaux and INRIA Bordeaux South-West. They are used for graph partitioning and ordering problems. The libraries are freely available for scientific use under a license similar to the LGPL license.

"},{"location":"software-libraries/scotch/#scotch-and-pt-scotch_1","title":"Scotch and PT-Scotch","text":"

The scotch module provides access to both the Scotch and PT-Scotch libraries via the compiler system. A number of stand-alone utilities are also provided as part of the package.

"},{"location":"software-libraries/scotch/#compiling-and-linking","title":"Compiling and linking","text":"

If the scotch module is loaded, then applications may be automatically compiled and linked against the libraries for the current programming environment. Check, e.g.,

$ cc --cray-print-opts\n
if you wish to see exactly what options are generated by the compiler wrappers.

Scotch and PT-Scotch libraries are provided as static archives only. The compiler wrappers do not give access to the libraries libscotcherrexit.a or libptscotcherrexit.a. If you wish to perform your own error handling, these libraries must be linked manually.

The scotch module defines the environment variable SCOTCH_DIR, which holds the root of the installation for a given programming environment. Libraries are present in ${SCOTCH_DIR}/lib.

Stand-alone applications are also available. See the Scotch and PT-Scotch user manuals for further details.

"},{"location":"software-libraries/scotch/#module-version-history","title":"Module version history","text":"Upgrade 2023Full system4-cabinet system

Note: scotch/7.0.3 has disabled a number of features including the Metis compatibility layer, and threads, to allow all tests to pass.

"},{"location":"software-libraries/scotch/#compiling-your-own-version","title":"Compiling your own version","text":"

The build procedure for the Scotch package on ARCHER2 is available via GitHub.

"},{"location":"software-libraries/scotch/#scotch-and-pt-scotch_2","title":"Scotch and PT-Scotch","text":"

The latest Scotch and PT-Scotch libraries are installed on ARCHER2 using the following mechanism:

$ git clone https://github.com/ARCHER2-HPC/pe-scripts.git\n$ cd pe-scripts\n$ git checkout modules-2022-12\n$ ./sh/tpsl/scotchv7.sh --prefix=/path/to/install/location\n
where the --prefix option defines the destination for the install. This script will download, compile and install version 7.0.3. A separate script (scotch.sh) in the same location is used for version 6.

"},{"location":"software-libraries/scotch/#resources","title":"Resources","text":"

The Scotch home page

Scotch user manual (pdf)

PT-Scotch user manual (pdf)

"},{"location":"software-libraries/slepc/","title":"SLEPC","text":"

The Scalable Library for Eigenvalue Problem Computations (SLEPc) is an extension of PETSc developed at the Universitat Politecnica de Valencia. SLEPc is freely available under a 2-clause BSD license.

"},{"location":"software-libraries/slepc/#compiling-and-linking-with-slepc","title":"Compiling and linking with SLEPc","text":"

To compile an application against the SLEPc libraries, load the slepc module and use the compiler wrappers cc, CC, and ftn in the usual way. Static libraries are provided, so the module is not required at run time.

The SLEPc module defines SLEPC_DIR which locates the root of the installation.

"},{"location":"software-libraries/slepc/#version-history","title":"Version history","text":"Upgrade 2023Full system4-cabinet system

Note: each SLEPc module depends on a PETSc module with the same minor version number.
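If you build SLEPc and PETSc yourself, the same pairing rule can be checked before configuring. The function below is a minimal sketch that compares only the MAJOR.MINOR part of two version strings; the function name and the version numbers in the example are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if two version strings share MAJOR.MINOR,
# e.g. a SLEPc version and the PETSc version it is to be built against.
same_minor_version() {
  local a b
  a=$(printf '%s\n' "$1" | cut -d. -f1,2)
  b=$(printf '%s\n' "$2" | cut -d. -f1,2)
  [ -n "$a" ] && [ "$a" = "$b" ]
}
```

For example: $ same_minor_version 3.18.1 3.18.5 && echo compatible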

"},{"location":"software-libraries/slepc/#compiling-your-own-version","title":"Compiling your own version","text":"

The version of SLEPc currently available on ARCHER2 can be compiled using a script available from the ARCHER2 github repository:

$ git clone https://github.com/ARCHER2-HPC/pe-scripts.git\n$ cd pe-scripts\n$ git checkout modules-2022-12\n$ ./sh/slepc.sh --prefix=/path/to/install/location\n
The dependencies (including PETSc) can be built in the same way, or taken from the existing modules. See the ARCHER2 github repository for further information.

"},{"location":"software-libraries/slepc/#resources","title":"Resources","text":"

SLEPc home page

Latest release version of SLEPc user manual (PDF)

SLEPc Gitlab repository

"},{"location":"software-libraries/superlu/","title":"SuperLU and SuperLU_DIST","text":"

SuperLU and SuperLU_DIST are libraries for the direct solution of large sparse non-symmetric systems of linear equations, typically by factorisation and back-substitution. The libraries are provided by Lawrence Berkeley National Laboratory and are freely available under a slightly modified BSD-style license.

Two separate modules are provided for SuperLU and SuperLU_DIST.

"},{"location":"software-libraries/superlu/#superlu","title":"SuperLU","text":"

This module provides the serial library SuperLU.

"},{"location":"software-libraries/superlu/#compiling-and-linking-with-superlu","title":"Compiling and linking with SuperLU","text":"

Compiling and linking SuperLU applications requires no special action beyond module load superlu and using the standard compiler wrappers cc, CC, or ftn. The exact options issued by the compiler wrapper can be examined via, e.g.,

$ cc --cray-print-opts\n
while the module is loaded.

The module defines the environment variable SUPERLU_DIR as the root location of the installation for a given programming environment.

"},{"location":"software-libraries/superlu/#version-history","title":"Version history","text":""},{"location":"software-libraries/superlu/#superlu_dist","title":"SuperLU_DIST","text":"

This module provides the distributed memory parallel library SuperLU_DIST both with and without OpenMP.

"},{"location":"software-libraries/superlu/#compiling-and-linking-superlu_dist","title":"Compiling and linking SuperLU_DIST","text":"

Use the standard compiler wrappers:

$ cc my_superlu_dist_application.c\n
or
$ cc -fopenmp my_superlu_dist_application.c\n
to compile and link against the appropriate libraries.

The superlu-dist module defines the environment variable SUPERLU_DIST_DIR as the root of the installation for the current programming environment.

"},{"location":"software-libraries/superlu/#version-history_1","title":"Version history","text":""},{"location":"software-libraries/superlu/#compiling-your-own-version","title":"Compiling your own version","text":"

The build used on ARCHER2 can be replicated using the scripts provided in the ARCHER2 GitHub repository.

"},{"location":"software-libraries/superlu/#superlu_1","title":"SuperLU","text":"

The version currently supported on ARCHER2 may be built via

$ git clone https://github.com/ARCHER2-HPC/pe-scripts.git\n$ cd pe-scripts\n$ git checkout modules-2022-12\n$ ./sh/tpsl/superlu.sh --prefix=/path/to/install/location\n
where the --prefix option controls the install destination.

"},{"location":"software-libraries/superlu/#superlu_dist_1","title":"SuperLU_DIST","text":"

SuperLU_DIST is configured to use METIS and ParMETIS, so these should be installed first:

$ ./sh/tpsl/metis.sh --prefix=/path/to/install/location\n$ ./sh/tpsl/parmetis.sh --prefix=/path/to/install/location\n$ ./sh/tpsl/superlu_dist.sh --prefix=/path/to/install/location\n
will download, compile, and install the relevant libraries. The install location should be the same for all three packages. See the ARCHER2 GitHub repository for further options and details.
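The three steps above must run in order, using the same prefix. As a minimal sketch (a bash function that is not part of pe-scripts; the name build_superlu_dist is hypothetical), the sequence can be wrapped so that it stops at the first failed build. It assumes it is run from the top level of the pe-scripts checkout:

```shell
#!/bin/bash
# build_superlu_dist PREFIX   (hypothetical helper, not part of pe-scripts)
# Run from the top level of the pe-scripts checkout. Builds METIS,
# ParMETIS and SuperLU_DIST in dependency order into the same PREFIX,
# stopping at the first failure.
build_superlu_dist() {
  local prefix=$1 pkg
  if [ -z "$prefix" ]; then
    echo "usage: build_superlu_dist /path/to/install/location" >&2
    return 1
  fi
  for pkg in metis parmetis superlu_dist; do
    echo "=== building $pkg into $prefix"
    if ! ./sh/tpsl/"$pkg".sh --prefix="$prefix"; then
      echo "error: $pkg build failed; not continuing" >&2
      return 1
    fi
  done
}
```

For example, build_superlu_dist /path/to/install/location. Stopping early matters because a failed METIS or ParMETIS build would otherwise surface later as a harder-to-read SuperLU_DIST configure error.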

"},{"location":"software-libraries/superlu/#resources","title":"Resources","text":"

The Supernodal LU project home page

The SuperLU User guide (pdf). This describes both SuperLU and SuperLU_DIST.

The SuperLU github repository

The SuperLU_DIST github repository

"},{"location":"software-libraries/trilinos/","title":"Trilinos","text":"

Trilinos is a large collection of packages with software components that can be used for scientific and engineering problems. Most of the packages are released under a BSD license (and some under the LGPL).

"},{"location":"software-libraries/trilinos/#compiling-and-linking-against-trilinos","title":"Compiling and linking against Trilinos","text":"

Applications may be built against the module version of Trilinos by using the compiler wrappers CC or ftn in the normal way. The appropriate include files and library paths will be inserted automatically. Trilinos is built with OpenMP enabled.

The trilinos module defines the environment variable TRILINOS_DIR as the root of the installation for the current programming environment.

Trilinos also provides a small number of stand-alone executables which are available via the standard PATH mechanism while the module is loaded.

"},{"location":"software-libraries/trilinos/#version-history","title":"Version history","text":"

Note that Trilinos is not currently available for PrgEnv-aocc.

If using AMD compilers, module version aocc/3.0.0 is required.

Known issue

Trilinos is not available in PrgEnv-aocc at the moment.

Known issue

The ForTrilinos package is not available in this version.

Packages enabled are: Amesos, Amesos2, Anasazi, AztecOO, Belos, Epetra, EpetraExt, FEI, Galeri, GlobiPack, Ifpack, Ifpack2, Intrepid, Isorropia, Kokkos, Komplex, Mesquite, ML, Moertel, MueLu, NOX, OptiPack, Pamgen, Phalanx, Piro, Pliris, ROL, RTOp, Rythmos, Sacado, Shards, ShyLU, STK, STKSearch, STKTopology, STKUtil, Stratimikos, Teko, Teuchos, Thyra, Tpetra, TrilinosCouplings, Triutils, Xpetra, Zoltan, Zoltan2.

"},{"location":"software-libraries/trilinos/#compiling-trilinos","title":"Compiling Trilinos","text":"

A script which has details of the relevant configuration options for Trilinos is available at the ARCHER2 Github repository. The script will build a static-only version of the libraries.

$ git clone https://github.com/ARCHER2-HPC/pe-scripts.git\n$ cd pe-scripts\n$ git checkout modules-2022-12\n$ ...\n$ ./sh/trilinos.sh --prefix=/path/to/install/location\n
where --prefix sets the installation location. The ellipsis ... stands for the commands that build the dependencies of Trilinos, which are: metis, parmetis, superlu, superlu-dist, scotch, mumps, glm, and boost. These packages should be built as described on their corresponding pages, linked in the menu on the left.

See the ARCHER2 Github repository for further details.

Note that Trilinos may take up to one hour to compile on its own, and so the compilation is best performed as a batch job.

"},{"location":"software-libraries/trilinos/#resources","title":"Resources","text":""},{"location":"user-guide/","title":"User and Best Practice Guide","text":"

The ARCHER2 User and Best Practice Guide covers all aspects of use of the ARCHER2 service. This includes fundamentals (required by all users to use the system effectively), best practice for getting the most out of ARCHER2 and more technical topics.

The User and Best Practice Guide contains the following sections:

"},{"location":"user-guide/analysis/","title":"Data analysis","text":"

As well as being used for scientific simulations, ARCHER2 can also be used for data pre-/post-processing and analysis. This page provides an overview of the different options for doing so.

"},{"location":"user-guide/analysis/#using-the-login-nodes","title":"Using the login nodes","text":"

The easiest way to run non-computationally intensive data analysis is to run directly on the login nodes. However, please remember that the login nodes are a shared resource and should not be used for long-running tasks.

"},{"location":"user-guide/analysis/#example-running-an-r-script-on-a-login-node","title":"Example: Running an R script on a login node","text":"
module load cray-R\nRscript example.R\n
"},{"location":"user-guide/analysis/#using-the-compute-nodes","title":"Using the compute nodes","text":"

If running on the login nodes is not feasible (e.g. due to memory requirements or computationally intensive analysis), the compute nodes can also be used for data analysis.

Important

This is a more expensive option, as you will be charged for using the entire node, even though your analysis may only be using one core.

"},{"location":"user-guide/analysis/#example-running-an-r-script-on-a-compute-node","title":"Example: Running an R script on a compute node","text":"
#!/bin/bash\n#SBATCH --job-name=data_analysis\n#SBATCH --time=0:10:0\n#SBATCH --nodes=1\n#SBATCH --ntasks-per-node=1\n#SBATCH --cpus-per-task=1\n\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\nmodule load cray-R\n\nRscript example.R\n

An advantage of this method is that you can use Job chaining to automate the process of analysing your output data once your compute job has finished.
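As an illustration of job chaining, the hypothetical bash function below submits a simulation script and then an analysis script that Slurm holds until the first job has finished successfully. It is a sketch, assuming both batch scripts already exist (simulation.slurm and analysis.slurm are placeholder names); it relies on the standard sbatch options --parsable and --dependency=afterok:

```shell
#!/bin/bash
# chain_analysis SIM_SCRIPT ANALYSIS_SCRIPT   (hypothetical helper)
# Submits SIM_SCRIPT, then submits ANALYSIS_SCRIPT so that it only starts
# once the first job has completed successfully.
chain_analysis() {
  local sim_script=$1 analysis_script=$2
  local out sim_id
  # --parsable makes sbatch print just "jobid" (or "jobid;cluster")
  out=$(sbatch --parsable "$sim_script") || return 1
  sim_id=${out%%;*}
  # afterok: run only if job $sim_id finishes with exit code 0
  sbatch --dependency=afterok:"$sim_id" "$analysis_script"
}
```

For example, chain_analysis simulation.slurm analysis.slurm. With afterok the analysis job never starts if the simulation fails; Slurm's afterany dependency type would run it regardless of the outcome.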

"},{"location":"user-guide/analysis/#using-interactive-jobs","title":"Using interactive jobs","text":"

For more interactive analysis, it may be useful to use salloc to reserve a compute node on which to do your analysis. This allows you to run jobs directly on the compute nodes from the command line without using a job submission script. More information on interactive jobs can be found here.

"},{"location":"user-guide/analysis/#example-reserving-a-single-node-for-20-minutes-for-interactive-analysis","title":"Example: Reserving a single node for 20 minutes for interactive analysis","text":"
auser@ln01:> salloc --nodes=1 --ntasks-per-node=1 --cpus-per-task=1 \\\n                --time=00:20:00 --partition=standard --qos=short \\\n                --account=[budget code]\n

Note

If you want to run for longer than 20 minutes, you will need to use a different QoS as the maximum runtime for the short QoS is 20 mins.

"},{"location":"user-guide/analysis/#data-analysis-nodes","title":"Data analysis nodes","text":"

The data analysis nodes on the ARCHER2 system are designed for large compilations, post-calculation analysis and data manipulation. They should be used for jobs which are too small to require a whole compute node, but which would have an adverse impact on the operation of the login nodes if they were run interactively.

Unlike compute nodes, the data analysis nodes are able to access the home, work, and the RDFaaS file systems. They can also be used to transfer data from a remote system to ARCHER2 and vice versa (using e.g. scp or rsync). This can be useful when transferring large amounts of data that might take hours to complete.

"},{"location":"user-guide/analysis/#requesting-resources-on-the-data-analysis-nodes-using-slurm","title":"Requesting resources on the data analysis nodes using Slurm","text":"

The ARCHER2 data analysis nodes can be reached by using the serial partition and the serial QoS. Unlike other nodes on ARCHER2, you may only request part of a single node and you will likely be sharing the node with other users.

The data analysis nodes are set up such that you can specify the number of cores you want to use (up to 32 physical cores) and the amount of memory you want for your job (up to 125 GB). You can have multiple jobs running on the data analysis nodes at the same time, but the total number of cores used by those jobs cannot exceed 32, and the total memory used by jobs currently running from a single user cannot exceed 125 GB -- any jobs above this limit will remain pending until your previous jobs are finished.

You do not need to specify both the number of cores and the memory for jobs on the data analysis nodes. If you specify only the number of cores, you will get 1984 MiB of memory per core by default (a little less than 2 GB); if you specify only the memory, you will get 1 core.

Note

Each data analysis node is fitted with 512 GB of memory. However, a small amount of this memory is needed for system processes, which is why we set an upper limit of 125 GB per user (a user is limited to one quarter of the RAM on a node). This is also why the per-core default memory allocation is slightly less than 2 GB.
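To make the defaults above concrete, here is a small bash sketch (the function serial_request is hypothetical and does not query Slurm) that works out what a single data analysis job would receive for a given --ntasks and --mem combination, and checks it against the per-user limits quoted above. It assumes "125 GB" means 125 GiB, which is how Slurm interprets --mem=125G:

```shell
#!/bin/bash
# Limits and defaults from the text above. Treating "125 GB" as 125 GiB
# (Slurm's meaning of --mem=125G) is an assumption.
MAX_CORES=32
MAX_MEM_MIB=$((125 * 1024))
DEFAULT_MEM_PER_CORE_MIB=1984

# serial_request CORES MEM_MIB   (hypothetical helper)
# Pass an empty string for whichever value you leave unspecified.
serial_request() {
  local cores=$1 mem=$2
  if [ -z "$cores" ] && [ -z "$mem" ]; then
    echo "error: give --ntasks, --mem, or both" >&2
    return 1
  fi
  # Memory only: you get a single core
  if [ -z "$cores" ]; then
    cores=1
  fi
  # Cores only: you get the default memory per core
  if [ -z "$mem" ]; then
    mem=$(( cores * DEFAULT_MEM_PER_CORE_MIB ))
  fi
  if (( cores > MAX_CORES || mem > MAX_MEM_MIB )); then
    echo "error: cores=$cores mem=${mem}MiB exceeds the per-user limit" >&2
    return 1
  fi
  echo "cores=$cores mem=${mem}MiB"
}
```

For example, serial_request 32 "" reports cores=32 mem=63488MiB: requesting all 32 cores without --mem gives 62 GiB, so set --mem explicitly if you need more than the per-core default.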

Note

When running on the data analysis nodes, you must always specify either the number of cores you want, the amount of memory you want, or both. The examples shown below specify the number of cores with the --ntasks flag and the memory with the --mem flag. If you only want to specify one of the two, remember to delete the other flag.

"},{"location":"user-guide/analysis/#example-running-a-serial-batch-script-on-the-data-analysis-nodes","title":"Example: Running a serial batch script on the data analysis nodes","text":"

A Slurm batch script for the data analysis nodes looks very similar to one for the compute nodes. The main differences are that you need to use --partition=serial and --qos=serial, specify the number of tasks (rather than the number of nodes) and/or specify the amount of memory you want. For example, to use a single core and 4 GB of memory, you would use something like:

#!/bin/bash\n\n# Slurm job options (job-name, job time)\n#SBATCH --job-name=data_analysis\n#SBATCH --time=0:20:0\n#SBATCH --ntasks=1\n\n# Replace [budget code] below with your budget code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=serial\n#SBATCH --qos=serial\n\n# Define memory required for this job. By default, you would\n# get just under 2 GB per core, but you can ask for up to 125 GB.\n#SBATCH --mem=4G\n\n# Set the number of threads to 1\n#   This prevents any threaded system libraries from automatically\n#   using threading.\nexport OMP_NUM_THREADS=1\n\nmodule load cray-python\n\npython my_analysis_script.py\n
"},{"location":"user-guide/analysis/#interactive-session-on-the-data-analysis-nodes","title":"Interactive session on the data analysis nodes","text":"

There are two ways to start an interactive session on the data analysis nodes: you can either use salloc to reserve a part of a data analysis node for interactive jobs; or, you can use srun to open a terminal on the node and run things on the node directly. You can find out more information on the advantages and disadvantages of both of these methods in the Running jobs on ARCHER2 section of the User and Best Practice Guide.

"},{"location":"user-guide/analysis/#using-salloc-for-interactive-access","title":"Using salloc for interactive access","text":"

You can reserve resources on a data analysis node using salloc. For example, to request 1 core and 4 GB of memory for 20 minutes, you would use:

auser@ln01:~> salloc --time=00:20:00 --partition=serial --qos=serial \\\n                    --account=[budget code] --ntasks=1 \\\n                    --mem=4G\n

When you submit this job, your terminal will display something like:

salloc: Pending job allocation 523113\nsalloc: job 523113 queued and waiting for resources\nsalloc: job 523113 has been allocated resources\nsalloc: Granted job allocation 523113\nsalloc: Waiting for resource configuration\nsalloc: Nodes dvn01 are ready for job\n\nauser@ln01:~>\n

It may take some time for your interactive job to start. Once it runs you will enter a standard interactive terminal session (a new shell). Note that this shell is still on the front end (the prompt has not changed). Whilst the interactive session lasts you will be able to run jobs on the data analysis nodes by issuing the srun command directly at your command prompt. The maximum number of cores and memory you can use is limited by resources requested in the salloc command (or by the defaults if you did not explicitly ask for particular amounts of resource).

Your session will end when you hit the requested walltime. If you wish to finish before this, use the exit command, which returns you to the prompt from which you issued the salloc command.

"},{"location":"user-guide/analysis/#using-srun-for-interactive-access","title":"Using srun for interactive access","text":"

You can get a command prompt directly on the data analysis nodes by using the srun command directly. For example, to reserve 1 core and 8 GB of memory, you would use:

auser@ln01:~> srun   --time=00:20:00 --partition=serial --qos=serial \\\n                    --account=[budget code]    \\\n                    --ntasks=1 --mem=8G \\\n                    --pty /bin/bash\n

The --pty /bin/bash option causes a new shell to be started on the data analysis node. (This is perhaps closer to what many people consider an 'interactive' job than the salloc method described above.)

One can now issue shell commands in the usual way.

When finished, type exit to relinquish the allocation and control will be returned to the front end.

By default, the interactive shell will retain the environment of the parent. If you want a clean shell, remember to specify the --export=none option to the srun command.

"},{"location":"user-guide/analysis/#visualising-data-using-the-data-analysis-nodes-using-x","title":"Visualising data using the data analysis nodes using X","text":"

You can view data on the data analysis nodes by starting an interactive srun session with the --x11 flag to export the X display back to your local system. For 1 core with 8 GB of memory:

auser@ln01:~> srun   --time=00:20:00 --partition=serial --qos=serial  \\\n                        --hint=nomultithread --account=[budget code]    \\\n                        --ntasks=1 --mem=8G --x11 --pty /bin/bash\n

Tip

Data visualisation on ARCHER2 is only possible if you used the -X or -Y flag to the ssh command when logging in to the system.

"},{"location":"user-guide/analysis/#using-singularity","title":"Using Singularity","text":"

Singularity can be useful for data analysis, as sites such as DockerHub or SingularityHub contain many pre-built images of data analysis tools that can be simply downloaded and used on ARCHER2. More information about Singularity on ARCHER2 can be found in the Containers section of the User and Best Practice Guide.

"},{"location":"user-guide/analysis/#data-analysis-tools","title":"Data analysis tools","text":"

Useful tools for data analysis can be found on the Data Analysis and Tools page.

"},{"location":"user-guide/connecting-totp/","title":"Connecting to ARCHER2","text":"

This section covers the basic connection methods.

On the ARCHER2 system, interactive access is achieved using SSH, either directly from a command-line terminal or using an SSH client. In addition, data can be transferred to and from the ARCHER2 system using scp from the command line or by using a file-transfer client.

Before following the process below, we assume you have set up an account on ARCHER2 through the EPCC SAFE. Documentation on how to do this can be found at:

"},{"location":"user-guide/connecting-totp/#command-line-terminal","title":"Command line terminal","text":""},{"location":"user-guide/connecting-totp/#linux","title":"Linux","text":"

Linux distributions include a terminal application that can be used for SSH access to the ARCHER2 login nodes. Linux users will have different terminals depending on their distribution and window manager (e.g., GNOME Terminal in GNOME, Konsole in KDE). Consult your Linux distribution's documentation for details on how to load a terminal.

"},{"location":"user-guide/connecting-totp/#macos","title":"MacOS","text":"

MacOS users can use the Terminal application, located in the Utilities folder within the Applications folder.

"},{"location":"user-guide/connecting-totp/#windows","title":"Windows","text":"

A typical Windows installation will not include a terminal client, though there are various clients available. We recommend Windows users download and install MobaXterm to access ARCHER2. It is very easy to use and includes an integrated X Server, which allows you to run graphical applications on ARCHER2.

You can download MobaXterm Home Edition (Installer Edition) from the following link:

Double-click the downloaded Microsoft Installer file (.msi) and follow the instructions from the Windows Installation Wizard. Note, you might need to have administrator rights to install on some versions of Windows. Also, make sure to check whether Windows Firewall has blocked any features of this program after installation (Windows will warn you if the built-in firewall blocks an action, and gives you the opportunity to override the behaviour).

Once installed, start MobaXterm and then click \"Start local terminal\".

Tips

"},{"location":"user-guide/connecting-totp/#access-credentials","title":"Access credentials","text":"

To access ARCHER2, you need to use two sets of credentials: your SSH key pair protected by a passphrase and a time-based one-time passcode (TOTP code). You can find more detailed instructions on how to set up your credentials to access ARCHER2 from Windows, MacOS and Linux below.

"},{"location":"user-guide/connecting-totp/#ssh-key-pairs","title":"SSH Key Pairs","text":"

You will need to generate an SSH key pair protected by a passphrase to access ARCHER2.

Using a terminal (the command line), set up a key pair that contains your e-mail address and enter a passphrase you will use to unlock the key:

$ ssh-keygen -t rsa -C \"your@email.com\"\n...\n-bash-4.1$ ssh-keygen -t rsa -C \"your@email.com\"\nGenerating public/private rsa key pair.\nEnter file in which to save the key (/Home/user/.ssh/id_rsa): [Enter]\nEnter passphrase (empty for no passphrase): [Passphrase]\nEnter same passphrase again: [Passphrase]\nYour identification has been saved in /Home/user/.ssh/id_rsa.\nYour public key has been saved in /Home/user/.ssh/id_rsa.pub.\nThe key fingerprint is:\n03:d4:c4:6d:58:0a:e2:4a:f8:73:9a:e8:e3:07:16:c8 your@email.com\nThe key's randomart image is:\n+--[ RSA 2048]----+\n|    . ...+o++++. |\n| . . . =o..      |\n|+ . . .......o o |\n|oE .   .         |\n|o =     .   S    |\n|.    +.+     .   |\n|.  oo            |\n|.  .             |\n| ..              |\n+-----------------+\n

(remember to replace \"your@email.com\" with your e-mail address).

"},{"location":"user-guide/connecting-totp/#upload-public-part-of-key-pair-to-safe","title":"Upload public part of key pair to SAFE","text":"

You should now upload the public part of your SSH key pair to the SAFE by following the instructions at:

Login to SAFE.

Then:

  1. Go to the Menu Login accounts and select the ARCHER2 account you want to add the SSH key to.
  2. On the subsequent Login Account details page, click the Add Credential button.
  3. Select SSH public key as the Credential Type and click Next
  4. Either copy and paste the public part of your SSH key into the SSH Public key box or use the button to select the public key file on your computer.
  5. Click Add to associate the public SSH key with your account.

Once you have done this, your SSH key will be added to your ARCHER2 account.

"},{"location":"user-guide/connecting-totp/#mfa-time-based-one-time-passcode-totp-code","title":"MFA Time-based one-time passcode (TOTP code)","text":"

Remember, you will need to use both an SSH key and time-based one-time passcode to log into ARCHER2 so you will also need to set up a method for generating a TOTP code before you can log into ARCHER2.

"},{"location":"user-guide/connecting-totp/#first-login-password-required","title":"First login: password required","text":"

Important

You will not use your password when logging on to ARCHER2 after the first login for a new account.

As an additional security measure, you will also need to use a password from SAFE for your first login to ARCHER2 with a new account. When you log into ARCHER2 for the first time with a new account, you will be prompted to change your initial password. This is a three step process:

  1. When prompted to enter your LDAP password: Enter the password which you retrieve from SAFE
  2. When prompted to enter your new password: type in a new password
  3. When prompted to re-enter the new password: re-enter the new password

Your password has now been changed. From this point forwards you will no longer need this password to log into ARCHER2; instead, you will use your SSH key and TOTP code as described above.

"},{"location":"user-guide/connecting-totp/#ssh-clients","title":"SSH Clients","text":"

As noted above, you interact with ARCHER2 over an encrypted communication channel (specifically, Secure Shell version 2 (SSH-2)). This allows command-line access to one of the login nodes of ARCHER2, from which you can run commands or use a command-line text editor to edit files. SSH can also be used to run graphical programs such as GUI text editors and debuggers, when used in conjunction with an X Server.

"},{"location":"user-guide/connecting-totp/#logging-in","title":"Logging in","text":"

The login addresses for ARCHER2 are:

You can use the following command from the terminal window to log in to ARCHER2:

Full system
ssh username@login.archer2.ac.uk\n

The order in which you are asked for credentials depends on the system you are accessing:

Full system

You will first be prompted for the passphrase associated with your SSH key pair. Once you have entered this passphrase successfully, you will then be prompted for your TOTP code. You need to enter both credentials correctly to be able to access ARCHER2.

Tip

If you logged into ARCHER2 with your account before the major upgrade in May/June 2023, you may see an error from SSH that looks like:

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\n@       WARNING: POSSIBLE DNS SPOOFING DETECTED!          @\n@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\nThe ECDSA host key for login.archer2.ac.uk has changed,\nand the key for the corresponding IP address 193.62.216.43\nhas a different value. This could either mean that\nDNS SPOOFING is happening or the IP address for the host\nand its host key have changed at the same time.\nOffending key for IP in /Users/auser/.ssh/known_hosts:11\n@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\n@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @\n@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\nIT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!\nSomeone could be eavesdropping on you right now (man-in-the-middle attack)!\nIt is also possible that a host key has just been changed.\nThe fingerprint for the ECDSA key sent by the remote host is\nSHA256:UGS+LA8I46LqnD58WiWNlaUFY3uD1WFr+V8RCG09fUg.\nPlease contact your system administrator.\n

If you see this, you should delete the offending host key from your ~/.ssh/known_hosts file (in the example above, the offending line is line #11).
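If you prefer to do this from the command line, the following bash sketch (remove_known_host_line is a hypothetical helper, not an SSH tool) deletes a single numbered line from a known_hosts file and keeps a backup copy. It avoids sed -i because that option behaves differently on Linux and macOS:

```shell
#!/bin/bash
# remove_known_host_line FILE LINE_NUMBER   (hypothetical helper)
remove_known_host_line() {
  local file=$1 line=$2 tmp
  # Keep a backup in case the wrong line is removed
  cp "$file" "$file.bak" || return 1
  tmp=$(mktemp) || return 1
  # Print every line except the given one, then write the result back
  # in place so the file keeps its original permissions
  awk -v n="$line" 'NR != n' "$file" > "$tmp" && cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

With the error above you would run remove_known_host_line ~/.ssh/known_hosts 11. Alternatively, OpenSSH's own ssh-keygen -R removes all keys recorded for a given host name or address, for example ssh-keygen -R 193.62.216.43 for the IP address reported in that message.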

Warning

If your SSH key pair is not stored in the default location (usually ~/.ssh/id_rsa) on your local system, you may need to specify the path to the private part of the key with the -i option to ssh. For example, if your key is in a file called keys/id_rsa_ARCHER2 you would use the command ssh -i keys/id_rsa_ARCHER2 username@login.archer2.ac.uk to log in (or the equivalent for the 4-cabinet system).

Tip

When you first log into ARCHER2, you will be prompted to change your initial password. This is a three-step process:

  1. When prompted to enter your LDAP password: Re-enter the password you retrieved from SAFE.
  2. When prompted to enter your new password: type in a new password.
  3. When prompted to re-enter the new password: re-enter the new password.

Your password has now been changed.

To allow remote programs, especially graphical applications, to control your local display, such as for a debugger, use:

Full system
ssh -X username@login.archer2.ac.uk\n

Some sites recommend using the -Y flag. While this can fix some compatibility issues, the -X flag is more secure.

Current MacOS systems do not have an X window system. Users should install the XQuartz package to allow for SSH with X11 forwarding on MacOS systems:

"},{"location":"user-guide/connecting-totp/#host-keys","title":"Host Keys","text":"

Adding the host keys to your SSH configuration file provides an extra level of security for your connections to ARCHER2. The host keys are checked against the login nodes when you login to ARCHER2 and if the remote server key does not match the one in the configuration file, the connection will be refused. This provides protection against potential malicious servers masquerading as the ARCHER2 login nodes.

"},{"location":"user-guide/connecting-totp/#loginarcher2acuk","title":"login.archer2.ac.uk","text":"
login.archer2.ac.uk ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBANu9BQJ1UFr4nwy8X5seIPgCnBl1TKc8XBq2YVY65qS53QcpzjZAH53/CtvyWkyGcmY8/PWsJo9sXHqzXVSkzk=\n\nlogin.archer2.ac.uk ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDFGGByIrskPayB5xRm3vkWoEc5bVtTCi0oTGslD8m+M1Sc/v2IV6FxaEVXGwO9ErQwrtFQRj0KameLS3Jn0LwQ13Tw+vTXV0bsKyGgEu2wW+BSDijGpbxRZXZrg30TltZXd4VkTuWiE6kyhJ6qiIIR0nwfDblijGy3u079gM5Om/Q2wydwh0iAASRzkqldL5bKDb14Vliy7tCT3TJXI49+qIagWUhNEzyN1j2oK/2n3JdflT4/anQ4jUywVG4D1Tor/evEeSa3h5++gbtgAXZaCtlQbBxwckmTetXqnlI+pvkF0AAuS18Bh+hdmvT1+xW0XLv7CMA64HfR93XgQIIuPqFAS1p+HuJkmk4xFAdwrzjnpYAiU5Apkq+vx3W957/LULzZkeiFQY2Y3CY9oPVR8WBmGKXOOBifhl2Hvd51fH1wd0Lw7Zph53NcVSQQhdDUVhgsPJA3M/+UlqoAMEB/V6ESE2z6yrXVfNjDNbbgA1K548EYpyNR8z4eRtZOoi0=\n\nlogin.archer2.ac.uk ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAINyptPmidGmIBYHPcTwzgXknVPrMyHptwBgSbMcoZgh5\n

Host key verification can fail if this key is out of date, a problem which can be fixed by removing the offending entry in ~/.ssh/known_hosts and replacing it with the new key published here. We recommend that users check this page for any key updates rather than accepting a new key from the server without confirmation.
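One way to add a published key without creating duplicate entries is sketched below in bash. The function name add_archer2_host_key is hypothetical, and only the ed25519 key from the list above is used; check it against this page before relying on it:

```shell
#!/bin/bash
# ed25519 host key for login.archer2.ac.uk, copied from the list above.
# Re-check it against this page: it changes if the host key is rotated.
ARCHER2_HOST_KEY='login.archer2.ac.uk ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAINyptPmidGmIBYHPcTwzgXknVPrMyHptwBgSbMcoZgh5'

# add_archer2_host_key [KNOWN_HOSTS_FILE]   (hypothetical helper)
add_archer2_host_key() {
  local known_hosts=${1:-$HOME/.ssh/known_hosts}
  mkdir -p "$(dirname "$known_hosts")"
  touch "$known_hosts"
  # -x: match the whole line, -F: fixed string (no regex), -q: quiet
  if ! grep -qxF "$ARCHER2_HOST_KEY" "$known_hosts"; then
    printf '%s\n' "$ARCHER2_HOST_KEY" >> "$known_hosts"
  fi
}
```

After adding the key, connecting with ssh -o StrictHostKeyChecking=yes username@login.archer2.ac.uk makes ssh refuse the connection, rather than prompt you, if the server's key does not match a known one.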

"},{"location":"user-guide/connecting-totp/#making-access-more-convenient-using-the-ssh-configuration-file","title":"Making access more convenient using the SSH configuration file","text":"

Typing in the full command to log in or transfer data to ARCHER2 can become tedious as it often has to be repeated several times. You can use the SSH configuration file, usually located on your local machine at ~/.ssh/config, to make the process more convenient.

Each remote site (or group of sites) can have an entry in this file, which may look something like:

Full system
Host archer2\n    HostName login.archer2.ac.uk\n    User username\n

(remember to replace username with your actual username!).

Taking the full-system example: the Host line defines a short name for the entry. In this case, instead of typing ssh username@login.archer2.ac.uk to access the ARCHER2 login nodes, you could use ssh archer2 instead. The remaining lines define the options for the host.

Now you can use SSH to access ARCHER2 without needing to enter your username or the full hostname every time:

ssh archer2\n

You can set up as many of these entries as you need in your local configuration file. Other options are available. See the ssh_config manual page (or man ssh_config on any machine with SSH installed) for a description of the SSH configuration file. For example, you may find the IdentityFile option useful if you have to manage multiple SSH key pairs for different systems as this allows you to specify which SSH key to use for each system.
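The full-system entry, extended with the IdentityFile option, can be appended by a small bash sketch such as the one below. The function name add_archer2_host is hypothetical, and the key path ~/.ssh/id_rsa_ARCHER2 is only an example:

```shell
#!/bin/bash
# add_archer2_host CONFIG_FILE USERNAME IDENTITY_FILE   (hypothetical helper)
add_archer2_host() {
  local config=$1 user=$2 key=$3
  mkdir -p "$(dirname "$config")"
  # Do not add a second entry if one already exists
  if [ -f "$config" ] && grep -q '^Host archer2$' "$config"; then
    echo "archer2 entry already present in $config" >&2
    return 0
  fi
  cat >> "$config" <<EOF

Host archer2
    HostName login.archer2.ac.uk
    User $user
    IdentityFile $key
EOF
  # ssh rejects a config file that other users can write to
  chmod 600 "$config"
}
```

For example, add_archer2_host ~/.ssh/config username '~/.ssh/id_rsa_ARCHER2' followed by ssh archer2. The key path is quoted so that the literal ~ is written to the file; ssh expands it itself when reading IdentityFile.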

Bug

There is a known bug with Windows ssh-agent. If you get the error message: Warning: agent returned different signature type ssh-rsa (expected rsa-sha2-512), you will need to either specify the path to your ssh key in the command line (using the -i option as described above) or add that path to your SSH config file by using the IdentityFile option.

"},{"location":"user-guide/connecting-totp/#ssh-debugging-tips","title":"SSH debugging tips","text":"

If you find you are unable to connect to ARCHER2, there are some simple checks you may use to diagnose the issue, which are described below. If you are having difficulties connecting, we suggest trying these before contacting the ARCHER2 Service Desk.

"},{"location":"user-guide/connecting-totp/#use-the-userloginarcher2acuk-syntax-rather-than-l-user-loginarcher2acuk","title":"Use the user@login.archer2.ac.uk syntax rather than -l user login.archer2.ac.uk","text":"

We have seen a number of instances where people using the syntax

ssh -l user login.archer2.ac.uk

have been unable to connect and have been repeatedly prompted for a password. We have found that the alternative syntax:

ssh user@login.archer2.ac.uk

works more reliably.

"},{"location":"user-guide/connecting-totp/#can-you-connect-to-the-login-node","title":"Can you connect to the login node?","text":"

Try the command ping -c 3 login.archer2.ac.uk on Linux or MacOS, or ping -n 3 login.archer2.ac.uk on Windows. If the login node is reachable, the output (on Linux) should include:

--- login.archer2.ac.uk ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 38ms

(the ping time '38ms' is not important). If not all packets are received there could be a problem with your Internet connection, or the login node could be unavailable.
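The same check can be scripted. The sketch below pulls the packet-loss figure out of the ping summary line. The function names are hypothetical, and it assumes the Linux/macOS summary format shown above; Windows ping prints its statistics differently.

```shell
# Read ping output on stdin and print the packet-loss percentage (without the % sign).
ping_loss() {
    grep -o '[0-9.]*% packet loss' | head -n 1 | cut -d '%' -f 1
}

# Succeed only if all three pings to the ARCHER2 login address were answered.
archer2_reachable() {
    local loss
    loss=$(ping -c 3 login.archer2.ac.uk 2>/dev/null | ping_loss)
    [ "$loss" = "0" ] || [ "$loss" = "0.0" ]
}
```

For example, archer2_reachable && echo "login node reachable" reports success only when no packets were lost.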

"},{"location":"user-guide/connecting-totp/#ssh-key","title":"SSH key","text":"

If you get the error message Permission denied (publickey), this may indicate a problem with your SSH key. Some things to check:

chmod can be used to set permissions on the target in the following way: chmod <code> <target>. So for example to set correct permissions on the private key file id_rsa_ARCHER2, use the command chmod 600 id_rsa_ARCHER2.
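On Linux or MacOS, the chmod advice can be combined into one step. The sketch below applies the modes commonly used with OpenSSH: 700 on the key directory, 600 on the private key and 644 on the public key. The function name and its arguments are hypothetical. It assumes the public key sits next to the private key and has a .pub suffix.

```shell
# Tighten permissions on an SSH key directory and one key pair inside it.
# $1: the directory (e.g. ~/.ssh); $2: the private key's file name.
fix_ssh_perms() {
    local dir=$1 key=$2
    chmod 700 "$dir"              # drwx------ : only you can list or enter the directory
    chmod 600 "$dir/$key"         # -rw------- : private key readable and writable only by you
    if [ -f "$dir/$key.pub" ]; then
        chmod 644 "$dir/$key.pub" # -rw-r--r-- : public key may be readable by others
    fi
}
```

For example, fix_ssh_perms ~/.ssh id_rsa_ARCHER2.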

On Windows, permissions are handled differently but can be set by right-clicking on the file and selecting Properties > Security > Advanced. Only your user account, SYSTEM, and Administrators should have access (Full control). No other users or groups should have any permissions on the public key file, the private key file, or the folder that contains them.

Tip

Unix file permissions can be understood in the following way. Three classes of user can have permissions on a file: the owning user, members of the owning group, and all others. The available permissions are read, write, and execute. In a permission string, the first character indicates whether the target is a file (-) or a directory (d). The next three characters give the owning user's permissions: the first is r if they have read permission and - if not, the second is w if they have write permission and - if not, and the third is x if they have execute permission and - if not. This pattern is then repeated for the group and for others. For example, the pattern -rw-r--r-- indicates that the owning user can read and write the file, members of the owning group can read it, and anyone else can also read it. The chmod codes are constructed by treating each of the user, group, and other triplets as a three-bit binary number and writing each as a single digit from 0 to 7. For example, the permission string -rwx------ becomes 111 000 000 -> 700.

"},{"location":"user-guide/connecting-totp/#mfa","title":"MFA","text":"

If your TOTP passcode is being consistently rejected, you can remove MFA from your account and then re-enable it.

"},{"location":"user-guide/connecting-totp/#ssh-verbose-output","title":"SSH verbose output","text":"

The verbose-debugging output from ssh can be very useful for diagnosing issues. In particular, it can be used to distinguish between problems with the SSH key and password. To enable verbose output, add the -vvv flag to your SSH command. For example:

ssh -vvv username@login.archer2.ac.uk

The output is lengthy, but somewhere in there you should see lines similar to the following:

debug1: Next authentication method: publickey
debug1: Offering public key: RSA SHA256:<key_hash> <path_to_private_key>
debug3: send_pubkey_test
debug3: send packet: type 50
debug2: we sent a publickey packet, wait for reply
debug3: receive packet: type 60
debug1: Server accepts key: pkalg rsa-sha2-512 blen 2071
debug2: input_userauth_pk_ok: fp SHA256:<key_hash>
debug3: sign_and_send_pubkey: RSA SHA256:<key_hash>
Enter passphrase for key '<path_to_private_key>':
debug3: send packet: type 50
debug3: receive packet: type 51
Authenticated with partial success.
debug1: Authentications that can continue: password, keyboard-interactive

In the text above, you can see which files ssh has checked for private keys, and whether any key was accepted. The line Authenticated with partial success indicates that the SSH key has been accepted. By default, SSH will try a list of standard private-key files, as well as any you have specified with -i or in a config file. To succeed, one of these private keys must match the public key uploaded to SAFE.

If your SSH key passphrase is incorrect, you will be asked to try again up to three times in total, before being disconnected with Permission denied (publickey). If you enter your passphrase correctly, but still see this error message, please consider the advice under SSH key above.

You should next see something similar to:

debug1: Next authentication method: keyboard-interactive
debug2: userauth_kbdint
debug3: send packet: type 50
debug2: we sent a keyboard-interactive packet, wait for reply
debug3: receive packet: type 60
debug2: input_userauth_info_req
debug2: input_userauth_info_req: num_prompts 1
Password:
debug3: send packet: type 61
debug3: receive packet: type 60
debug2: input_userauth_info_req
debug2: input_userauth_info_req: num_prompts 0
debug3: send packet: type 61
debug3: receive packet: type 52
debug1: Authentication succeeded (keyboard-interactive).

If you do not see the Password: prompt, you may have connection issues, or there could be a problem with the ARCHER2 login nodes. If you do not see Authentication succeeded (keyboard-interactive), your password was not accepted. You will usually be asked to re-enter it two more times before the connection is rejected; consider the suggestions under Password above. If you do see Authentication succeeded (keyboard-interactive), your password was accepted and, since your SSH key has already been checked, you are now logged in.

The equivalent information can be obtained in PuTTY by enabling All Logging in settings.

"},{"location":"user-guide/connecting-totp/#related-software","title":"Related Software","text":""},{"location":"user-guide/connecting-totp/#tmux","title":"tmux","text":"

tmux is a terminal multiplexer available on the ARCHER2 login nodes. It allows multiple sessions to be open concurrently, and these sessions can be detached and run in the background. Furthermore, sessions continue to run after you log off and can be reattached when you log in again. It is particularly useful if you are connecting to ARCHER2 on an unstable Internet connection or if you wish to keep an arrangement of terminal applications running while you disconnect your client from the Internet, for example when moving between your home and workplace.

"},{"location":"user-guide/connecting/","title":"Connecting to ARCHER2","text":"

This section covers the basic connection methods.

On the ARCHER2 system, interactive access is achieved using SSH, either directly from a command-line terminal or using an SSH client. In addition, data can be transferred to and from the ARCHER2 system using scp from the command line or by using a file-transfer client.

Before following the process below, we assume you have set up an account on ARCHER2 through the EPCC SAFE. Documentation on how to do this can be found at:

"},{"location":"user-guide/connecting/#command-line-terminal","title":"Command line terminal","text":""},{"location":"user-guide/connecting/#linux","title":"Linux","text":"

Linux distributions include a terminal application that can be used for SSH access to the ARCHER2 login nodes. Linux users will have different terminals depending on their distribution and window manager (e.g., GNOME Terminal in GNOME, Konsole in KDE). Consult your Linux distribution's documentation for details on how to load a terminal.

"},{"location":"user-guide/connecting/#macos","title":"MacOS","text":"

MacOS users can use the Terminal application, located in the Utilities folder within the Applications folder.

"},{"location":"user-guide/connecting/#windows","title":"Windows","text":"

A typical Windows installation will not include a terminal client, though there are various clients available. We recommend Windows users download and install MobaXterm to access ARCHER2. It is very easy to use and includes an integrated X Server, which allows you to run graphical applications on ARCHER2.

You can download MobaXterm Home Edition (Installer Edition) from the following link:

Double-click the downloaded Microsoft Installer file (.msi) and follow the instructions from the Windows Installation Wizard. Note, you might need to have administrator rights to install on some versions of Windows. Also, make sure to check whether Windows Firewall has blocked any features of this program after installation (Windows will warn you if the built-in firewall blocks an action, and gives you the opportunity to override the behaviour).

Once installed, start MobaXterm and then click "Start local terminal".

Tips

"},{"location":"user-guide/connecting/#access-credentials","title":"Access credentials","text":"

To access ARCHER2, you need to use two sets of credentials: your SSH key pair protected by a passphrase and a time-based one-time passcode (TOTP). You can find more detailed instructions on how to set up your credentials to access ARCHER2 from Windows, MacOS and Linux below.

"},{"location":"user-guide/connecting/#ssh-key-pairs","title":"SSH Key Pairs","text":"

You will need to generate an SSH key pair protected by a passphrase to access ARCHER2.

Using a terminal (the command line), generate a key pair labelled with your e-mail address (the -C option sets the key's comment) and enter a passphrase you will use to unlock the key:

$ ssh-keygen -t rsa -C "your@email.com"
...
-bash-4.1$ ssh-keygen -t rsa -C "your@email.com"
Generating public/private rsa key pair.
Enter file in which to save the key (/Home/user/.ssh/id_rsa): [Enter]
Enter passphrase (empty for no passphrase): [Passphrase]
Enter same passphrase again: [Passphrase]
Your identification has been saved in /Home/user/.ssh/id_rsa.
Your public key has been saved in /Home/user/.ssh/id_rsa.pub.
The key fingerprint is:
03:d4:c4:6d:58:0a:e2:4a:f8:73:9a:e8:e3:07:16:c8 your@email.com
The key's randomart image is:
+--[ RSA 2048]----+
|    . ...+o++++. |
| . . . =o..      |
|+ . . .......o o |
|oE .   .         |
|o =     .   S    |
|.    +.+     .   |
|.  oo            |
|.  .             |
| ..              |
+-----------------+

(remember to replace "your@email.com" with your e-mail address).

"},{"location":"user-guide/connecting/#upload-public-part-of-key-pair-to-safe","title":"Upload public part of key pair to SAFE","text":"

You should now upload the public part of your SSH key pair to the SAFE by following the instructions at:

Log in to SAFE.

Then:

  1. Go to the Login accounts menu and select the ARCHER2 account you want to add the SSH key to.
  2. On the subsequent Login Account details page, click the Add Credential button.
  3. Select SSH public key as the Credential Type and click Next.
  4. Either copy and paste the public part of your SSH key into the SSH Public key box or use the button to select the public key file on your computer.
  5. Click Add to associate the public SSH key with your account.

Once you have done this, your SSH key will be added to your ARCHER2 account.

"},{"location":"user-guide/connecting/#mfa-time-based-one-time-passcode-totp-code","title":"MFA Time-based one-time passcode (TOTP code)","text":"

Remember, you will need to use both an SSH key and a time-based one-time passcode to log into ARCHER2, so you will also need to set up a method for generating a TOTP code before you can log in.

"},{"location":"user-guide/connecting/#first-login-password-required","title":"First login: password required","text":"

Important

You will not use your password when logging on to ARCHER2 after the first login for a new account.

As an additional security measure, you will also need to use a password from SAFE for your first login to ARCHER2 with a new account. When you log into ARCHER2 for the first time with a new account, you will be prompted to change your initial password. This is a three-step process:

  1. When prompted to enter your ldap password: Enter the password which you retrieved from SAFE
  2. When prompted to enter your new password: type in a new password
  3. When prompted to re-enter the new password: re-enter the new password

Your password has now been changed. From this point forward you will no longer need this password to log into ARCHER2; instead, you will use your SSH key and TOTP code as described above.

"},{"location":"user-guide/connecting/#ssh-clients","title":"SSH Clients","text":"

As noted above, you interact with ARCHER2 over an encrypted communication channel (specifically, Secure Shell version 2 (SSH-2)). This allows command-line access to one of the login nodes of ARCHER2, from which you can run commands or use a command-line text editor to edit files. SSH can also be used to run graphical programs such as GUI text editors and debuggers when used in conjunction with an X Server.

"},{"location":"user-guide/connecting/#logging-in","title":"Logging in","text":"

The login addresses for ARCHER2 are:

You can use the following command from the terminal window to log in to ARCHER2:

Full system
ssh username@login.archer2.ac.uk

The order in which you are asked for credentials depends on the system you are accessing:

Full system

You will first be prompted for the passphrase associated with your SSH key pair. Once you have entered this passphrase successfully, you will then be prompted for your machine account password. You need to enter both credentials correctly to be able to access ARCHER2.

Tip

If you logged into ARCHER2 with your account before the major upgrade in May/June 2023, you may see an error from SSH that looks like this:

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@       WARNING: POSSIBLE DNS SPOOFING DETECTED!          @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
The ECDSA host key for login.archer2.ac.uk has changed,
and the key for the corresponding IP address 193.62.216.43
has a different value. This could either mean that
DNS SPOOFING is happening or the IP address for the host
and its host key have changed at the same time.
Offending key for IP in /Users/auser/.ssh/known_hosts:11
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the ECDSA key sent by the remote host is
SHA256:UGS+LA8I46LqnD58WiWNlaUFY3uD1WFr+V8RCG09fUg.
Please contact your system administrator.

If you see this, you should delete the offending host key from your ~/.ssh/known_hosts file (in the example above, the offending entry is on line 11).
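One way to delete just that line is sketched below. The function name is hypothetical. It rewrites the file in place, so the file keeps its existing permissions. As an alternative, the standard OpenSSH command ssh-keygen -R login.archer2.ac.uk removes every stored key for that host name. An entry stored under the IP address, as in the warning above, needs a separate ssh-keygen -R 193.62.216.43.

```shell
# Delete one line, given by its number, from a known_hosts file.
# $1: path to the file; $2: line number reported in the SSH warning.
remove_known_host_line() {
    local file=$1 line=$2 tmp
    tmp=$(mktemp) || return 1
    # Copy every line except the offending one, then write the result back over the original.
    awk -v n="$line" 'NR != n' "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

For the warning above, this would be remove_known_host_line ~/.ssh/known_hosts 11.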

Warning

If your SSH key pair is not stored in the default location (usually ~/.ssh/id_rsa) on your local system, you may need to specify the path to the private part of the key with the -i option to ssh. For example, if your key is in a file called keys/id_rsa_ARCHER2 you would use the command ssh -i keys/id_rsa_ARCHER2 username@login.archer2.ac.uk to log in (or the equivalent for the 4-cabinet system).

Tip

When you first log into ARCHER2, you will be prompted to change your initial password. This is a three-step process:

  1. When prompted to enter your ldap password: Re-enter the password you retrieved from SAFE.
  2. When prompted to enter your new password: type in a new password.
  3. When prompted to re-enter the new password: re-enter the new password.

Your password will now have been changed.

To allow remote programs, especially graphical applications, to control your local display, such as for a debugger, use:

Full system
ssh -X username@login.archer2.ac.uk

Some sites recommend using the -Y flag. While this can fix some compatibility issues, the -X flag is more secure.

Current MacOS systems do not include an X Window System server. Users should install the XQuartz package to allow SSH with X11 forwarding on MacOS systems:

"},{"location":"user-guide/connecting/#host-keys","title":"Host Keys","text":"

Adding the host keys to your SSH configuration file provides an extra level of security for your connections to ARCHER2. The host keys are checked against the login nodes when you log in to ARCHER2, and if the remote server key does not match the one in the configuration file, the connection will be refused. This provides protection against potential malicious servers masquerading as the ARCHER2 login nodes.

"},{"location":"user-guide/connecting/#loginarcher2acuk","title":"login.archer2.ac.uk","text":"
login.archer2.ac.uk ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBANu9BQJ1UFr4nwy8X5seIPgCnBl1TKc8XBq2YVY65qS53QcpzjZAH53/CtvyWkyGcmY8/PWsJo9sXHqzXVSkzk=

login.archer2.ac.uk ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDFGGByIrskPayB5xRm3vkWoEc5bVtTCi0oTGslD8m+M1Sc/v2IV6FxaEVXGwO9ErQwrtFQRj0KameLS3Jn0LwQ13Tw+vTXV0bsKyGgEu2wW+BSDijGpbxRZXZrg30TltZXd4VkTuWiE6kyhJ6qiIIR0nwfDblijGy3u079gM5Om/Q2wydwh0iAASRzkqldL5bKDb14Vliy7tCT3TJXI49+qIagWUhNEzyN1j2oK/2n3JdflT4/anQ4jUywVG4D1Tor/evEeSa3h5++gbtgAXZaCtlQbBxwckmTetXqnlI+pvkF0AAuS18Bh+hdmvT1+xW0XLv7CMA64HfR93XgQIIuPqFAS1p+HuJkmk4xFAdwrzjnpYAiU5Apkq+vx3W957/LULzZkeiFQY2Y3CY9oPVR8WBmGKXOOBifhl2Hvd51fH1wd0Lw7Zph53NcVSQQhdDUVhgsPJA3M/+UlqoAMEB/V6ESE2z6yrXVfNjDNbbgA1K548EYpyNR8z4eRtZOoi0=

login.archer2.ac.uk ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAINyptPmidGmIBYHPcTwzgXknVPrMyHptwBgSbMcoZgh5

Host key verification can fail if this key is out of date, a problem which can be fixed by removing the offending entry in ~/.ssh/known_hosts and replacing it with the new key published here. We recommend that users check this page for any key updates rather than simply accepting a new key from the server without confirmation.
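If you prefer to add the published keys yourself, a small helper like the one below can append a key line to known_hosts only when an identical line is not already present. The function name is hypothetical. If your client hashes host names (the HashKnownHosts option), existing entries will not match textually and the line may be added a second time. A duplicate entry is harmless.

```shell
# Append a host-key line to a known_hosts file unless an identical line already exists.
# $1: path to known_hosts; $2: the full "host keytype key" line.
add_known_host() {
    local file=$1 entry=$2
    touch "$file"
    grep -qxF -- "$entry" "$file" || printf '%s\n' "$entry" >> "$file"
}
```

For example, to add the ED25519 key published above:

add_known_host ~/.ssh/known_hosts "login.archer2.ac.uk ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAINyptPmidGmIBYHPcTwzgXknVPrMyHptwBgSbMcoZgh5"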

"},{"location":"user-guide/connecting/#making-access-more-convenient-using-the-ssh-configuration-file","title":"Making access more convenient using the SSH configuration file","text":"

Typing in the full command to log in or transfer data to ARCHER2 can become tedious as it often has to be repeated several times. You can use the SSH configuration file, usually located on your local machine at ~/.ssh/config, to make the process more convenient.

Each remote site (or group of sites) can have an entry in this file, which may look something like:

Full system
Host archer2
    HostName login.archer2.ac.uk
    User username

(remember to replace username with your actual username!).

Taking the full-system example: the Host line defines a short name for the entry. In this case, instead of typing ssh username@login.archer2.ac.uk to access the ARCHER2 login nodes, you could use ssh archer2 instead. The remaining lines define the options for the host.

Now you can use SSH to access ARCHER2 without needing to enter your username or the full hostname every time:

ssh archer2

You can set up as many of these entries as you need in your local configuration file. Other options are available. See the ssh_config manual page (or man ssh_config on any machine with SSH installed) for a description of the SSH configuration file. For example, you may find the IdentityFile option useful if you have to manage multiple SSH key pairs for different systems as this allows you to specify which SSH key to use for each system.

Bug

There is a known bug with Windows ssh-agent. If you get the error message: Warning: agent returned different signature type ssh-rsa (expected rsa-sha2-512), you will need to either specify the path to your ssh key in the command line (using the -i option as described above) or add that path to your SSH config file by using the IdentityFile option.

"},{"location":"user-guide/connecting/#ssh-debugging-tips","title":"SSH debugging tips","text":"

If you find you are unable to connect to ARCHER2, there are some simple checks you may use to diagnose the issue, which are described below. If you are having difficulties connecting, we suggest trying these before contacting the ARCHER2 Service Desk.

"},{"location":"user-guide/connecting/#use-the-userloginarcher2acuk-syntax-rather-than-l-user-loginarcher2acuk","title":"Use the user@login.archer2.ac.uk syntax rather than -l user login.archer2.ac.uk","text":"

We have seen a number of instances where people using the syntax

ssh -l user login.archer2.ac.uk

have been unable to connect and have been repeatedly prompted for a password. We have found that the alternative syntax:

ssh user@login.archer2.ac.uk

works more reliably.

"},{"location":"user-guide/connecting/#can-you-connect-to-the-login-node","title":"Can you connect to the login node?","text":"

Try the command ping -c 3 login.archer2.ac.uk on Linux or MacOS, or ping -n 3 login.archer2.ac.uk on Windows. If the login node is reachable, the output (on Linux) should include:

--- login.archer2.ac.uk ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 38ms

(the ping time '38ms' is not important). If not all packets are received there could be a problem with your Internet connection, or the login node could be unavailable.

"},{"location":"user-guide/connecting/#ssh-key","title":"SSH key","text":"

If you get the error message Permission denied (publickey), this may indicate a problem with your SSH key. Some things to check:

chmod can be used to set permissions on the target in the following way: chmod <code> <target>. So for example to set correct permissions on the private key file id_rsa_ARCHER2, use the command chmod 600 id_rsa_ARCHER2.

On Windows, permissions are handled differently but can be set by right-clicking on the file and selecting Properties > Security > Advanced. Only your user account, SYSTEM, and Administrators should have access (Full control). No other users or groups should have any permissions on the public key file, the private key file, or the folder that contains them.

Tip

Unix file permissions can be understood in the following way. Three classes of user can have permissions on a file: the owning user, members of the owning group, and all others. The available permissions are read, write, and execute. In a permission string, the first character indicates whether the target is a file (-) or a directory (d). The next three characters give the owning user's permissions: the first is r if they have read permission and - if not, the second is w if they have write permission and - if not, and the third is x if they have execute permission and - if not. This pattern is then repeated for the group and for others. For example, the pattern -rw-r--r-- indicates that the owning user can read and write the file, members of the owning group can read it, and anyone else can also read it. The chmod codes are constructed by treating each of the user, group, and other triplets as a three-bit binary number and writing each as a single digit from 0 to 7. For example, the permission string -rwx------ becomes 111 000 000 -> 700.
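The conversion rule can be checked with a short bash function. This is a hypothetical helper, not a standard tool. It handles only the r, w and x characters; special bits such as setuid (s) or sticky (t) are not covered.

```shell
# Convert a symbolic permission string such as -rw-r--r-- into its chmod code.
perm_to_octal() {
    local perms=${1:1}   # drop the leading file-type character (- or d)
    local result="" i triplet digit
    for i in 0 3 6; do   # user, group, other triplets
        triplet=${perms:i:3}
        digit=0
        [[ ${triplet:0:1} == r ]] && (( digit += 4 ))   # read bit
        [[ ${triplet:1:1} == w ]] && (( digit += 2 ))   # write bit
        [[ ${triplet:2:1} == x ]] && (( digit += 1 ))   # execute bit
        result+=$digit
    done
    echo "$result"
}
```

For example, perm_to_octal -rwx------ prints 700 and perm_to_octal -rw-r--r-- prints 644, matching the values used with chmod above.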

"},{"location":"user-guide/connecting/#mfa","title":"MFA","text":"

If your TOTP passcode is being consistently rejected, you can remove MFA from your account and then re-enable it.

"},{"location":"user-guide/connecting/#ssh-verbose-output","title":"SSH verbose output","text":"

The verbose-debugging output from ssh can be very useful for diagnosing issues. In particular, it can be used to distinguish between problems with the SSH key and password. To enable verbose output, add the -vvv flag to your SSH command. For example:

ssh -vvv username@login.archer2.ac.uk

The output is lengthy, but somewhere in there you should see lines similar to the following:

debug1: Next authentication method: publickey
debug1: Offering public key: RSA SHA256:<key_hash> <path_to_private_key>
debug3: send_pubkey_test
debug3: send packet: type 50
debug2: we sent a publickey packet, wait for reply
debug3: receive packet: type 60
debug1: Server accepts key: pkalg rsa-sha2-512 blen 2071
debug2: input_userauth_pk_ok: fp SHA256:<key_hash>
debug3: sign_and_send_pubkey: RSA SHA256:<key_hash>
Enter passphrase for key '<path_to_private_key>':
debug3: send packet: type 50
debug3: receive packet: type 51
Authenticated with partial success.
debug1: Authentications that can continue: password, keyboard-interactive

In the text above, you can see which files ssh has checked for private keys, and whether any key was accepted. The line Authenticated with partial success indicates that the SSH key has been accepted. By default, SSH will try a list of standard private-key files, as well as any you have specified with -i or in a config file. To succeed, one of these private keys must match the public key uploaded to SAFE.

If your SSH key passphrase is incorrect, you will be asked to try again up to three times in total, before being disconnected with Permission denied (publickey). If you enter your passphrase correctly, but still see this error message, please consider the advice under SSH key above.

You should next see something similar to:

debug1: Next authentication method: keyboard-interactive
debug2: userauth_kbdint
debug3: send packet: type 50
debug2: we sent a keyboard-interactive packet, wait for reply
debug3: receive packet: type 60
debug2: input_userauth_info_req
debug2: input_userauth_info_req: num_prompts 1
Password:
debug3: send packet: type 61
debug3: receive packet: type 60
debug2: input_userauth_info_req
debug2: input_userauth_info_req: num_prompts 0
debug3: send packet: type 61
debug3: receive packet: type 52
debug1: Authentication succeeded (keyboard-interactive).

If you do not see the Password: prompt, you may have connection issues, or there could be a problem with the ARCHER2 login nodes. If you do not see Authentication succeeded (keyboard-interactive), your password was not accepted. You will usually be asked to re-enter it two more times before the connection is rejected; consider the suggestions under Password above. If you do see Authentication succeeded (keyboard-interactive), your password was accepted and, since your SSH key has already been checked, you are now logged in.
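The verbose log is long, so it can help to keep only the authentication milestones discussed above. The sketch below does this with a hypothetical function. ssh writes its debug messages to standard error, so they are redirected with 2>&1 before filtering. Passphrase and password prompts should still appear on your terminal. The remote command exit closes the session as soon as you are logged in.

```shell
# Keep only the key and password milestones from `ssh -vvv` output read on stdin.
ssh_auth_summary() {
    grep -E 'Offering public key|Server accepts key|Authenticated with partial success|Authentication succeeded|Permission denied'
}
```

For example: ssh -vvv username@login.archer2.ac.uk exit 2>&1 | ssh_auth_summary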

The equivalent information can be obtained in PuTTY by enabling All Logging in settings.

"},{"location":"user-guide/connecting/#related-software","title":"Related Software","text":""},{"location":"user-guide/connecting/#tmux","title":"tmux","text":"

tmux is a terminal multiplexer available on the ARCHER2 login nodes. It allows multiple sessions to be open concurrently, and these sessions can be detached and run in the background. Furthermore, sessions continue to run after you log off and can be reattached when you log in again. It is particularly useful if you are connecting to ARCHER2 on an unstable Internet connection or if you wish to keep an arrangement of terminal applications running while you disconnect your client from the Internet, for example when moving between your home and workplace.
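A typical pattern is sketched below using a hypothetical wrapper function. The command tmux new-session -A -s NAME attaches to the named session if it already exists and creates it otherwise. Inside tmux, the default key sequence Ctrl-b followed by d detaches, and tmux ls lists your sessions. A session exists only on the login node where it was started.

```shell
# Attach to a named tmux session, creating it first if it does not exist.
# $1: optional session name (defaults to "work").
work_session() {
    tmux new-session -A -s "${1:-work}"
}
```

For example, run work_session analysis after logging in. Detach with Ctrl-b d before disconnecting, then run work_session analysis again on your next login to pick up where you left off.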

"},{"location":"user-guide/containers/","title":"Containers","text":"

This page was originally based on the documentation at the University of Sheffield HPC service.

Designed around the notion of mobility of compute and reproducible science, Singularity enables users to have full control of their operating system environment. This means that a non-privileged user can \"swap out\" the Linux operating system and environment on the host for a Linux OS and environment that they control. So if the host system is running CentOS Linux but your application runs in Ubuntu Linux with a particular software stack, you can create an Ubuntu image, install your software into that image, copy the image to another host (e.g. ARCHER2), and run your application on that host in its native Ubuntu environment.

Singularity also allows you to leverage the resources of whatever host you are on. This includes high-speed interconnects (e.g. Slingshot on ARCHER2), file systems (e.g. /home and /work on ARCHER2) and potentially other resources.

Note

Singularity only supports Linux containers. You cannot create images that use Windows or macOS (this is a restriction of the containerisation model rather than Singularity).

"},{"location":"user-guide/containers/#useful-links","title":"Useful Links","text":""},{"location":"user-guide/containers/#about-singularity-containers-images","title":"About Singularity Containers (Images)","text":"

Similar to Docker, a Singularity container is a self-contained software stack. As Singularity does not require a root-level daemon to run its containers (as is required by Docker) it is suitable for use on multi-user HPC systems such as ARCHER2. Within the container, you have exactly the same permissions as you do in a standard login session on the system.

In practice, this means that a container image created on your local machine with all your research software installed for local development will also run on ARCHER2.

Pre-built container images (such as those on DockerHub or in the SingularityHub archive) can simply be downloaded and used on ARCHER2 (or anywhere else Singularity is installed).

Creating and modifying container images requires root permission and so must be done on a system where you have such access (in practice, this is usually within a virtual machine on your laptop/workstation).

Note

SingularityHub was a publicly available cloud service for Singularity container images, active from 2016 to 2021. It built container recipes from GitHub repositories on Google Cloud, and container images were available via the Singularity command line or the sregistry software. These container images are still available in the SingularityHub Archive.

"},{"location":"user-guide/containers/#using-singularity-images-on-archer2","title":"Using Singularity Images on ARCHER2","text":"

Singularity containers can be used on ARCHER2 in a number of ways, including:

We provide information on each of these scenarios below. First, we describe briefly how to get existing container images onto ARCHER2 so that you can launch containers based on them.

"},{"location":"user-guide/containers/#getting-existing-container-images-onto-archer2","title":"Getting existing container images onto ARCHER2","text":"

Singularity container images are files, so, if you already have a container image, you can use scp to copy the file to ARCHER2 as you would with any other file.

If you wish to get a file from one of the container image repositories, then Singularity allows you to do this from ARCHER2 itself.

For example, to retrieve a container image from SingularityHub on ARCHER2 we can simply issue a Singularity command to pull the image.

auser@ln03:~> singularity pull hello-world.sif shub://vsoch/hello-world

The container image located at the shub URI is written to a Singularity Image File (SIF) called hello-world.sif.

"},{"location":"user-guide/containers/#interactive-use-on-the-login-nodes","title":"Interactive use on the login nodes","text":"

Once you have a container image file, launching a container from it interactively on the login nodes is extremely simple: you use the singularity shell command. Using the container image we downloaded in the example above:

auser@ln03:~> singularity shell hello-world.sif
Singularity>

Within a Singularity container your home directory will be available.

Once you have finished using your container, you can return to the ARCHER2 login node prompt with the exit command:

Singularity> exit
exit
auser@ln03:~>
"},{"location":"user-guide/containers/#interactive-use-on-the-compute-nodes","title":"Interactive use on the compute nodes","text":"

The process for using a container interactively on the compute nodes is very similar to that for the login nodes. The only difference is that you first have to submit an interactive job (from a location on /work) in order to get interactive access to a compute node.

For example, to reserve a full node for you to work on interactively you would use:

auser@ln03:/work/t01/t01/auser> srun --nodes=1 --exclusive --time=00:20:00 \\\n                                      --account=[budget code] \\\n                                      --partition=standard --qos=standard \\\n                                      --pty /bin/bash\n\n...wait until job starts...\n\nauser@nid00001:/work/t01/t01/auser>\n

Note that the prompt has changed to show you are on a compute node. Now you can launch a container in the same way as on the login node.

auser@nid00001:/work/t01/t01/auser> singularity shell hello-world.sif\nSingularity> exit\nexit\nauser@nid00001:/work/t01/t01/auser> exit\nauser@ln03:/work/t01/t01/auser>\n

Note

We used exit to leave the interactive container shell and then exit again to leave the interactive job on the compute node.

"},{"location":"user-guide/containers/#serial-processes-within-a-non-interactive-batch-script","title":"Serial processes within a non-interactive batch script","text":"

You can also use Singularity containers within a non-interactive batch script as you would any other command. If your container image contains a runscript then you can use singularity run to execute the runscript in the job. You can also use singularity exec to execute arbitrary commands (or scripts) within the container.

An example job submission script that runs a serial job executing the runscript in the hello-world.sif container image we downloaded earlier is shown below. Submit it from the directory on /work that contains the image file, as the script locates the image via $SLURM_SUBMIT_DIR.

#!/bin/bash --login\n\n# Slurm job options (name, compute nodes, job time)\n\n#SBATCH --job-name=helloworld\n#SBATCH --nodes=1\n#SBATCH --ntasks-per-node=1\n#SBATCH --cpus-per-task=1\n#SBATCH --time=00:10:00\n\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Run the serial executable\nsingularity run $SLURM_SUBMIT_DIR/hello-world.sif\n

Submit this in the usual way with sbatch. Standard output and standard error are written to a file named slurm-<job ID>.out.

"},{"location":"user-guide/containers/#parallel-processes-within-a-non-interactive-batch-script","title":"Parallel processes within a non-interactive batch script","text":"

Running a Singularity container in parallel across multiple compute nodes requires some preparation. In general, though, Singularity is run within the parallel job launcher (srun):

srun <options> \\\n    singularity <options> /path/to/image/file \\\n        app <options>\n

The snippet above shows the launch command as three nested parts: srun, the singularity command and the containerised application.

The Singularity container image must be compatible with the MPI environment on the host: either the containerised application has been built against the appropriate host MPI libraries, or the container itself contains an MPI library that is ABI-compatible with the host MPI. The latter situation is known as the hybrid model, and it is the approach taken in the sections that follow.

"},{"location":"user-guide/containers/#creating-your-own-singularity-container-images","title":"Creating Your Own Singularity Container Images","text":"

As we saw above, you can create Singularity container images on ARCHER2 itself by importing them from Docker Hub or Singularity Hub. If you wish to create your own custom container image to use with Singularity, you must use a system where you have root (administrator) privileges, often your own laptop or workstation.

There are a number of different options to create container images on your local system to use with Singularity on ARCHER2. We are going to use Docker on our local system to create the container image, push the new container image to Docker Hub and then use Singularity on ARCHER2 to convert the Docker container image to a Singularity container image SIF file.

For macOS and Windows users we recommend installing Docker Desktop. For Linux users, we recommend installing Docker directly on your local system. See the Docker documentation for full details on how to install Docker Desktop/Docker.

"},{"location":"user-guide/containers/#building-container-images-using-docker","title":"Building container images using Docker","text":"

Note

We assume that you are familiar with using Docker in these instructions. You can find an introduction to Docker at Reproducible Computational Environments Using Containers: Introduction to Docker

As usual, you can build container images with a command similar to:

docker build --platform linux/amd64 -t <username>/<image name>:<version> .\n

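Where <username> is your Docker Hub username, <image name> is the name you want to give the container image, <version> is a version tag for the image, and the final dot (.) tells Docker to use the current directory, which should contain your Dockerfile, as the build context.

You should use the --platform linux/amd64 option to ensure that the container image is compatible with the x86-64 processor architecture of ARCHER2. This matters in particular if you build on a computer with an ARM-based processor, such as an Apple silicon Mac.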

"},{"location":"user-guide/containers/#using-singularity-with-mpi-on-archer2","title":"Using Singularity with MPI on ARCHER2","text":"

MPI on ARCHER2 is provided by the Cray MPICH libraries with the interface to the high-performance Slingshot interconnect provided via the OFI interface. Therefore, as per the Singularity MPI Hybrid model, we will build our container image such that it contains a version of the MPICH MPI library compiled with support for OFI. Below, we provide instructions on creating a container image with a version of MPICH compiled in this way. We then provide an example of how to run a Singularity container with MPI over multiple ARCHER2 compute nodes.

"},{"location":"user-guide/containers/#building-an-image-with-mpi-from-scratch","title":"Building an image with MPI from scratch","text":"

Warning

Remember, all these steps should be executed on your local system where you have administrator privileges and Docker installed, not on ARCHER2.

We will illustrate the process of building a Singularity image with MPI from scratch by building an image that contains the MPICH MPI library and the OSU MPI benchmarks. As part of the container image creation we need to download the source code for both MPICH and the OSU benchmarks. At the time of writing, the stable MPICH release was 3.4.2 and the stable OSU benchmark release was 5.8; the Dockerfile below uses MPICH 3.4.2 and OSU micro-benchmarks 5.4.1. Newer versions may be available by the time you follow these instructions.

First, create a Dockerfile that describes how to build the image:

FROM ubuntu:20.04\n\nENV DEBIAN_FRONTEND=noninteractive\n\n# Install the necessary packages (from repo)\nRUN apt-get update && apt-get install -y --no-install-recommends \\\n apt-utils \\\n build-essential \\\n curl \\\n libcurl4-openssl-dev \\\n libzmq3-dev \\\n pkg-config \\\n software-properties-common\nRUN apt-get clean\nRUN apt-get install -y dkms\nRUN apt-get install -y autoconf automake build-essential numactl libnuma-dev autoconf automake gcc g++ git libtool\n\n# Download and build an ABI compatible MPICH\nRUN curl -sSLO http://www.mpich.org/static/downloads/3.4.2/mpich-3.4.2.tar.gz \\\n   && tar -xzf mpich-3.4.2.tar.gz -C /root \\\n   && cd /root/mpich-3.4.2 \\\n   && ./configure --prefix=/usr --with-device=ch4:ofi --disable-fortran \\\n   && make -j8 install \\\n   && rm -rf /root/mpich-3.4.2 \\\n   && rm /mpich-3.4.2.tar.gz\n\n# OSU benchmarks\nRUN curl -sSLO http://mvapich.cse.ohio-state.edu/download/mvapich/osu-micro-benchmarks-5.4.1.tar.gz \\\n   && tar -xzf osu-micro-benchmarks-5.4.1.tar.gz -C /root \\\n   && cd /root/osu-micro-benchmarks-5.4.1 \\\n   && ./configure --prefix=/usr/local CC=/usr/bin/mpicc CXX=/usr/bin/mpicxx \\\n   && cd mpi \\\n   && make -j8 install \\\n   && rm -rf /root/osu-micro-benchmarks-5.4.1 \\\n   && rm /osu-micro-benchmarks-5.4.1.tar.gz\n\n# Add the OSU benchmark executables to the PATH\nENV PATH=/usr/local/libexec/osu-micro-benchmarks/mpi/pt2pt:$PATH\nENV PATH=/usr/local/libexec/osu-micro-benchmarks/mpi/collective:$PATH\n\n# path to mlx libraries in Ubuntu\nENV LD_LIBRARY_PATH=/usr/lib/libibverbs:$LD_LIBRARY_PATH\n

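A quick overview of what the Dockerfile above does:

  1. Starts from an Ubuntu 20.04 base image and installs the compilers and build tools it needs from the Ubuntu package repository.
  2. Downloads MPICH 3.4.2, configures it with the ch4:ofi device and without Fortran support, builds and installs it under /usr, and removes the source.
  3. Downloads the OSU micro-benchmarks 5.4.1, builds the MPI benchmarks with the MPICH compiler wrappers (mpicc and mpicxx), installs them under /usr/local, and removes the source.
  4. Adds the point-to-point and collective OSU benchmark directories to PATH and sets LD_LIBRARY_PATH.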

Now we can go ahead and build the container image using Docker (this assumes that you issue the command in the same directory as the Dockerfile you created based on the specification above):

docker build --platform linux/amd64 -t auser/osu-benchmarks:5.4.1 .\n

(Remember to change auser to your Docker Hub username.)

Once you have successfully built your container image, push it to Docker Hub (you may need to log in first with docker login):

docker push auser/osu-benchmarks:5.4.1\n

Finally, you need to use Singularity on ARCHER2 to convert the Docker container image to a Singularity container image file. Log into ARCHER2, move to the work file system and then use a command like:

auser@ln01:/work/t01/t01/auser> singularity build osu-benchmarks_5.4.1.sif docker://auser/osu-benchmarks:5.4.1\n

Tip

You can find a copy of the osu-benchmarks_5.4.1.sif image on ARCHER2 in the directory $EPCC_SINGULARITY_DIR if you do not want to build it yourself but still want to test.

"},{"location":"user-guide/containers/#running-parallel-mpi-jobs-using-singularity-containers","title":"Running parallel MPI jobs using Singularity containers","text":"

Tip

These instructions assume you have built a Singularity container image file on ARCHER2 that includes MPI provided by MPICH with the OFI interface. See the sections above for how to build such container images.

Once you have built your Singularity container image file that includes MPICH built with OFI for ARCHER2, you can use it to run parallel jobs in a similar way to non-Singularity jobs. The example job submission script below uses the container image file we built above with MPICH and the OSU benchmarks to run the Allreduce benchmark on two nodes where all 128 cores on each node are used for MPI processes (so, 256 MPI processes in total).

#!/bin/bash\n\n# Slurm job options (name, compute nodes, job time)\n#SBATCH --job-name=singularity_parallel\n#SBATCH --time=0:10:0\n#SBATCH --nodes=2\n#SBATCH --ntasks-per-node=128\n#SBATCH --cpus-per-task=1\n\n# Replace [budget code] below with your budget code (e.g. t01)\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n#SBATCH --account=[budget code]\n\n# Load the module to make the Cray MPICH ABI available\nmodule load cray-mpich-abi\n\nexport OMP_NUM_THREADS=1\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\n# Set the LD_LIBRARY_PATH environment variable within the Singularity container\n# to ensure that it uses the correct MPI libraries.\nexport SINGULARITYENV_LD_LIBRARY_PATH=\"/opt/cray/pe/mpich/8.1.23/ofi/gnu/9.1/lib-abi-mpich:/opt/cray/pe/mpich/8.1.23/gtl/lib:/opt/cray/libfabric/1.12.1.2.2.0.0/lib64:/opt/cray/pe/gcc-libs:/opt/cray/pe/lib64:/opt/cray/xpmem/default/lib64:/usr/lib64/libibverbs:/usr/lib64\"\n\n# This makes sure HPE Cray Slingshot interconnect libraries are available\n# from inside the container.\nexport SINGULARITY_BIND=\"/opt/cray,/var/spool,/opt/cray/pe/mpich/8.1.23/ofi/gnu/9.1/lib-abi-mpich:/opt/cray/pe/mpich/8.1.23/gtl/lib,/etc/host.conf,/etc/libibverbs.d/mlx5.driver,/etc/libnl/classid,/etc/resolv.conf,/opt/cray/libfabric/1.12.1.2.2.0.0/lib64/libfabric.so.1,/opt/cray/pe/gcc-libs/libatomic.so.1,/opt/cray/pe/gcc-libs/libgcc_s.so.1,/opt/cray/pe/gcc-libs/libgfortran.so.5,/opt/cray/pe/gcc-libs/libquadmath.so.0,/opt/cray/pe/lib64/libpals.so.0,/opt/cray/pe/lib64/libpmi2.so.0,/opt/cray/pe/lib64/libpmi.so.0,/opt/cray/xpmem/default/lib64/libxpmem.so.0,/run/munge/munge.socket.2,/usr/lib64/libibverbs/libmlx5-rdmav34.so,/usr/lib64/libibverbs.so.1,/usr/lib64/libkeyutils.so.1,/usr/lib64/liblnetconfig.so.4,/usr/lib64/liblustreapi.so,/usr/lib64/libmunge.so.2,/usr/lib64/libnl-3.so.200,/usr/lib64/libnl-genl-3.so.200,/usr/lib64/libnl-route-3.so.200,/usr/lib64/librdmacm.so.1,/usr/lib64/libyaml-0.so.2\"\n\n# Launch the parallel job.\nsrun --hint=nomultithread --distribution=block:block \\\n    singularity run osu-benchmarks_5.4.1.sif \\\n        osu_allreduce\n

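The only changes from a standard submission script are:

  1. Loading the cray-mpich-abi module, which makes the ABI-compatible Cray MPICH libraries available.
  2. Setting SINGULARITYENV_LD_LIBRARY_PATH, so that LD_LIBRARY_PATH inside the container points at the host Cray MPICH and interconnect libraries.
  3. Setting SINGULARITY_BIND, so that the required host directories and libraries are mounted inside the container.
  4. Launching the application with singularity run inside srun.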

Important

Remember that the image file must be located on /work to run jobs on the compute nodes.

If the job runs correctly, you should see output similar to the following in your slurm-*.out file:

Lmod is automatically replacing \"cray-mpich/8.1.23\" with\n\"cray-mpich-abi/8.1.23\".\n\n\n# OSU MPI Allreduce Latency Test v5.4.1\n# Size       Avg Latency(us)\n4                       7.93\n8                       7.93\n16                      8.13\n32                      8.69\n64                      9.54\n128                    13.75\n256                    17.04\n512                    25.94\n1024                   29.43\n2048                   43.53\n4096                   46.53\n8192                   46.20\n16384                  55.85\n32768                  83.11\n65536                 136.90\n131072                257.13\n262144                486.50\n524288               1025.87\n1048576              2173.25\n
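The SINGULARITYENV_LD_LIBRARY_PATH and SINGULARITY_BIND values in the job script above are long single-line strings that are easy to get wrong when editing. The sketch below shows one way to assemble them from bash arrays instead. It rests on several assumptions: join_by is a hypothetical helper, not part of Singularity or the ARCHER2 modules; only a subset of the paths from the script is shown, so a real job would need to list all of them; and none of the paths may contain spaces. Entries in LD_LIBRARY_PATH are separated by colons. SINGULARITY_BIND entries are separated by commas, with a colon inside an entry separating a host path from its destination in the container.

```shell
#!/bin/bash
# Sketch: build SINGULARITYENV_LD_LIBRARY_PATH and SINGULARITY_BIND from arrays
# so that each host path sits on its own line. join_by is a hypothetical helper
# and only a subset of the paths from the job script is listed here.
set -euo pipefail

# Join all arguments after the first, using the first argument as separator.
join_by() {
  local sep=$1 out= item
  shift
  for item; do
    out=${out:+$out$sep}$item
  done
  echo $out
}

# Directories searched for shared libraries inside the container.
lib_dirs=(
  /opt/cray/pe/mpich/8.1.23/ofi/gnu/9.1/lib-abi-mpich
  /opt/cray/pe/mpich/8.1.23/gtl/lib
  /opt/cray/libfabric/1.12.1.2.2.0.0/lib64
  /opt/cray/pe/gcc-libs
  /opt/cray/pe/lib64
)

# Host files and directories to bind into the container (paths assumed to
# contain no spaces, since the arrays are expanded unquoted below).
bind_paths=(
  /opt/cray
  /var/spool
  /etc/host.conf
  /run/munge/munge.socket.2
)

export SINGULARITYENV_LD_LIBRARY_PATH=$(join_by : ${lib_dirs[@]})
export SINGULARITY_BIND=$(join_by , ${bind_paths[@]})
```

In a job script these lines would replace the two long export lines and go before the srun command.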
"},{"location":"user-guide/containers/#using-containerised-hpe-cray-programming-environments","title":"Using Containerised HPE Cray Programming Environments","text":"

An experimental containerised CPE module has been set up on ARCHER2. The module is not available by default but can be made accessible by running module use with the path shown below:

module use /work/y07/shared/archer2-lmod/others/dev\nmodule load ccpe/23.12\n

The purpose of the ccpe module(s) is to allow developers to check that their code compiles with the latest Cray Programming Environment (CPE) releases. The CPE release installed on ARCHER2 (currently CPE 22.12) will typically be older than the latest available. A more recent containerised CPE therefore gives developers the opportunity to try out the latest compilers and libraries before the ARCHER2 CPE is upgraded.

Note

The Containerised CPEs support CCE and GCC compilers, but not AOCC compilers.

The ccpe/23.12 module then provides access to CPE 23.12 via a Singularity image file, located at /work/y07/shared/utils/dev/ccpe/23.12/cpe_23.12.sif. Singularity containers can be run such that locations on the host file system are still visible. This means that source code stored on /work can be compiled from inside the CPE container, and any output from the compilation, such as object files, libraries and executables, can also be written to /work. This ability to bind to locations on the host is necessary because the container is immutable, i.e., you cannot write files to the container itself.

Any executable resulting from a containerised CPE build can be run from within the container, allowing the developer to test the performance of the containerised libraries, e.g., libmpi_cray, libpmi2, libfabric.

We'll now show how to build and run a simple Hello World MPI example using a containerised CPE.

First, cd to the directory containing the Hello World MPI source, makefile and build script. Examples of these files are given below.

The three files, build.sh, makefile and helloworld.f90, are shown below in that order:
#!/bin/bash\n\nmake clean\nmake\n\necho -e \"\\n\\nldd helloworld\"\nldd helloworld\n
MF=     Makefile\n\nFC=     ftn\nFFLAGS= -O3\nLFLAGS= -lmpichf90\n\nEXE=    helloworld\nFSRC=   helloworld.f90\n\n#\n# No need to edit below this line\n#\n\n.SUFFIXES:\n.SUFFIXES: .f90 .o\n\nOBJ=    $(FSRC:.f90=.o)\n\n.f90.o:\n\t$(FC) $(FFLAGS) -c $<\n\nall:    $(EXE)\n\n$(EXE): $(OBJ)\n\t$(FC) $(FFLAGS) -o $@ $(OBJ) $(LFLAGS)\n\nclean:\n\trm -f $(OBJ) $(EXE) core\n
!\n! Each rank prints its rank and the total number of processes.\n!\n\nprogram helloworld\n  use mpi\n\n  implicit none\n\n  integer :: comm, rank, size, ierr\n\n  comm = MPI_COMM_WORLD\n\n  call MPI_INIT(ierr)\n\n  call MPI_COMM_RANK(comm, rank, ierr)\n  call MPI_COMM_SIZE(comm, size, ierr)\n\n  ! Each process prints out its rank\n  write(*,*) 'I am ', rank, 'out of ', size,' processors.'\n\n  call sleep(1)\n\n  call MPI_FINALIZE(ierr)\n\nend program helloworld\n

The ldd command at the end of the build script is simply there to confirm that the code is indeed linked to containerised libraries that form part of the CPE 23.12 release.

The next step is to launch a job (via sbatch) on a serial node that instantiates the containerised CPE 23.12 image and builds the Hello World MPI code.

submit-build.slurm
#!/bin/bash\n\n#SBATCH --job-name=ccpe-build\n#SBATCH --ntasks=8\n#SBATCH --time=00:10:00\n#SBATCH --account=<budget code>\n#SBATCH --partition=serial\n#SBATCH --qos=serial\n#SBATCH --export=none\n\nexport OMP_NUM_THREADS=1\n\nmodule use /work/y07/shared/archer2-lmod/others/dev\nmodule load ccpe/23.12\n\nBUILD_CMD=\"${CCPE_BUILDER} ${SLURM_SUBMIT_DIR}/build.sh\"\n\nsingularity exec --cleanenv \\\n    --bind ${CCPE_BIND_ARGS},${SLURM_SUBMIT_DIR} --env LD_LIBRARY_PATH=${CCPE_LD_LIBRARY_PATH} \\\n    ${CCPE_IMAGE_FILE} ${BUILD_CMD}\n

The CCPE environment variables shown above (e.g., CCPE_BUILDER and CCPE_IMAGE_FILE) are set when the ccpe/23.12 module is loaded. The CCPE_BUILDER variable holds the path to a script that prepares the containerised environment before running the build.sh script; run cat ${CCPE_BUILDER} to see exactly what it does.

Note

Passing the ${SLURM_SUBMIT_DIR} path to Singularity via the --bind option allows the CPE container to access the source code and write out the executable using locations on the host.

Running the newly-built code is similarly straightforward; this time the containerised CPE is launched on the compute nodes using the srun command.

submit-run.slurm
#!/bin/bash\n\n#SBATCH --job-name=helloworld\n#SBATCH --nodes=2\n#SBATCH --tasks-per-node=128\n#SBATCH --cpus-per-task=1\n#SBATCH --time=00:20:00\n#SBATCH --account=<budget code>\n#SBATCH --partition=standard\n#SBATCH --qos=short\n#SBATCH --export=none\n\nexport OMP_NUM_THREADS=1\n\nmodule use /work/y07/shared/archer2-lmod/others/dev\nmodule load ccpe/23.12\n\nRUN_CMD=\"${SLURM_SUBMIT_DIR}/helloworld\"\n\nsrun --distribution=block:block --hint=nomultithread --chdir=${SLURM_SUBMIT_DIR} \\\n    singularity exec --bind ${CCPE_BIND_ARGS},${SLURM_SUBMIT_DIR} --env LD_LIBRARY_PATH=${CCPE_LD_LIBRARY_PATH} \\\n        ${CCPE_IMAGE_FILE} ${RUN_CMD}\n

If you wish, you can replace a containerised library with its host equivalent at runtime. You might, for example, decide to do this for a low-level communications library such as libfabric or libpmi. To do so, add a line like the following to the submit-run.slurm file, before the srun command:

source ${CCPE_SET_HOST_PATH} \"/opt/cray/pe/pmi\" \"6.1.8\" \"lib\"\n

As of April 2024, the version of PMI available on ARCHER2 is 6.1.8 (CPE 22.12), and so the command above would allow you to isolate the impact of the containerised PMI library, which for CPE 23.12 is PMI 6.1.13. To see how the setting of the host library is done, simply run cat ${CCPE_SET_HOST_PATH} after loading the ccpe module.

An MPI code that just prints a message from each rank is obviously very simple. Real-world codes such as CP2K or GROMACS will often require additional software for compilation, e.g., Intel MKL libraries or build tools such as CMake. The way around this is to bind the locations on the host where that software is installed into the CCPE container, as in the following example:

submit-cmake-build.slurm
#!/bin/bash\n\n#SBATCH --job-name=ccpe-build\n#SBATCH --ntasks=8\n#SBATCH --time=00:10:00\n#SBATCH --account=<budget code>\n#SBATCH --partition=serial\n#SBATCH --qos=serial\n#SBATCH --export=none\n\nexport OMP_NUM_THREADS=1\n\nmodule use /work/y07/shared/archer2-lmod/others/dev\nmodule load ccpe/23.12\n\nCMAKE_DIR=\"/work/y07/shared/utils/core/cmake/3.21.3\"\n\nBUILD_CMD=\"${CCPE_BUILDER} ${SLURM_SUBMIT_DIR}/build.sh\"\n\nsingularity exec --cleanenv \\\n    --bind ${CCPE_BIND_ARGS},${CMAKE_DIR},${SLURM_SUBMIT_DIR} \\\n    --env LD_LIBRARY_PATH=${CCPE_LD_LIBRARY_PATH} \\\n    ${CCPE_IMAGE_FILE} ${BUILD_CMD}\n

The submit-cmake-build.slurm script shows how the --bind option can be used to make the CMake installation on ARCHER2 accessible from within the container. The build.sh script can then call the cmake command directly (once the CMake bin directory has been added to the PATH environment variable).
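As an illustration, a build.sh for a CMake-based project might look like the sketch below. It assumes that the cmake executable is in the bin subdirectory of the CMake installation bound by submit-cmake-build.slurm and that your source directory contains a CMakeLists.txt. The function name build_with_host_cmake and the build directory layout are hypothetical. CMAKE_DIR is given a default value inside the script because the --cleanenv option stops host environment variables from being passed into the container.

```shell
#!/bin/bash
# Sketch of a build.sh for a CMake project, run inside the CPE container.
# Assumes cmake lives in $CMAKE_DIR/bin and the source has a CMakeLists.txt.
set -euo pipefail

# Default to the CMake installation bound in submit-cmake-build.slurm.
CMAKE_DIR=${CMAKE_DIR:-/work/y07/shared/utils/core/cmake/3.21.3}

# Hypothetical helper: configure SRC into BUILD and then build it.
build_with_host_cmake() {
  local src=${1:-.}
  local build=${2:-build}
  # Put the host CMake first on PATH so that cmake resolves to it.
  export PATH=$CMAKE_DIR/bin:$PATH
  cmake -S $src -B $build
  cmake --build $build
}

# A real build.sh would end with a call such as (paths are placeholders):
#   build_with_host_cmake /path/to/source /path/to/source/build
```

Keeping the PATH change inside build.sh, rather than in the Slurm script, means it takes effect inside the container, after CCPE_BUILDER has prepared the environment.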

"},{"location":"user-guide/data-migration/","title":"Data migration from ARCHER to ARCHER2","text":"

This content has been moved to archer-migration/data-migration

"},{"location":"user-guide/data/","title":"Data management and transfer","text":"

This section covers best practice and tools for data management on ARCHER2 along with a description of the different storage available on the service.

The IO section has information on achieving good performance for reading and writing data to the ARCHER2 storage along with information and advice on different IO patterns.

Information

If you have any questions on data management and transfer please do not hesitate to contact the ARCHER2 service desk at support@archer2.ac.uk.

"},{"location":"user-guide/data/#useful-resources-and-links","title":"Useful resources and links","text":""},{"location":"user-guide/data/#data-management","title":"Data management","text":"

We strongly recommend that you give some thought to how you use the various data storage facilities that are part of the ARCHER2 service. This will not only allow you to use the machine more effectively but also to ensure that your valuable data is protected.

Here are the main points you should consider:

"},{"location":"user-guide/data/#archer2-storage","title":"ARCHER2 storage","text":"

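The ARCHER2 service, like many HPC systems, has a complex structure. There are a number of different data storage types available to users: the home file systems, the work file systems, the solid state (NVMe) file system and the RDFaaS file systems.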

Each type of storage has different characteristics and policies, and is suitable for different types of use.

Important

All users have a directory on one of the home file systems and on one of the work file systems. The directories are located at /home/[project ID]/[project ID]/[user ID] and /work/[project ID]/[project ID]/[user ID] respectively.

There are also three different types of node available to users: login nodes, compute nodes and data analysis nodes.

Each type of node sees a different combination of the storage types. The following table shows which storage options are available on the different node types:

| Storage | Login nodes | Compute nodes | Data analysis nodes | Notes |
| --- | --- | --- | --- | --- |
| /home | yes | no | yes | Incremental backup |
| /work | yes | yes | yes | No backup, high performance |
| Solid state (NVMe) | yes | yes | yes | No backup, high performance |
| RDFaaS | yes | no | yes | Disaster recovery backup |

Important

Only the work file systems and the solid state (NVMe) file system are visible on the compute nodes. This means that all data required by calculations at runtime (input data, application binaries, software libraries, etc.) must be placed on one of these file systems.

You may see \"file not found\" errors if you try to access data on the /home or RDFaaS file systems when running on the compute nodes.

"},{"location":"user-guide/data/#home-file-systems","title":"Home file systems","text":"

There are four independent home file systems, and every project has an allocation on one of the four. You do not need to know which one your project uses, as your project's space can always be accessed via the path /home/[project ID], with your personal directory at /home/[project ID]/[project ID]/[user ID]. Each home file system is approximately 100 TB in size and is implemented using standard Network Attached Storage (NAS) technology. This means that these disks are not particularly high performance but are well suited to standard operations like compilation and file editing. These file systems are visible from the ARCHER2 login nodes and data analysis nodes, but not from the compute nodes.

"},{"location":"user-guide/data/#accessing-snapshots-of-home-file-systems","title":"Accessing snapshots of home file systems","text":"

The home file systems are fully backed up and also retain snapshots which can be used to recover past versions of files. Snapshots are taken weekly (for each of the past two weeks), daily (for each of the past two days) and hourly (for each of the last 6 hours). You can access the snapshots at .snapshot from any given directory on the home file systems. Note that the .snapshot directory will not show up under any version of ls and will not tab complete.

These file systems are a good location to keep source code, copies of scripts and compiled binaries. Small amounts of important data can also be copied here for safe keeping though the file systems are not fast enough to manipulate large datasets effectively.
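Restoring a file from a snapshot amounts to copying the old version out of the .snapshot directory. The sketch below wraps this in a small bash function. restore_from_snapshot is a hypothetical helper, and snapshot directory names are not documented here, so you would first list them (assuming ls .snapshot works when the directory is named explicitly) and pass the one you want. The function copies the old version next to the current file with a .restored suffix instead of overwriting it, and assumes the file name contains no spaces.

```shell
#!/bin/bash
# Sketch: restore an earlier version of a file from a home file system
# snapshot. Run it from the directory that contains the file.
set -euo pipefail

# Hypothetical helper: restore_from_snapshot SNAPSHOT FILE
restore_from_snapshot() {
  local snap=${1:?snapshot name required}
  local file=${2:?file name required}
  local src=.snapshot/$snap/$file
  # Refuse to overwrite an earlier restored copy.
  if [[ -e $file.restored ]]; then
    echo $file.restored already exists >&2
    return 1
  fi
  # -p keeps the timestamps and permissions of the snapshot copy.
  cp -p -- $src $file.restored
  echo restored $src to $file.restored
}

# Usage, where SNAPSHOT is one of the names listed by ls .snapshot:
#   restore_from_snapshot SNAPSHOT myscript.sh
```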

"},{"location":"user-guide/data/#quotas-on-home-file-systems","title":"Quotas on home file systems","text":"

All projects are assigned a quota on the home file systems. The project PI or manager can split this quota up between users or groups of users if they wish.

You can view any home file system quotas that apply to your account by logging into SAFE and navigating to the page for your ARCHER2 login account.

  1. Log into SAFE
  2. Use the \"Login accounts\" menu and select your ARCHER2 login account
  3. The \"Login account details\" table lists any user or group quotas that are linked with your account. (If no quota is shown for a row then you have an unlimited quota for that item, but you may still be limited by another quota.)

Tip

Quota and usage data on SAFE is updated twice daily, so it may not be exactly up to date with the situation on the systems themselves.

"},{"location":"user-guide/data/#work-file-systems","title":"Work file systems","text":"

There are currently three work file systems on the full ARCHER2 service. Each of these file systems is 3.4 PB, and a portion of one of them is available to each project. You do not usually need to know which one your project uses, as your project's space can always be accessed via the path /work/[project ID], with your personal directory at /work/[project ID]/[project ID]/[user ID].

All of these are high-performance Lustre parallel file systems. They are designed to support data in large files; performance for data stored in large numbers of small files is likely to be worse.

These file systems are available on the compute nodes and are the default location users should use for data required at runtime on the compute nodes.

Warning

There are no backups of any data on the work file systems. You should not rely on these file systems for long term storage.

Ideally, these file systems should only contain data that is:

In practice it may be convenient to keep copies of datasets on the work file systems that you know will be needed at a later date. However, make sure that important data is always backed up elsewhere and that your work would not be significantly impacted if the data on the work file systems was lost.

Large data sets can be moved to the RDFaaS storage or transferred off the ARCHER2 service entirely.

If you have data on the work file systems that you are not going to need in the future please delete it.

"},{"location":"user-guide/data/#quotas-on-the-work-file-systems","title":"Quotas on the work file systems","text":"

As for the home file systems, all projects are assigned a quota on the work file systems. The project PI or manager can split this quota up between users or groups of users if they wish.

You can view any work file system quotas that apply to your account by logging into SAFE and navigating to the page for your ARCHER2 login account.

  1. Log into SAFE
  2. Use the \"Login accounts\" menu and select your ARCHER2 login account
  3. The \"Login account details\" table lists any user or group quotas that are linked with your account. (If no quota is shown for a row then you have an unlimited quota for that item, but you may still be limited by another quota.)

Tip

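Quota and usage data on SAFE is updated twice daily, so it may not be exactly up to date with the situation on the systems themselves.

You can also examine up-to-date quotas and usage on the ARCHER2 systems themselves using the lfs quota command. To do this, change to your directory on the work file system and then query your user quota (lfs quota -hu) or your project quota (lfs quota -hp), as in the examples below: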

cd /work/t01/t01/auser\n
auser@ln03:/work/t01/t01/auser> lfs quota -hu auser .\nDisk quotas for usr auser (uid 5496):\n  Filesystem    used   quota   limit   grace   files   quota   limit   grace\n           .  1.366G      0k      0k       -    5486       0       0       -\nuid 5496 is using default block quota setting\nuid 5496 is using default file quota setting\n

The quota and limit of 0k here indicate that no user quota is set for this user. To check the project quota, pass your project group ID (from id -g) to lfs quota -hp:

auser@ln03:/work/t01/t01/auser> lfs quota -hp $(id -g) .\nDisk quotas for prj 1009 (pid 1009):\n  Filesystem    used   quota   limit   grace   files   quota   limit   grace\n           .  2.905G      0k      0k       -   25300       0       0       -\npid 1009 is using default block quota setting\npid 1009 is using default file quota setting\n
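If you want to check usage from a script, for example at the end of a job, you can parse the text output of lfs quota. The sketch below is a hypothetical bash function that reads lfs quota -h output on standard input and prints the used space, the file count and the block quota and limit for the path you name. It assumes the column layout shown above, with the path and the numbers on the same line. lfs can wrap a long path onto a line of its own, and in that case the function prints nothing.

```shell
#!/bin/bash
# Sketch: summarise one path from lfs quota -h output read on stdin.
# summarise_lfs_quota is hypothetical; it assumes the column layout
# Filesystem used quota limit grace files ... on a single line.
set -euo pipefail

summarise_lfs_quota() {
  local target=${1:-.}
  local fsname used quota limit grace files rest
  while read -r fsname used quota limit grace files rest; do
    if [[ $fsname == $target ]]; then
      # A quota and limit of 0k means that no block limit is set.
      if [[ $quota == 0k && $limit == 0k ]]; then
        echo used=$used files=$files limit=none
      else
        echo used=$used files=$files quota=$quota limit=$limit
      fi
    fi
  done
}

# Usage on ARCHER2, from your /work directory:
#   lfs quota -hu $USER . | summarise_lfs_quota .
#   lfs quota -hp $(id -g) . | summarise_lfs_quota .
```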
"},{"location":"user-guide/data/#solid-state-nvme-file-system-scratch-storage","title":"Solid state (NVMe) file system - scratch storage","text":"

Important

The solid state storage system is configured as scratch storage with all files that have not been accessed in the last 28 days being automatically deleted. This implementation starts on 28 Feb 2024, i.e. any files not accessed since 1 Feb 2024 will be automatically removed on 28 Feb 2024.

The solid state storage file system is a 1 PB high-performance parallel Lustre file system similar to the work file systems. However, unlike the work file systems, all of its disks are based on solid state storage (NVMe) technology. This changes the performance characteristics of the file system compared to the work file systems. Testing by the ARCHER2 CSE team at EPCC has shown that you may see I/O performance improvements from the solid state storage compared to the standard work Lustre file systems on ARCHER2 if your I/O model has the following characteristics or similar:

Data on the solid state (NVMe) file system is visible on the compute nodes.

Important

If you use MPI-IO approaches to reading or writing data (this includes parallel HDF5 and parallel NetCDF), you are very unlikely to see any performance improvement from using the solid state storage over the standard parallel Lustre file systems on ARCHER2.

Warning

There are no backups of any data on the solid state (NVMe) file system. You should not rely on this file system for long term storage.

"},{"location":"user-guide/data/#access-to-the-solid-state-file-system","title":"Access to the solid state file system","text":"

Projects do not have access to the solid state file system by default. If your project does not yet have access and you want access for your project, please contact the Service Desk to request access.

"},{"location":"user-guide/data/#location-of-directories","title":"Location of directories","text":"

You can find your directory on the file system at:

/mnt/lustre/a2fs-nvme/work/<project code>/<project code>/<username>\n

For example, if my username is auser and I am in project t01, I could find my solid state storage directory at:

/mnt/lustre/a2fs-nvme/work/t01/t01/auser\n
"},{"location":"user-guide/data/#quotas-on-solid-state-file-system","title":"Quotas on solid state file system","text":"

Important

All projects have the same, large quota of 250,000 GiB on the solid state file system to allow them to use it as a scratch file system. Remember, any files that have not been accessed in the last 28 days will be automatically deleted.

You can query quotas for the solid state file system in the same way as quotas on the work file systems.

Bug

Usage and quotas of the solid state file system are not yet available in SAFE - you should use commands such as lfs quota -hp $(id -g) . to query quotas on the solid state file system.

"},{"location":"user-guide/data/#identifying-files-that-are-candidates-for-deletion","title":"Identifying files that are candidates for deletion","text":"

You can identify which files you own that are candidates for deletion at the next scratch file system purge using the find command in the following format:

find /mnt/lustre/a2fs-nvme/work/<project code> -atime +28 -type f -print\n

For example, if my account is in project t01, I would use:

find /mnt/lustre/a2fs-nvme/work/t01 -atime +28 -type f -print\n
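The command above lists every matching file in the project directory, including files owned by other project members. Below is a minimal sketch of a shell function that limits the search to files you own by adding the standard find `-user` test. The name `list_purge_candidates` is hypothetical and is not an ARCHER2 tool.

```shell
#!/usr/bin/env bash
# list_purge_candidates (hypothetical helper): print regular files under a
# directory that you own and that have not been accessed for more than 28 days.
list_purge_candidates() {
    local dir="$1"
    # -user accepts a numeric UID; -atime +28 matches files last accessed
    # more than 28 days ago, as in the find command above.
    find "$dir" -type f -user "$(id -u)" -atime +28 -print
}

# Example for project t01 (replace with your own project code):
# list_purge_candidates /mnt/lustre/a2fs-nvme/work/t01
```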
"},{"location":"user-guide/data/#rdfaas-file-systems","title":"RDFaaS file systems","text":"

The RDFaaS file systems provide additional capacity for projects to store data that is not currently required on the compute nodes but which is too large for the Home file systems.

Warning

The RDFaaS file systems are backed up for disaster recovery purposes only (e.g. loss of the whole file system) so it is not possible to recover individual files if they are deleted by mistake or otherwise lost.

Tip

Not all projects on ARCHER2 have access to RDFaaS. If you do have access, this will show up in the login account page on SAFE for your ARCHER2 login account.

If you have access to RDFaaS, you will have a directory in one of two file systems: either /epsrc or /general.

For example, if your username is auser and you are in the e05 project, then your RDFaaS directory will be at:

/epsrc/e05/e05/auser\n

The RDFaaS file systems are not available on the ARCHER2 compute nodes.

Tip

If you are having issues accessing data on the RDFaaS file system, please contact the ARCHER2 Service Desk.

"},{"location":"user-guide/data/#copying-data-from-rdfaas-to-work-file-systems","title":"Copying data from RDFaaS to Work file systems","text":"

You should use the standard Linux cp command to copy data from the RDFaaS file system to other ARCHER2 file systems (usually /work). For example, to transfer the file important-data.tar.gz from the RDFaaS file system to /work you would use the following command (assuming you are user auser in project e05):

cp /epsrc/e05/e05/auser/important-data.tar.gz /work/e05/e05/auser/\n

(Remember to replace the project code and username with your own. You may also need to use /general instead of /epsrc if that is where your data was stored on the RDF file systems.)

"},{"location":"user-guide/data/#subprojects","title":"Subprojects","text":"

Some large projects may choose to split their resources into multiple subprojects. These subprojects will have identifiers appended to the main project ID. For example, the rse subgroup of the z19 project would have the ID z19-rse. If the main project has allocated storage quotas to the subproject, the directories for this storage will be found at, for example:

/home/z19/z19-rse/auser\n

Your Linux home directory will generally not be changed when you are made a member of a subproject so you must change directories manually (or change the ownership of files) to make use of this different storage quota allocation.

"},{"location":"user-guide/data/#sharing-data-with-other-archer2-users","title":"Sharing data with other ARCHER2 users","text":"

How you share data with other ARCHER2 users depends on whether or not they belong to the same project as you. Each project has two shared folders that can be used for sharing data.

"},{"location":"user-guide/data/#sharing-data-with-archer2-users-in-your-project","title":"Sharing data with ARCHER2 users in your project","text":"

Each project has an inner shared folder.

/work/[project code]/[project code]/shared\n

This folder has read/write permissions for all project members. You can place any data you wish to share with other project members in this directory. For example, if your project code is x01 the inner shared folder would be located at /work/x01/x01/shared.

"},{"location":"user-guide/data/#sharing-data-with-archer2-users-within-the-same-project-group","title":"Sharing data with ARCHER2 users within the same project group","text":"

Some projects have subprojects (also often referred to as 'project groups' or sub-budgets). For example, project e123 might have a project group e123-fred for a sub-group of researchers working with Fred.

Project groups often do not have a disk quota set. However, if the project PI does set up a group disk quota (e.g. for /work), then additional directories are created:

/work/e123/e123-fred\n/work/e123/e123-fred/shared\n/work/e123/e123-fred/<user> (for every user in the group)\n

and all members of the e123-fred group will be able to use the /work/e123/e123-fred/shared directory to share their files.

Note

If files are copied from their usual directories they will keep the original ownership. To grant ownership to the group:

chown -R $USER:e123-fred /work/e123/e123-fred/ ...

"},{"location":"user-guide/data/#sharing-data-with-all-archer2-users","title":"Sharing data with all ARCHER2 users","text":"

Each project also has an outer shared folder:

/work/[project code]/shared\n

It is writable by all project members and readable by any user on the system. You can place any data you wish to share with other ARCHER2 users who are not members of your project in this directory. For example, if your project code is x01 the outer shared folder would be located at /work/x01/shared.

"},{"location":"user-guide/data/#permissions","title":"Permissions","text":"

You should check the permissions of any files that you place in the shared area, especially if those files were created in your own ARCHER2 account. Files of the latter type are likely to be readable by you only.

The chmod command below shows how to make sure that a file placed in the outer shared folder is also readable by all ARCHER2 users.

chmod a+r /work/x01/shared/your-shared-file.txt\n

Similarly, for the inner shared folder, chmod can be called such that read permission is granted to all users within the x01 project.

chmod g+r /work/x01/x01/shared/your-shared-file.txt\n

If you're sharing a set of files stored within a folder hierarchy, the chmod command is slightly more complicated.

chmod -R a+Xr /work/x01/shared/my-shared-folder\nchmod -R g+Xr /work/x01/x01/shared/my-shared-folder\n

The -R option ensures that the read permission is enabled recursively and the +X guarantees that the user(s) you're sharing the folder with can access the subdirectories below my-shared-folder.
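You can wrap the two chmod forms above in a small function so that the audience is chosen explicitly each time. This is only a sketch: `share_tree` is a hypothetical name. It also assumes that the parent directories (such as /work/x01/shared) are already accessible to the people you are sharing with.

```shell
#!/usr/bin/env bash
# share_tree (hypothetical helper): grant read access to a file or directory tree.
#   share_tree group PATH  -> readable by members of the owning group
#   share_tree all   PATH  -> readable by every user on the system
share_tree() {
    local scope="$1" target="$2"
    case "$scope" in
        # r adds read permission; X adds execute (search) permission only to
        # directories and to files that are already executable by someone.
        group) chmod -R g+rX "$target" ;;
        all)   chmod -R a+rX "$target" ;;
        *)     echo "usage: share_tree group|all PATH" >&2; return 1 ;;
    esac
}

# share_tree all   /work/x01/shared/my-shared-folder
# share_tree group /work/x01/x01/shared/my-shared-folder
```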

"},{"location":"user-guide/data/#sharing-data-between-projects-and-subprojects","title":"Sharing data between projects and subprojects","text":"

Every file has an owner group that specifies access permissions for users belonging to that group. It's usually the case that the group id is synonymous with the project code. Somewhat confusingly however, projects can contain groups of their own, called subprojects, which can be assigned disk space quotas distinct from the project.

chown -R $USER:x01-subproject /work/x01/x01-subproject/$USER/my-folder\n

The chown command above changes the owning group for all the files within my-folder to the x01-subproject group. This might be necessary if previously those files were owned by the x01 group and thereby using some of the x01 disk quota.
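Before or after running chown, you may want to see which files are still owned by a different group. Here is a sketch using find's `! -group` test. The function name `files_not_in_group` is hypothetical.

```shell
#!/usr/bin/env bash
# files_not_in_group (hypothetical helper): list files and directories under
# DIR whose owning group is not GROUP (a group name or a numeric GID).
files_not_in_group() {
    local group="$1" dir="$2"
    find "$dir" ! -group "$group" -print
}

# Files under my-folder that still belong to a group other than x01-subproject:
# files_not_in_group x01-subproject /work/x01/x01-subproject/$USER/my-folder
```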

"},{"location":"user-guide/data/#archiving-and-data-transfer","title":"Archiving and data transfer","text":"

Data transfer speed may be limited by many different factors so the best data transfer mechanism to use depends on the type of data being transferred and where the data is going.

The method you use to transfer data to/from ARCHER2 will depend on how much you want to transfer and where to. The methods we cover in this guide are:

Before discussing specific data transfer methods, we cover archiving which is an essential process for transferring data efficiently.

"},{"location":"user-guide/data/#archiving","title":"Archiving","text":"

If you have related data that consists of a large number of small files, it is strongly recommended to pack the files into a larger \"archive\" file for ease of transfer and manipulation. A single large file makes more efficient use of the file system and is easier to move, copy and transfer because significantly fewer metadata operations are required. Archive files can be created using tools like tar and zip.

"},{"location":"user-guide/data/#tar","title":"tar","text":"

The tar command packs files into a \"tape archive\" format. The command has general form:

tar [options] [file(s)]\n

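Common options include: -c to create a new archive, -v for verbose output, -W to verify the archive after it has been written, -l to print a message if not all hard links are archived, and -f to give the name of the archive file.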

Putting these together:

tar -cvWlf mydata.tar mydata\n

will create and verify an archive.

To extract files from a tar file, the option -x is used. For example:

tar -b 2048 -xf mydata.tar\n

will recover the contents of mydata.tar to the current working directory (using a block size of 1 MiB to improve Lustre performance and reduce contention).

To verify an existing tar file against a set of data, use the -d (diff) option. By default, tar gives no output if the verification succeeds. An example of a failed verification follows:

$> tar -df mydata.tar mydata/*\nmydata/damaged_file: Mod time differs\nmydata/damaged_file: Size differs\n

Note

tar files do not store checksums with their data, requiring the original data to be present during verification.
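Because tar does not store checksums, one option is to record them separately when you create the archive. Extracted copies can then be checked later without the original data. The sketch below uses the GNU coreutils sha256sum tool. The function name `archive_with_manifest` and the `.sha256` manifest file are hypothetical conventions, not an ARCHER2 requirement.

```shell
#!/usr/bin/env bash
# archive_with_manifest (hypothetical helper): write a SHA-256 manifest for
# every regular file in DIR, then create and verify DIR.tar.
#   archive_with_manifest DIR ARCHIVE.tar   -> also writes ARCHIVE.sha256
archive_with_manifest() {
    local dir="$1" tarfile="$2"
    # One checksum line per file, with paths relative to the current directory
    find "$dir" -type f -exec sha256sum {} + > "${tarfile%.tar}.sha256"
    # -c create, -W verify the archive after writing it, -f archive file name
    tar -cWf "$tarfile" "$dir"
}

# archive_with_manifest mydata mydata.tar
# ...later, in the directory where mydata.tar was extracted:
# sha256sum -c mydata.sha256
```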

Tip

Further information on using tar can be found in the tar manual (accessed via man tar or online).

"},{"location":"user-guide/data/#zip","title":"zip","text":"

The zip file format is widely used for archiving files and is supported by most major operating systems. The utility to create zip files can be run from the command line as:

zip [options] mydata.zip [file(s)]\n

Common options are: -0 to store files without compression and -r to include the contents of directories recursively.

Together:

zip -0r mydata.zip mydata\n

will create an uncompressed archive of the mydata directory.

Note

Unlike tar, zip files do not preserve hard links. File data will be copied on archive creation, e.g. an uncompressed zip archive of a 100MB file and a hard link to that file will be approximately 200MB in size. This makes zip an unsuitable format if you wish to precisely reproduce the file system layout.
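You can check the tar side of this difference directly. The sketch below uses hypothetical file and function names. It creates a 1 MiB file with a second hard link, archives it with tar, extracts it, and prints the link count of the restored file. GNU tar stores the second name as a link rather than a second copy of the data, so the restored file should report two links.

```shell
#!/usr/bin/env bash
# demo_tar_hardlinks (hypothetical demo): show that tar restores hard links.
demo_tar_hardlinks() {
    local work
    work=$(mktemp -d)
    (
        cd "$work" || exit 1
        mkdir mydata
        head -c 1048576 /dev/zero > mydata/big.dat   # 1 MiB test file
        ln mydata/big.dat mydata/big-link.dat        # second name, same inode
        tar -cf mydata.tar mydata
        mkdir restored
        tar -xf mydata.tar -C restored
        # %h prints the number of hard links to the restored file
        stat -c %h restored/mydata/big.dat
    )
    rm -rf "$work"
}

# demo_tar_hardlinks
```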

The corresponding unzip command is used to extract data from the archive. The simplest use case is:

unzip mydata.zip\n

which recovers the contents of the archive to the current working directory.

Files in a zip archive are stored with a CRC checksum to help detect data loss. unzip provides options for verifying this checksum against the stored files. The relevant flag is -t and is used as follows:

$> unzip -t mydata.zip\nArchive:  mydata.zip\n    testing: mydata/                 OK\n    testing: mydata/file             OK\nNo errors detected in compressed data of mydata.zip.\n

Tip

Further information on using zip can be found in the zip manual (accessed via man zip or online).

"},{"location":"user-guide/data/#data-transfer-via-ssh","title":"Data transfer via SSH","text":"

The easiest way of transferring data to/from ARCHER2 is to use one of the standard programs based on the SSH protocol, such as scp, sftp or rsync. These all use the same underlying mechanism (SSH) that you normally use to log in to ARCHER2. So, once the command has been executed via the command line, you will be prompted for your password for the specified account on the remote machine (ARCHER2 in this case).

To avoid having to type in your password multiple times, you can set up an SSH key pair and use an SSH agent as documented in the User Guide at connecting.

"},{"location":"user-guide/data/#ssh-data-transfer-performance-considerations","title":"SSH data transfer performance considerations","text":"

The SSH protocol encrypts all traffic it sends. This means that file transfer using SSH consumes a relatively large amount of CPU time at both ends of the transfer (for encryption and decryption). The ARCHER2 login nodes have fairly fast processors that can sustain about 100 MB/s transfer. The encryption algorithm used is negotiated between the SSH client and the SSH server. There are command line flags that allow you to specify a preference for which encryption algorithm should be used. You may be able to improve transfer speeds by requesting a different algorithm than the default. The aes128-ctr or aes256-ctr algorithms are well supported and fast as they are implemented in hardware. These are not usually the default choice when using scp so you will need to manually specify them.
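Which ciphers your own SSH client supports depends on its version and build. With OpenSSH, `ssh -Q cipher` lists them, so you can check before adding -c to a command. This only checks the client; the server must also accept the cipher. The helper name `cipher_supported` is hypothetical.

```shell
#!/usr/bin/env bash
# cipher_supported (hypothetical helper): succeed if the local OpenSSH client
# lists the named cipher in `ssh -Q cipher`.
cipher_supported() {
    ssh -Q cipher 2>/dev/null | grep -qx -- "$1"
}

# if cipher_supported aes128-ctr; then
#     scp -c aes128-ctr source user@login.archer2.ac.uk:[destination]
# fi
```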

A single SSH based transfer will usually not be able to saturate the available network bandwidth or the available disk bandwidth so you may see an overall improvement by running several data transfer operations in parallel. To reduce metadata interactions it is a good idea to overlap transfers of files from different directories.
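One way to run several transfers at once is to start one rsync per source directory with `xargs -P`. This sketch rests on a few assumptions: `parallel_rsync` and the `NJOBS` variable are hypothetical names, and the right number of concurrent transfers depends on the network and file systems at both ends.

```shell
#!/usr/bin/env bash
# parallel_rsync (hypothetical helper): copy each SOURCE to DEST with its own
# rsync process, running at most NJOBS (default 4) transfers at a time.
#   parallel_rsync DEST SOURCE...
parallel_rsync() {
    local dest="$1"; shift
    local njobs="${NJOBS:-4}"
    # NUL-separated names keep paths with spaces intact; -I{} runs one rsync
    # per source and -P sets how many may run concurrently.
    printf '%s\0' "$@" | xargs -0 -P "$njobs" -I{} \
        rsync -a -e "ssh -c aes128-ctr" {} "$dest"
}

# Example (replace user and the paths with your own):
# NJOBS=3 parallel_rsync user@login.archer2.ac.uk:/work/t01/t01/user/ run1 run2 run3
```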

In addition, you should consider the following when transferring data:

"},{"location":"user-guide/data/#scp","title":"scp","text":"

The scp command creates a copy of a file (or, if given the -r flag, a directory) either from a local machine onto a remote machine or from a remote machine onto a local machine.

For example, to transfer files to ARCHER2 from a local machine:

scp [options] source user@login.archer2.ac.uk:[destination]\n

(Remember to replace user with your ARCHER2 username in the example above.)

In the above example, the [destination] is optional, as when left out scp will copy the source into your home directory. Also, the source should be the absolute path of the file/directory being copied or the command should be executed in the directory containing the source file/directory.

If you want to request a different encryption algorithm add the -c [algorithm-name] flag to the scp options. For example, to use the (usually faster) aes128-ctr encryption algorithm you would use:

scp [options] -c aes128-ctr source user@login.archer2.ac.uk:[destination]\n

(Remember to replace user with your ARCHER2 username in the example above.)

"},{"location":"user-guide/data/#rsync","title":"rsync","text":"

The rsync command can also transfer data between hosts using an SSH connection. It creates a copy of a file or, if given the -r flag, a directory at the given destination, similar to scp above.

Given the -a option, rsync can also make exact copies (including permissions); this is referred to as mirroring. In this case the rsync command is executed with ssh to create the copy on a remote machine.

To transfer files to ARCHER2 using rsync with ssh the command has the form:

rsync [options] -e ssh source user@login.archer2.ac.uk:[destination]\n

(Remember to replace user with your ARCHER2 username in the example above.)

In the above example, the [destination] is optional, as when left out rsync will copy the source into your home directory. Also the source should be the absolute path of the file/directory being copied or the command should be executed in the directory containing the source file/directory.

Additional flags can be specified for the underlying ssh command by using a quoted string as the argument of the -e flag. e.g.

rsync [options] -e \"ssh -c aes128-ctr\" source user@login.archer2.ac.uk:[destination]\n

(Remember to replace user with your ARCHER2 username in the example above.)

Tip

Further information on using rsync can be found in the rsync manual (accessed via man rsync or online).

"},{"location":"user-guide/data/#data-transfer-via-globus","title":"Data transfer via Globus","text":"

The ARCHER2 file systems have a Globus Collection (formerly known as an endpoint) with the name \"Archer2 file systems\". See the full step-by-step guide for using Globus to transfer files to/from ARCHER2.

"},{"location":"user-guide/data/#data-transfer-via-gridftp","title":"Data transfer via GridFTP","text":"

ARCHER2 provides a module for grid computing, gct/6.2, otherwise known as the Globus Grid Community Toolkit v6.2.20201212. This toolkit provides a command line interface for moving data to and from GridFTP servers.

Data transfers are managed by the globus-url-copy command. Full details concerning this command's use can be found in the GCT 6.2 GridFTP User's Guide.

Info

Further information on using GridFTP on ARCHER2 to transfer data to the JASMIN facility can be found in the JASMIN user documentation.

"},{"location":"user-guide/data/#data-transfer-using-rclone","title":"Data transfer using rclone","text":"

Rclone is a command-line program to manage files on cloud storage. You can transfer files directly to/from cloud storage services, such as MS OneDrive and Dropbox. The program preserves timestamps and verifies checksums at all times.

First of all, you must download and unzip rclone on ARCHER2:

wget https://downloads.rclone.org/v1.62.2/rclone-v1.62.2-linux-amd64.zip\nunzip rclone-v1.62.2-linux-amd64.zip\ncd rclone-v1.62.2-linux-amd64/\n

The previous code snippet uses rclone v1.62.2, which was the latest version when these instructions were written.

Configure rclone using ./rclone config. This will guide you through an interactive setup process where you can make a new remote (called remote). See the following for detailed instructions for:

Please note that a token is required to connect from ARCHER2 to the cloud service, and you need a web browser to get the token. We recommend running rclone authorize on your laptop to get the token, then copying the token from your laptop to ARCHER2. The rclone website contains further instructions on configuring rclone on a remote machine without a web browser.

Once all the above is done, you're ready to go. If you want to copy a directory, please use:

rclone copy <archer2_directory> remote:<cloud_directory>

Please note that \"remote\" is the name that you have chosen when running rclone config. To copy files, please use:

rclone copyto <archer2_file> remote:<cloud_file>

Note

If the session times out while the data transfer takes place, adding the -vv flag to an rclone transfer forces rclone to output to the terminal and therefore avoids triggering the timeout process.

"},{"location":"user-guide/data/#ssh-data-transfer-example-laptopworkstation-to-archer2","title":"SSH data transfer example: laptop/workstation to ARCHER2","text":"

Here we have a short example demonstrating transfer of data directly from a laptop/workstation to ARCHER2.

Note

This guide assumes you are using a command line interface to transfer data. This means the terminal on Linux or macOS, the MobaXterm local terminal on Windows, or PowerShell.

Before we can transfer data to ARCHER2, we need to make sure we have an SSH key set up to access ARCHER2 from the system we are transferring data from. If you are using the same system that you use to log into ARCHER2, then you should be all set. If you want to use a different system, you will need to generate a new SSH key there (or use SSH key forwarding) to allow you to connect to ARCHER2.

Tip

Remember that you will need to use both a key and your password to transfer data to ARCHER2.

Once we know our keys are set up correctly, we are ready to transfer data directly between the two machines. We begin by combining our important research data into a single archive file using the following command:

tar -czf all_my_files.tar.gz file1.txt file2.txt file3.txt\n

We then initiate the data transfer from our system to ARCHER2, here using rsync to allow the transfer to be recommenced without needing to start again, in the event of a loss of connection or other failure. For example, using the SSH key in the file ~/.ssh/id_RSA_A2 on our local system:

rsync -Pv -e\"ssh -c aes128-ctr -i $HOME/.ssh/id_RSA_A2\" ./all_my_files.tar.gz otbz19@login.archer2.ac.uk:/work/z19/z19/otbz19/\n

Note the use of the -P flag to allow partial transfer: the same command can be used to restart the transfer after a loss of connection. The -e flag allows specification of the ssh command, which we have used to add the location of the identity file. The -c option sets the cipher to aes128-ctr, which has been found to increase performance. The ~ shortcut is not expanded inside the quoted -e argument, so we have used $HOME instead. The command copies our research archive to our project work directory on ARCHER2.

Note

Remember to replace otbz19 with your username on ARCHER2.

If we were unconcerned about being able to restart an interrupted transfer, we could instead use the scp command,

scp -c aes128-ctr -i ~/.ssh/id_RSA_A2 all_my_files.tar.gz otbz19@login.archer2.ac.uk:/work/z19/z19/otbz19/\n

but rsync is recommended for larger transfers.
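Because the rsync -P command can simply be rerun after a dropped connection, the retry can also be automated. Below is a minimal sketch in which `retry_transfer`, `MAX_TRIES` and `RETRY_DELAY` are hypothetical names. Each attempt opens a new SSH connection, so you may be prompted for your credentials again.

```shell
#!/usr/bin/env bash
# retry_transfer (hypothetical helper): rerun a command until it succeeds,
# giving up after MAX_TRIES attempts (default 5) and waiting RETRY_DELAY
# seconds (default 10) between attempts.
retry_transfer() {
    local max="${MAX_TRIES:-5}" n=1
    until "$@"; do
        if [ "$n" -ge "$max" ]; then
            echo "giving up after $n attempts" >&2
            return 1
        fi
        n=$((n + 1))
        sleep "${RETRY_DELAY:-10}"
    done
}

# retry_transfer rsync -Pv -e "ssh -c aes128-ctr -i $HOME/.ssh/id_RSA_A2" \
#     ./all_my_files.tar.gz otbz19@login.archer2.ac.uk:/work/z19/z19/otbz19/
```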

"},{"location":"user-guide/debug/","title":"Debugging","text":"

The following debugging tools are available on ARCHER2:

"},{"location":"user-guide/debug/#linaro-forge","title":"Linaro Forge","text":"

The Linaro Forge tool provides the DDT parallel debugger. See:

"},{"location":"user-guide/debug/#gdb4hpc","title":"gdb4hpc","text":"

The GNU Debugger for HPC (gdb4hpc) is a GDB-based debugger used to debug applications compiled with CCE, PGI, GNU, and Intel Fortran, C and C++ compilers. It allows programmers to either launch an application within it or to attach to an already-running application. Attaching to an already-running and hanging application is a quick way of understanding why the application is hanging, whereas launching an application through gdb4hpc will allow you to see your application running step-by-step, output the values of variables, and check whether the application runs as expected.

Tip

For your executable to be compatible with gdb4hpc, it will need to be an MPI program. You will also need to compile your code with the debugging flag -g (e.g. cc -g my_program.c -o my_exe).

"},{"location":"user-guide/debug/#launching-through-gdb4hpc","title":"Launching through gdb4hpc","text":"

Launch gdb4hpc:

module load gdb4hpc\ngdb4hpc\n

You will get some information about this version of the program and, eventually, you will get a command prompt:

gdb4hpc 4.5 - Cray Line Mode Parallel Debugger\nWith Cray Comparative Debugging Technology.\nCopyright 2007-2019 Cray Inc. All Rights Reserved.\nCopyright 1996-2016 University of Queensland. All Rights Reserved.\nType \"help\" for a list of commands.\nType \"help <cmd>\" for detailed help about a command.\ndbg all>\n

We will use launch to begin a multi-process application within gdb4hpc. Suppose we want to test an application called my_exe, launched as 256 processes across two nodes. We would launch this in gdb4hpc by running:

dbg all> launch --launcher-args=\"--account=[budget code] --partition=standard --qos=standard --nodes=2 --ntasks-per-node=128 --cpus-per-task=1 --exclusive --export=ALL\" $my_prog{256} ./my_exe\n

Make sure to replace [budget code] in the --account option with your budget code (e.g. if you are using budget t01, that part should look like --account=t01).

The default launcher is srun and the --launcher-args=\"...\" allows you to set launcher flags for srun. The variable $my_prog is a dummy name for the program being launched and you could use whatever name you want for it -- this will be the name of the srun job that will be run. The number in the brackets {256} is the number of processes over which the program will be executed, it's 256 here, but you could use any number. You should try to run this on as few processors as possible -- the more you use, the longer it will take for gdb4hpc to load the program.

Once the program is launched, gdb4hpc will load the program and begin to run it. You will see output similar to:

Starting application, please wait...\nCreating MRNet communication network...\nWaiting for debug servers to attach to MRNet communications network...\nTimeout in 400 seconds. Please wait for the attach to complete.\nNumber of dbgsrvs connected: [0];  Timeout Counter: [1]\nNumber of dbgsrvs connected: [0];  Timeout Counter: [2]\nNumber of dbgsrvs connected: [0];  Timeout Counter: [3]\nNumber of dbgsrvs connected: [1];  Timeout Counter: [0]\nNumber of dbgsrvs connected: [1];  Timeout Counter: [1]\nNumber of dbgsrvs connected: [2];  Timeout Counter: [0]\nFinalizing setup...\nLaunch complete.\nmy_prog{0..255}: Initial breakpoint, main at /PATH/TO/my_program.c:34\n

The line number at which the initial breakpoint is made (in the above example, line 34) corresponds to the line number at which MPI is initialised. You will not be able to see any parts of the code outside of the MPI region of a code with gdb4hpc.

Once the code is loaded, you can use various commands to move through your code. The following lists and describes some of the most useful ones:

Remember to exit the interactive session once you are done debugging.

"},{"location":"user-guide/debug/#attaching-with-gdb4hpc","title":"Attaching with gdb4hpc","text":"

Attaching to a hanging job using gdb4hpc is a great way of seeing which state each processor is in. However, its output is not the easiest to read. For a more readable view, please take a look at the STAT tool.

In your interactive session, launch your executable as a background task (by adding an & at the end of the command). For example, if you are running an executable called my_exe using 256 processes, you would run:

srun -n 256 --nodes=2 --ntasks-per-node=128 --cpus-per-task=1 --time=01:00:00 --export=ALL \\\n            --account=[budget code] --partition=standard --qos=standard ./my_exe &\n

Make sure to replace [budget code] in the --account option with your budget code (e.g. if you are using budget t01, that part should look like --account=t01).

You will need to get the full job ID of the job you have just launched. To do this, run:

squeue -u $USER\n

and find the job ID associated with this interactive session -- this will be the one with the jobname bash. In this example:

JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)\n1050     workq my_mpi_j   jsindt  R       0:16      1 nid000001\n1051     workq     bash   jsindt  R       0:12      1 nid000002\n

the appropriate job id is 1051. Next, you will need to run sstat on this job id:

sstat 1051\n

This will output a large amount of information about this specific job. We are looking for the first number in this output, which should look like JOB_ID.##. The number after the job ID is the job step ID: Slurm numbers the job steps (for example, each srun) launched within the job, starting from 0. In our example, srun is the first job step launched in this interactive session, so the full ID is 1051.0.

Launch gdb4hpc:

module load gdb4hpc\ngdb4hpc\n

You will get some information about this version of the program and, eventually, you will get a command prompt:

gdb4hpc 4.5 - Cray Line Mode Parallel Debugger\nWith Cray Comparative Debugging Technology.\nCopyright 2007-2019 Cray Inc. All Rights Reserved.\nCopyright 1996-2016 University of Queensland. All Rights Reserved.\nType \"help\" for a list of commands.\nType \"help <cmd>\" for detailed help about a command.\ndbg all>\n

We will be using the attach command to attach to our program that hangs. This is done by writing:

dbg all> attach $my_prog JOB_ID.##\n

where JOB_ID.## is the full job ID found using sstat (in our example, this would be 1051.0). The name $my_prog is a dummy-name -- it could be whatever name you like.

As it is attaching, gdb4hpc will output text to screen that looks like:

Attaching to application, please wait...\nCreating MRNet communication network...\nWaiting for debug servers to attach to MRNet communications network...\nTimeout in 400 seconds. Please wait for the attach to complete.\nNumber of dbgsrvs connected: [0];  Timeout Counter: [1]\n\n...\n\nFinalizing setup...\nAttach complete.\nCurrent rank location:\n

After this, you will get an output that, among other things, tells you which line of your code each process is on, and what each process is doing. This can be helpful to see where the hang-up is.

If you accidentally attached to the wrong job, you can detach by running:

dbg all> release $my_prog\n

and re-attach with the correct job ID. You will need to change your dummy name from $my_prog to something else.

When you are finished using gdb4hpc, simply run:

dbg all> quit\n

Do not forget to exit your interactive session.

"},{"location":"user-guide/debug/#valgrind4hpc","title":"valgrind4hpc","text":"

valgrind4hpc is a Valgrind-based debugging tool to aid in the detection of memory leaks and errors in parallel applications. Valgrind4hpc aggregates any duplicate messages across ranks to help provide an understandable picture of program behavior. Valgrind4hpc manages starting and redirecting output from many copies of Valgrind, as well as recombining and filtering Valgrind messages. If your program can be debugged with Valgrind, it can be debugged with valgrind4hpc.

The valgrind4hpc module enables the use of standard valgrind as well as the valgrind4hpc version more suitable to parallel programs.

"},{"location":"user-guide/debug/#using-valgrind-with-serial-programs","title":"Using Valgrind with serial programs","text":"

Load the valgrind4hpc module:

module load valgrind4hpc\n

Next, run your executable through valgrind:

valgrind --tool=memcheck --leak-check=yes my_executable\n

The log is written to the screen. The ERROR SUMMARY will tell you whether there are any memory errors in your program, and how many. If you compile your code using the -g debugging flag (e.g. gcc -g my_program.c -o my_executable), the log will also point out the code lines where each error occurs.

Valgrind also includes a tool called Massif that can be used to give insight into the memory usage of your program. It takes regular snapshots and outputs this data into a single file, which can be visualised to show the total amount of memory used as a function of time. This shows when peaks and bottlenecks occur and allows you to identify which data structures in your code are responsible for the largest memory usage of your program.

Documentation explaining how to use Massif is available at the official Massif manual. In short, you should run your executable as follows:

valgrind --tool=massif my_executable\n

The memory profiling data will be output into a file called massif.out.pid, where pid is the runtime process ID of your program. A custom filename can be chosen using the --massif-out-file option, as follows:

valgrind --tool=massif --massif-out-file=optional_filename.out my_executable\n

The output file contains raw profiling statistics. To view a summary including a graphical plot of memory usage over time, use the ms_print command as follows:

ms_print massif.out.12345\n

or, to save to a file:

ms_print massif.out.12345 > massif.analysis.12345\n

This will show total memory usage over time as well as a breakdown of the top data structures contributing to memory usage at each snapshot where there has been a significant allocation or deallocation of memory.

"},{"location":"user-guide/debug/#using-valgrind4hpc-with-parallel-programs","title":"Using Valgrind4hpc with parallel programs","text":"

First, load valgrind4hpc:

module load valgrind4hpc\n

To run valgrind4hpc, first reserve the resources you will use with salloc. The following reservation request is for 2 nodes (256 physical cores) for 20 minutes on the short queue:

auser@uan01:> salloc --nodes=2 --ntasks-per-node=128 --cpus-per-task=1 \\\n              --time=00:20:00 --partition=standard --qos=short \\\n              --hint=nomultithread \\\n              --distribution=block:block --account=[budget code]\n

Once your allocation is ready, use valgrind4hpc to run and profile your executable. To test an executable called my_executable that requires two arguments arg1 and arg2 on 2 nodes and 256 processes, run:

valgrind4hpc --tool=memcheck --num-ranks=256 my_executable -- arg1 arg2\n

In particular, note the -- separating the executable from the arguments (this is not necessary if your executable takes no arguments).

Valgrind4hpc only supports certain tools found in valgrind: memcheck, helgrind, exp-sgcheck and drd. The --valgrind-args=\"arguments\" option allows you to pass valgrind options that valgrind4hpc does not support itself (e.g. --leak-check). Note, however, that some of these options might interfere with valgrind4hpc.

More information on valgrind4hpc can be found in the manual (man valgrind4hpc).

"},{"location":"user-guide/debug/#stat","title":"STAT","text":"

The Stack Trace Analysis Tool (STAT) is a cross-platform debugging tool from the University of Wisconsin-Madison. ATP is based on the same technology as STAT; both are designed to gather and merge stack traces from a running application's parallel processes. STAT is useful when an application seems to be deadlocked or stuck, i.e. it does not crash but does not progress as expected. It has been designed to scale to a very large number of processes. Full information on STAT, including use cases, is available at the STAT website.

STAT attaches to a running program and queries it to find out where all of its processes currently are. It then processes that data and produces a graph of the unique process locations. To keep the graph easy to understand, it groups together all processes that are at the same location, so that only unique program locations are displayed.

"},{"location":"user-guide/debug/#using-stat-on-archer2","title":"Using STAT on ARCHER2","text":"

On the login node, load the cray-stat module:

module load cray-stat\n

Then, launch your job using srun as a background task (by adding an & at the end of the command). For example, if you are running an executable called my_exe using 256 processes, you would run:

srun -n 256 --nodes=2 --ntasks-per-node=128 --cpus-per-task=1 --time=01:00:00 --export=ALL \\\n            --account=[budget code] --partition=standard --qos=standard ./my_exe &\n

Note

This example sets the job time limit to 1 hour. If you need longer, change the --time option.

You will need the process ID (PID) of the srun command you have just launched. The PID is printed to the screen at launch, or you can find it by running:

ps -u $USER\n

This produces output similar to:

PID TTY          TIME CMD\n154296 ?     00:00:00 systemd\n154297 ?     00:00:00 (sd-pam)\n154302 ?     00:00:00 sshd\n154303 pts/8 00:00:00 bash\n157150 pts/8 00:00:00 salloc\n157152 pts/8 00:00:00 bash\n157183 pts/8 00:00:00 srun\n157185 pts/8 00:00:00 srun\n157191 pts/8 00:00:00 ps\n

Once your application has reached the point where it hangs, issue the following command, replacing PID with the PID of the first srun process (157183 in the example above):

stat-cl -i PID\n
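Instead of reading the PID from the ps listing by eye, you can pick out the first srun process with a short filter. This sketch assumes the default PID TTY TIME CMD layout of ps -u shown above. first_srun_pid is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: read ps-style output (PID TTY TIME CMD) on stdin and
# print the PID of the first process whose command name is exactly srun.
first_srun_pid() {
  # $NF is the last column (CMD); stop after the first match
  awk -v cmd=srun '$NF == cmd { print $1; exit }'
}

# On ARCHER2 you would pipe the live process list into it, e.g.
#   stat-cl -i $(ps -u $USER | first_srun_pid)
```

Given the listing above, it prints 157183. On Linux, pgrep -o -x -u $USER srun should give the same answer, since -o selects the oldest matching process.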

You will get an output that looks like this:

STAT started at 2020-07-22-13:31:35\nAttaching to job launcher (null):157565 and launching tool daemons...\nTool daemons launched and connected!\nAttaching to application...\nAttached!\nApplication already paused... ignoring request to pause\nSampling traces...\nTraces sampled!\nResuming the application...\nResumed!\nPausing the application...\nPaused!\n\n...\n\nDetaching from application...\nDetached!\n\nResults written to $PATH_TO_RUN_DIRECTORY/stat_results/my_exe.0000\n

Once STAT is finished, you can kill the srun job using scancel (replacing JID with the job ID of the job you just launched):

scancel JID\n

You can view the results that STAT has produced using the following command (note that \"my_exe\" will need to be replaced with the name of the executable you ran):

stat-view stat_results/my_exe.0000/00_my_exe.0000.3D.dot\n

This produces a graph displaying all the different places within the program that the parallel processes were when you queried them.

Note

To see the graph, you will need to have exported your X display when logging in.
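If STAT has been run more than once from the same directory, there may be several result directories under stat_results. The sketch below picks the newest merged-trace file. It assumes results follow the stat_results/<name>.NNNN/*.3D.dot layout shown above. latest_stat_dot is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: print the most recently modified STAT merged-trace
# file (*.3D.dot) found one level below a results directory
# (default: stat_results). Assumes paths contain no spaces.
latest_stat_dot() {
  local dir=${1:-stat_results}
  # ls -t lists the newest file first; head keeps just that one.
  # If nothing matches, ls fails quietly and nothing is printed.
  ls -t ${dir}/*/*.3D.dot 2>/dev/null | head -n 1
}

# Example usage from the run directory:
#   stat-view $(latest_stat_dot)
```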

Larger jobs may spend significant time queueing, in which case it is more practical to submit them as batch jobs. The batch invocation is slightly different, for example:

#!/bin/bash --login\n\n#SBATCH --nodes=1\n#SBATCH --ntasks-per-node=128\n#SBATCH --cpus-per-task=1\n#SBATCH --time=02:00:00\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Load additional modules\nmodule load cray-stat\n\nexport OMP_NUM_THREADS=1\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\n# This environment variable is required\nexport CTI_SLURM_OVERRIDE_MC=1\n\n# Request that stat sleeps for 3600 seconds before attaching\n# to our executable which we launch with command introduced\n# with -C:\n\nstat-cl -s 3600 -C srun --unbuffered ./my_exe\n

If the application hangs, the job will continue to run until it reaches its requested time limit. Use the stat-view utility to inspect the results, as discussed above.
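The stat-cl -s delay needs to be shorter than the job's wall-clock limit. Otherwise the job is killed before STAT attaches. In the script above, 3600 seconds leaves one hour of the 02:00:00 limit. The sketch below performs that check and assumes the time limit is written as HH:MM:SS. Both function names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: convert a Slurm HH:MM:SS time limit to seconds.
# Other Slurm time formats (e.g. days-hours:minutes:seconds) are not handled.
hms_to_seconds() {
  local t=${1}
  if [[ ! ${t} =~ ^[0-9]+:[0-9]+:[0-9]+$ ]]; then
    echo 'hms_to_seconds: expected HH:MM:SS' >&2
    return 1
  fi
  local h=${t%%:*} rest=${t#*:}
  local m=${rest%%:*} s=${rest#*:}
  # 10# forces base 10, so fields such as 08 are not read as octal
  echo $(( 10#${h} * 3600 + 10#${m} * 60 + 10#${s} ))
}

# Hypothetical check: the stat-cl -s sleep (in seconds) must be shorter
# than the job time limit, otherwise the job ends before STAT attaches.
check_stat_sleep() {
  local limit
  limit=$(hms_to_seconds ${1}) || return 1
  if (( ${2} >= limit )); then
    echo 'check_stat_sleep: sleep is not shorter than the time limit' >&2
    return 1
  fi
}
```

For example, check_stat_sleep 02:00:00 3600 succeeds, while check_stat_sleep 01:00:00 3600 reports an error.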

"},{"location":"user-guide/debug/#atp","title":"ATP","text":"

To enable ATP you should load the atp module and set the ATP_ENABLED environment variable to 1 on the login node:

module load atp\nexport ATP_ENABLED=1\n# Fix for a known issue:\nexport HOME=${HOME/home/work}\n
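The HOME workaround uses bash pattern substitution. ${var/pattern/replacement} replaces only the first match of pattern, so a home directory such as /home/t01/t01/auser (a hypothetical example) becomes /work/t01/t01/auser. The sketch below demonstrates the same expansion on a copy of the path rather than on HOME itself. home_to_work is a hypothetical name.

```shell
#!/bin/bash
# Demonstrate the ${var/pattern/replacement} expansion used in the ATP
# workaround, on a hypothetical path rather than on HOME itself.
home_to_work() {
  local path=${1}
  # Only the first occurrence of home is replaced with work
  echo ${path/home/work}
}

home_to_work /home/t01/t01/auser
```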

Then, launch your job using srun as a background task (by adding an & at the end of the command). For example, if you are running an executable called my_exe using 256 processes, you would run:

srun -n 256 --nodes=2 --ntasks-per-node=128 --cpus-per-task=1 --time=01:00:00 --export=ALL \\\n            --account=[budget code] --partition=standard --qos=standard ./my_exe &\n

Note

This example sets the job time limit to 1 hour. If you need longer, change the --time option.

Once the job has finished running, load the cray-stat module to view the results:

module load cray-stat\n

and view the merged stack trace using:

stat-view atpMergedBT.dot\n

Note

To see the graph, you will need to have exported your X display when logging in.

"},{"location":"user-guide/dev-environment-4cab/","title":"Application development environment: 4-cabinet system","text":"

Important

This section covers the application development environment on the initial, 4-cabinet ARCHER2 system. For documentation on the application development environment on the full ARCHER2 system, please see Application development environment: full system.

"},{"location":"user-guide/dev-environment-4cab/#whats-available","title":"What's available","text":"

ARCHER2 runs on the Cray Linux Environment (a version of SUSE Linux), and provides a development environment which includes:

Access to particular software, and particular versions, is managed by a standard TCL module framework. Most software is available via standard software modules and the different programming environments are available via module collections.

You can see what programming environments are available with:

auser@uan01:~> module savelist\nNamed collection list:\n 1) PrgEnv-aocc   2) PrgEnv-cray   3) PrgEnv-gnu\n

Other software modules can be listed with

auser@uan01:~> module avail\n------------------------------- /opt/cray/pe/perftools/20.09.0/modulefiles --------------------------------\nperftools       perftools-lite-events  perftools-lite-hbm    perftools-nwpc     \nperftools-lite  perftools-lite-gpu     perftools-lite-loops  perftools-preload  \n\n---------------------------------- /opt/cray/pe/craype/2.7.0/modulefiles ----------------------------------\ncraype-hugepages1G  craype-hugepages8M   craype-hugepages128M  craype-network-ofi          \ncraype-hugepages2G  craype-hugepages16M  craype-hugepages256M  craype-network-slingshot10  \ncraype-hugepages2M  craype-hugepages32M  craype-hugepages512M  craype-x86-rome             \ncraype-hugepages4M  craype-hugepages64M  craype-network-none   \n\n------------------------------------- /usr/local/Modules/modulefiles --------------------------------------\ndot  module-git  module-info  modules  null  use.own  \n\n-------------------------------------- /opt/cray/pe/cpe-prgenv/7.0.0 --------------------------------------\ncpe-aocc  cpe-cray  cpe-gnu  \n\n-------------------------------------------- /opt/modulefiles ---------------------------------------------\naocc/2.1.0.3(default)  cray-R/4.0.2.0(default)  gcc/8.1.0  gcc/9.3.0  gcc/10.1.0(default)  \n\n\n---------------------------------------- /opt/cray/pe/modulefiles -----------------------------------------\natp/3.7.4(default)              cray-mpich-abi/8.0.15             craype-dl-plugin-py3/20.06.1(default)  \ncce/10.0.3(default)             cray-mpich-ucx/8.0.15             craype/2.7.0(default)                  \ncray-ccdb/4.7.1(default)        cray-mpich/8.0.15(default)        craypkg-gen/1.3.10(default)            \ncray-cti/2.7.3(default)         cray-netcdf-hdf5parallel/4.7.4.0  gdb4hpc/4.7.3(default)                 \ncray-dsmml/0.1.2(default)       cray-netcdf/4.7.4.0               iobuf/2.0.10(default)                  \ncray-fftw/3.3.8.7(default)      cray-openshmemx/11.1.1(default)   papi/6.0.0.2(default)                  \ncray-ga/5.7.0.3                 cray-parallel-netcdf/1.12.1.0     perftools-base/20.09.0(default)        \ncray-hdf5-parallel/1.12.0.0     cray-pmi-lib/6.0.6(default)       valgrind4hpc/2.7.2(default)            \ncray-hdf5/1.12.0.0              cray-pmi/6.0.6(default)           \ncray-libsci/20.08.1.2(default)  cray-python/3.8.5.0(default)      \n

A full discussion of the module system is available in the Software environment section.

A consistent set of modules is loaded on login to the machine (currently PrgEnv-cray, see below). Developing applications then means selecting and loading the appropriate set of modules before starting work.

This section is aimed at code developers and will concentrate on the compilation environment and building libraries and executables, and specifically parallel executables. Other topics such as Python and Containers are covered in more detail in separate sections of the documentation.

"},{"location":"user-guide/dev-environment-4cab/#managing-development","title":"Managing development","text":"

ARCHER2 supports common revision control software such as git.

Standard GNU autoconf tools are available, along with make (which is GNU Make). Versions of cmake are available.

Note

Some of these tools are part of the system software, and typically reside in /usr/bin, while others are provided as part of the module system. Some tools may be available in different versions via both /usr/bin and via the module system.

"},{"location":"user-guide/dev-environment-4cab/#compilation-environment","title":"Compilation environment","text":"

There are three different compiler environments available on ARCHER2: AMD (AOCC), Cray (CCE), and GNU (GCC). The current compiler suite is selected via the programming environment, while the specific compiler versions are determined by the relevant compiler module. A summary is:

Suite name   Module   Programming environment collection
CCE          cce      PrgEnv-cray
GCC          gcc      PrgEnv-gnu
AOCC         aocc     PrgEnv-aocc

For example, at login, the default set of modules are:

Currently Loaded Modulefiles:\n1) cpe-cray                          7) cray-dsmml/0.1.2(default)                           \n2) cce/10.0.3(default)               8) perftools-base/20.09.0(default)                     \n3) craype/2.7.0(default)             9) xpmem/2.2.35-7.0.1.0_1.3__gd50fabf.shasta(default)  \n4) craype-x86-rome                  10) cray-mpich/8.0.15(default)                          \n5) libfabric/1.11.0.0.233(default)  11) cray-libsci/20.08.1.2(default)                      \n6) craype-network-ofi  \n

From this we see that the default programming environment is Cray (indicated by cpe-cray, item 1 in the list above) and the default compiler module is cce/10.0.3 (item 2). The programming environment gives access to a consistent set of compiler, MPI library (cray-mpich, item 10) and other libraries, e.g. cray-libsci (item 11).

Within a given programming environment, it is possible to swap to a different compiler version by swapping the relevant compiler module.

To ensure consistent behaviour, compilation of C, C++, and Fortran source code should then take place using the appropriate compiler wrapper: cc, CC, and ftn, respectively. The wrapper will automatically call the relevant underlying compiler and add the appropriate include directories and library locations to the invocation. This typically eliminates the need to specify this additional information explicitly in the configuration stage. To see the details of the exact compiler invocation use the -craype-verbose flag to the compiler wrapper.
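As a minimal illustration of the language-to-wrapper mapping, the hypothetical helper below picks cc, CC or ftn from a source file's extension. The extension lists are assumptions and are not exhaustive.

```shell
#!/bin/bash
# Hypothetical helper: choose the HPE Cray compiler wrapper for a source
# file from its extension (C -> cc, C++ -> CC, Fortran -> ftn).
# The extension lists are illustrative, not exhaustive.
wrapper_for() {
  case ${1} in
    *.c)                        echo cc ;;
    *.cpp|*.cxx|*.cc|*.C)       echo CC ;;
    *.f|*.F|*.f90|*.F90|*.f08)  echo ftn ;;
    *) echo 'wrapper_for: unknown source type' >&2; return 1 ;;
  esac
}
```

For example, $(wrapper_for hello.c) -craype-verbose -o hello hello.c would call cc and print the underlying compiler invocation.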

The default link time behaviour is also related to the current programming environment. See the section below on Linking and libraries.

Users should not, in general, invoke specific compilers at compile/link stages. In particular, gcc, which may default to /usr/bin/gcc, should not be used. The compiler wrappers cc, CC, and ftn should be used via the appropriate module. Other common MPI compiler wrappers, e.g. mpicc, should also be replaced by the relevant wrapper, e.g. cc (mpicc and similar wrappers are not available).

Important

Always use the compiler wrappers cc, CC, and/or ftn and not a specific compiler invocation. This will ensure consistent compile/link time behaviour.

"},{"location":"user-guide/dev-environment-4cab/#compiler-man-pages-and-help","title":"Compiler man pages and help","text":"

Further information on both the compiler wrappers and the individual compilers themselves is available via the command line and via standard man pages. The man page for the compiler wrappers is common to all programming environments, while the man page for individual compilers depends on the currently loaded programming environment. The following table summarises options for obtaining information on the compiler and compile options:

Compiler suite   C            C++          Fortran
Cray             man craycc   man crayCC   man crayftn
GNU              man gcc      man g++      man gfortran
Wrappers         man cc       man CC       man ftn

Tip

You can also pass the --help option to any of the compilers or wrappers to get a summary of how to use them. The Cray Fortran compiler uses ftn --craype-help to access the help options.

Tip

There are no man pages for the AOCC compilers at the moment.

Tip

Cray C/C++ is based on Clang and therefore supports similar options to clang/gcc (man clang is in fact equivalent to man craycc). clang --help will produce a full summary of options with Cray-specific options marked \"Cray\". The craycc man page concentrates on these Cray extensions to the clang front end and does not provide an exhaustive description of all clang options. Cray Fortran is not based on Flang and so takes different options from flang/gfortran.

"},{"location":"user-guide/dev-environment-4cab/#dynamic-linking","title":"Dynamic Linking","text":"

Executables on ARCHER2 link dynamically, and the Cray Programming Environment does not currently support static linking. This is in contrast to ARCHER where the default was to build statically.

If you attempt to link statically, you will see errors similar to:

/usr/bin/ld: cannot find -lpmi\n/usr/bin/ld: cannot find -lpmi2\ncollect2: error: ld returned 1 exit status\n

The compiler wrapper scripts on ARCHER2 link runtime libraries using RUNPATH by default. This means that the paths to the runtime libraries are encoded into the executable, so you do not need to load the compiler environment in your job submission scripts.

"},{"location":"user-guide/dev-environment-4cab/#which-compiler-environment","title":"Which compiler environment?","text":"

If you are unsure which compiler you should choose, we suggest the starting point should be the GNU compiler collection (GCC, PrgEnv-gnu); this is perhaps the most commonly used by code developers, particularly in the open source software domain. A portable, standard-conforming code should (in principle) compile in any of the three programming environments.

For users requiring specific compiler features, such as co-array Fortran, the recommended starting point would be Cray. The following sections provide further details of the different programming environments.

Warning

Intel compilers are not available on ARCHER2.

"},{"location":"user-guide/dev-environment-4cab/#amd-optimizing-cc-compiler-aocc","title":"AMD Optimizing C/C++ Compiler (AOCC)","text":"

The AMD Optimizing C/C++ Compiler (AOCC) is a clang-based optimising compiler. Despite its name, AOCC also includes a flang-based Fortran compiler.

Switch to the AOCC programming environment via:

$ module restore PrgEnv-aocc\n

Note

Further details on AOCC will appear here as they become available.

"},{"location":"user-guide/dev-environment-4cab/#aocc-reference-material","title":"AOCC reference material","text":""},{"location":"user-guide/dev-environment-4cab/#cray-compiler-environment-cce","title":"Cray compiler environment (CCE)","text":"

The Cray compiler environment (CCE) is the default compiler at the point of login. CCE supports C/C++ (along with Unified Parallel C, UPC) and Fortran (including co-array Fortran). Support for OpenMP parallelism is available for both C/C++ and Fortran (currently OpenMP 4.5, with a number of exceptions).

The Cray C/C++ compiler is based on a clang front end, and so compiler options are similar to those for gcc/clang. However, the Fortran compiler remains based around Cray-specific options. Be sure to separate C/C++ compiler options and Fortran compiler options (typically CFLAGS and FFLAGS) if compiling mixed C/Fortran applications.

Switch to the Cray programming environment via:

$ module restore PrgEnv-cray\n
"},{"location":"user-guide/dev-environment-4cab/#useful-cce-cc-options","title":"Useful CCE C/C++ options","text":"

When using the compiler wrappers cc or CC, some of the following options may be useful:

Language, warning and debugging options:

Option            Comment
-std=<standard>   Default is -std=gnu11 (gnu++14 for C++) [1]

Performance options:

Option            Comment
-Ofast            Optimisation levels: -O0, -O1, -O2, -O3, -Ofast
-ffp=level        Floating point maths optimisations levels 0-4 [2]
-flto             Link time optimisation

Miscellaneous options:

Option            Comment
-fopenmp          Compile OpenMP (default is off)
-v                Display verbose output from compiler stages

Notes

  1. Option -std=gnu11 gives c11 plus GNU extensions (likewise c++14 plus GNU extensions). See https://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/C-Extensions.html
  2. Option -ffp=3 is implied by -Ofast or -ffast-math
"},{"location":"user-guide/dev-environment-4cab/#useful-cce-fortran-options","title":"Useful CCE Fortran options","text":"

Language, warning and debugging options:

Option         Comment
-m <level>     Message level (default -m 3, errors and warnings)

Performance options:

Option         Comment
-O <level>     Optimisation levels: -O0 to -O3 (default -O2)
-h fp<level>   Floating point maths optimisations levels 0-3
-h ipa         Inter-procedural analysis

Miscellaneous options:

Option         Comment
-h omp         Compile OpenMP (default is -hnoomp)
-v             Display verbose output from compiler stages"},{"location":"user-guide/dev-environment-4cab/#gnu-compiler-collection-gcc","title":"GNU compiler collection (GCC)","text":"

The commonly used open source GNU compiler collection is available and provides C/C++ and Fortran compilers.

The GNU compiler collection is loaded by switching to the GNU programming environment:

$ module restore PrgEnv-gnu\n

Bug

The gcc/8.1.0 module is available on ARCHER2 but cannot be used as the supporting scientific and system libraries are not available. You should not use this version of GCC.

Warning

If you want to use GCC version 10 or greater to compile Fortran code with the old MPI interfaces (i.e. use mpi or INCLUDE 'mpif.h'), you must add the -fallow-argument-mismatch option (or equivalent) when compiling. Otherwise you will see compile errors associated with MPI functions. This is because past versions of gfortran allowed mismatched arguments to external procedures (e.g. where an explicit interface is not available). This is often the case for MPI routines using the old MPI interfaces, where arrays of different types are passed to, for example, MPI_Send(). Such calls now generate an error, as they are not standard-conforming. The -fallow-argument-mismatch option reduces the error to a warning. The same effect may be achieved via -std=legacy.

If you use the Fortran 2008 MPI interface (i.e. use mpi_f08) then you should not need to add this option.

Fortran language MPI bindings are described in more detail in the MPI Standard documentation.
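Build scripts that must work with more than one GCC version can add the flag only when it is needed. This sketch assumes the version string comes from gfortran -dumpversion, which may print only the major version, e.g. 10. mpi_legacy_fflags is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper: given a GCC version string (e.g. from
# gfortran -dumpversion), print the flag needed to compile code that uses
# the old MPI Fortran interfaces with GCC 10 or later; print nothing for
# older versions.
mpi_legacy_fflags() {
  # Keep only the major version number (text before the first dot)
  local major=${1%%.*}
  if (( major >= 10 )); then
    echo -fallow-argument-mismatch
  fi
}

# Example usage with the GNU programming environment loaded:
#   FFLAGS=$(mpi_legacy_fflags $(gfortran -dumpversion))
#   ftn ${FFLAGS} -c my_mpi_code.f90
```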

"},{"location":"user-guide/dev-environment-4cab/#useful-gnu-fortran-options","title":"Useful Gnu Fortran options","text":"

Option                      Comment
-std=<standard>             Default is gnu
-fallow-argument-mismatch   Allow mismatched procedure arguments. This argument is required for compiling MPI Fortran code with GCC version 10 or greater if you are using the older MPI interfaces (see warning above)
-fbounds-check              Use runtime checking of array indices
-fopenmp                    Compile OpenMP (default is no OpenMP)
-v                          Display verbose output from compiler stages

Tip

The standard in -std may be one of f95, f2003, f2008 or f2018. The default option, -std=gnu, is the latest Fortran standard plus GNU extensions.

Warning

Past versions of gfortran allowed mismatched arguments to external procedures (e.g. where an explicit interface is not available). This is often the case for MPI routines where arrays of different types are passed to MPI_Send() and so on. Such calls now generate an error, as they are not standard-conforming. Use -fallow-argument-mismatch to reduce the error to a warning. The same effect may be achieved via -std=legacy.

"},{"location":"user-guide/dev-environment-4cab/#reference-material","title":"Reference material","text":""},{"location":"user-guide/dev-environment-4cab/#message-passing-interface-mpi","title":"Message passing interface (MPI)","text":""},{"location":"user-guide/dev-environment-4cab/#hpe-cray-mpich","title":"HPE Cray MPICH","text":"

HPE Cray provide, as standard, an MPICH implementation of the message passing interface which is specifically optimised for the ARCHER2 network. The current implementation supports MPI standard version 3.1.

The HPE Cray MPICH implementation is linked into software by default when compiling using the standard wrapper scripts: cc, CC and ftn.

"},{"location":"user-guide/dev-environment-4cab/#mpi-reference-material","title":"MPI reference material","text":"

MPI standard documents: https://www.mpi-forum.org/docs/

"},{"location":"user-guide/dev-environment-4cab/#linking-and-libraries","title":"Linking and libraries","text":"

Linking to libraries is performed dynamically on ARCHER2. One can use the -craype-verbose flag to the compiler wrapper to check exactly what linker arguments are invoked. The compiler wrapper scripts encode the paths to the programming environment system libraries using RUNPATH. This ensures that the executable can find the correct runtime libraries without the matching software modules loaded.

The library RUNPATH associated with an executable can be inspected via, e.g.,

$ readelf -d ./a.out\n

(swap a.out for the name of the executable you are querying).
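A small filter lists the RUNPATH directories one per line, so you do not have to scan the full readelf output. This sketch assumes readelf prints the entry as (RUNPATH) ... Library runpath: [dir1:dir2]. runpath_dirs is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: read readelf -d output on stdin and print each
# directory listed in the RUNPATH entry, one per line.
# Assumes directory names contain no spaces or glob characters.
runpath_dirs() {
  local line dirs
  while IFS= read -r line; do
    [[ ${line} == *'(RUNPATH)'* ]] || continue
    # Keep only the text between the square brackets
    dirs=${line#*'['}
    dirs=${dirs%']'*}
    # Print each colon-separated entry in turn
    while [[ -n ${dirs} ]]; do
      echo ${dirs%%:*}
      if [[ ${dirs} == *:* ]]; then dirs=${dirs#*:}; else dirs=; fi
    done
  done
}

# Example usage:
#   readelf -d ./a.out | runpath_dirs
```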

"},{"location":"user-guide/dev-environment-4cab/#commonly-used-libraries","title":"Commonly used libraries","text":"

Modules with names prefixed by cray- are provided by HPE Cray, and are supported to be consistent with any of the programming environments and associated compilers. These modules should be the first choice for access to software libraries if available.

Tip

More information on the different software libraries on ARCHER2 can be found in the Software libraries section of the user guide.

"},{"location":"user-guide/dev-environment-4cab/#switching-to-a-different-hpe-cray-programming-environment-release","title":"Switching to a different HPE Cray Programming Environment release","text":"

Important

See the section below on using non-default versions of HPE Cray libraries, as this process will generally need to be followed when using software from non-default PE installs.

Access to non-default PE environments is controlled by the use of the cpe modules. These modules are typically loaded after you have restored a PrgEnv and loaded all the other modules you need and will set your compile environment to match that in the other PE release. This means:

For example, if you have a code that uses the GNU programming environment and the FFTW and NetCDF parallel libraries, and you want to compile it in the (non-default) 21.03 programming environment, you would do the following:

First, restore the GNU programming environment and load the required library modules (FFTW and NetCDF HDF5 parallel). The loaded module list shows that these are the versions from the default (20.10) programming environment:

auser@uan02:/work/t01/t01/auser> module restore -s PrgEnv-gnu\nauser@uan02:/work/t01/t01/auser> module load cray-fftw\nauser@uan02:/work/t01/t01/auser> module load cray-netcdf\nauser@uan02:/work/t01/t01/auser> module load cray-netcdf-hdf5parallel\nauser@uan02:/work/t01/t01/auser> module list\nCurrently Loaded Modulefiles:\n 1) cpe-gnu                           9) xpmem/2.2.35-7.0.1.0_1.9__gd50fabf.shasta(default)               \n 2) gcc/10.1.0(default)              10) cray-mpich/8.0.16(default)                                       \n 3) craype/2.7.2(default)            11) cray-libsci/20.10.1.2(default)                                   \n 4) craype-x86-rome                  12) bolt/0.7                                                         \n 5) libfabric/1.11.0.0.233(default)  13) /work/y07/shared/archer2-modules/modulefiles-cse/epcc-setup-env  \n 6) craype-network-ofi               14) /usr/local/share/epcc-module/epcc-module-loader                  \n 7) cray-dsmml/0.1.2(default)        15) cray-fftw/3.3.8.8(default)                                       \n 8) perftools-base/20.10.0(default)  16) cray-netcdf-hdf5parallel/4.7.4.2(default) \n

Now, load the cpe/21.03 programming environment module to switch all the currently loaded HPE Cray modules from the default (20.10) programming environment version to the 21.03 programming environment versions:

auser@uan02:/work/t01/t01/auser> module load cpe/21.03\nSwitching to cray-dsmml/0.1.3.\nSwitching to cray-fftw/3.3.8.9.\nSwitching to cray-libsci/21.03.1.1.\nSwitching to cray-mpich/8.1.3.\nSwitching to cray-netcdf-hdf5parallel/4.7.4.3.\nSwitching to craype/2.7.5.\nSwitching to gcc/9.3.0.\nSwitching to perftools-base/21.02.0.\n\nLoading cpe/21.03\n  Unloading conflict: cray-dsmml/0.1.2 cray-fftw/3.3.8.8 cray-libsci/20.10.1.2 cray-mpich/8.0.16 cray-netcdf-hdf5parallel/4.7.4.2\n    craype/2.7.2 gcc/10.1.0 perftools-base/20.10.0\n  Loading requirement: cray-dsmml/0.1.3 cray-fftw/3.3.8.9 cray-libsci/21.03.1.1 cray-mpich/8.1.3 cray-netcdf-hdf5parallel/4.7.4.3\n    craype/2.7.5 gcc/9.3.0 perftools-base/21.02.0\nauser@uan02:/work/t01/t01/auser> module list\nCurrently Loaded Modulefiles:\n 1) cpe-gnu                                                           9) cray-dsmml/0.1.3                  17) cpe/21.03(default)  \n 2) craype-x86-rome                                                  10) cray-fftw/3.3.8.9                 \n 3) libfabric/1.11.0.0.233(default)                                  11) cray-libsci/21.03.1.1             \n 4) craype-network-ofi                                               12) cray-mpich/8.1.3                  \n 5) xpmem/2.2.35-7.0.1.0_1.9__gd50fabf.shasta(default)               13) cray-netcdf-hdf5parallel/4.7.4.3  \n 6) bolt/0.7                                                         14) craype/2.7.5                      \n 7) /work/y07/shared/archer2-modules/modulefiles-cse/epcc-setup-env  15) gcc/9.3.0                         \n 8) /usr/local/share/epcc-module/epcc-module-loader                  16) perftools-base/21.02.0   \n

Finally (as noted above), you will need to modify the value of LD_LIBRARY_PATH before you compile your software to ensure it picks up the non-default versions of libraries:

auser@uan02:/work/t01/t01/auser> export LD_LIBRARY_PATH=$CRAY_LD_LIBRARY_PATH:$LD_LIBRARY_PATH\n

Now you can go ahead and compile your software with the new programming environment.

Important

The cpe modules only change the versions of software modules provided as part of the HPE Cray programming environments. Any modules provided by the ARCHER2 service will need to be loaded manually after you have completed the process described above.

Note

Unloading the cpe module does not restore the original programming environment release. To restore the default programming environment release you should log out and then log back in to ARCHER2.

Bug

The cpe/21.03 module has a known issue with PrgEnv-gnu where it loads an old version of GCC (9.3.0) rather than the correct, newer version (10.2.0). You can resolve this by using the sequence:

module restore -s PrgEnv-gnu\n...load any other modules you need...\nmodule load cpe/21.03\nmodule unload cpe/21.03\nmodule swap gcc gcc/10.2.0\n

"},{"location":"user-guide/dev-environment-4cab/#available-hpe-cray-programming-environment-releases-on-archer2","title":"Available HPE Cray Programming Environment releases on ARCHER2","text":"

ARCHER2 currently has the following HPE Cray Programming Environment releases available:

Tip

You can see which programming environment release you currently have loaded by using module list and looking at the version number of the cray-libsci module you have loaded. The first two numbers indicate the version of the PE you have loaded. For example, if you have cray-libsci/20.10.1.2 loaded then you are using the 20.10 PE release.
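You can also script this check. The sketch below assumes the cray-libsci version appears in module list output as in the listings above. pe_release is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: read module list output on stdin and print the PE
# release implied by the loaded cray-libsci version (its first two fields),
# e.g. cray-libsci/20.10.1.2 gives 20.10.
pe_release() {
  local ver
  ver=$(grep -o 'cray-libsci/[0-9][0-9.]*' | head -n 1)
  ver=${ver#cray-libsci/}
  if [[ -z ${ver} ]]; then
    echo 'pe_release: cray-libsci is not loaded' >&2
    return 1
  fi
  local major=${ver%%.*} rest=${ver#*.}
  echo ${major}.${rest%%.*}
}

# Example usage (module list typically writes to stderr):
#   module list 2>&1 | pe_release
```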

"},{"location":"user-guide/dev-environment-4cab/#using-non-default-versions-of-hpe-cray-libraries-on-archer2","title":"Using non-default versions of HPE Cray libraries on ARCHER2","text":"

If you wish to make use of non-default versions of libraries provided by HPE Cray (usually because they are part of a non-default PE release: either old or new) then you need to make changes at both compile and runtime. In summary, you need to load the correct module and also make changes to the LD_LIBRARY_PATH environment variable.

At compile time, you need to load the required version of the library module before you compile, and set the LD_LIBRARY_PATH environment variable to include the contents of $CRAY_LD_LIBRARY_PATH as the first entry. For example, to use the non-default 20.08.1.2 version of HPE Cray LibSci in the default programming environment (Cray Compiler Environment, CCE), you would first set up the environment to compile with:

auser@uan01:~/test/libsci> module swap cray-libsci cray-libsci/20.08.1.2 \nauser@uan01:~/test/libsci> export LD_LIBRARY_PATH=$CRAY_LD_LIBRARY_PATH:$LD_LIBRARY_PATH\n

The order is important here: every time you change a module, you will need to reset the value of LD_LIBRARY_PATH for the process to work (it will not be updated automatically).
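The export line above has one caveat. If LD_LIBRARY_PATH is empty, it leaves a trailing colon, and the glibc dynamic loader generally treats an empty entry as the current directory. The sketch below avoids this with the ${var:+...} expansion. prepend_cray_ld_path is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: put CRAY_LD_LIBRARY_PATH in front of LD_LIBRARY_PATH
# without leaving an empty entry when LD_LIBRARY_PATH is unset or empty.
# Call it again after every module command.
prepend_cray_ld_path() {
  local cray=${CRAY_LD_LIBRARY_PATH:-}
  local cur=${LD_LIBRARY_PATH:-}
  # Nothing to prepend if no HPE Cray module has set CRAY_LD_LIBRARY_PATH
  [[ -n ${cray} ]] || return 0
  # ${cur:+:${cur}} expands to a colon plus the old value only when cur is non-empty
  export LD_LIBRARY_PATH=${cray}${cur:+:${cur}}
}
```

Calling it again after a further module command prepends the new CRAY_LD_LIBRARY_PATH. Any duplicated entries this creates are harmless for library lookup.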

Now you can compile your code. You can check that the executable is using the correct version of LibSci with the ldd command: look for the line beginning libsci_cray.so.5, where you should see the version in the path to the library file:

auser@uan01:~/test/libsci> ldd dgemv.x \n    linux-vdso.so.1 (0x00007ffe4a7d2000)\n    libsci_cray.so.5 => /opt/cray/pe/libsci/20.08.1.2/CRAY/9.0/x86_64/lib/libsci_cray.so.5 (0x00007fafd6a43000)\n    libdl.so.2 => /lib64/libdl.so.2 (0x00007fafd683f000)\n    libxpmem.so.0 => /opt/cray/xpmem/default/lib64/libxpmem.so.0 (0x00007fafd663c000)\n    libquadmath.so.0 => /opt/cray/pe/cce/10.0.4/cce/x86_64/lib/libquadmath.so.0 (0x00007fafd63fc000)\n    libmodules.so.1 => /opt/cray/pe/cce/10.0.4/cce/x86_64/lib/libmodules.so.1 (0x00007fafd61e0000)\n    libfi.so.1 => /opt/cray/pe/cce/10.0.4/cce/x86_64/lib/libfi.so.1 (0x00007fafd5abe000)\n    libcraymath.so.1 => /opt/cray/pe/cce/10.0.4/cce/x86_64/lib/libcraymath.so.1 (0x00007fafd57e2000)\n    libf.so.1 => /opt/cray/pe/cce/10.0.4/cce/x86_64/lib/libf.so.1 (0x00007fafd554f000)\n    libu.so.1 => /opt/cray/pe/cce/10.0.4/cce/x86_64/lib/libu.so.1 (0x00007fafd523b000)\n    libcsup.so.1 => /opt/cray/pe/cce/10.0.4/cce/x86_64/lib/libcsup.so.1 (0x00007fafd5035000)\n    libstdc++.so.6 => /opt/cray/pe/gcc-libs/libstdc++.so.6 (0x00007fafd4c62000)\n    libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fafd4a43000)\n    libc.so.6 => /lib64/libc.so.6 (0x00007fafd4688000)\n    libm.so.6 => /lib64/libm.so.6 (0x00007fafd4350000)\n    /lib64/ld-linux-x86-64.so.2 (0x00007fafda988000)\n    librt.so.1 => /lib64/librt.so.1 (0x00007fafd4148000)\n    libgfortran.so.5 => /opt/cray/pe/gcc-libs/libgfortran.so.5 (0x00007fafd3c92000)\n    libgcc_s.so.1 => /opt/cray/pe/gcc-libs/libgcc_s.so.1 (0x00007fafd3a7a000)\n

Tip

If any of the libraries point to versions in the /opt/cray/pe/lib64 directory, then these are using the default versions of the libraries rather than the specific versions. This happens at compile time if you have forgotten to load the right module and set $LD_LIBRARY_PATH afterwards.
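You can automate the check described in this tip. The sketch below scans ldd output for libraries resolved from /opt/cray/pe/lib64. check_nondefault_libs is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: read ldd output on stdin, copy to stderr any library
# resolved from /opt/cray/pe/lib64 and return non-zero if there are any.
check_nondefault_libs() {
  if grep '=> /opt/cray/pe/lib64/' >&2; then
    echo 'warning: the libraries above resolve to the default PE versions' >&2
    return 1
  fi
}

# Example usage:
#   ldd dgemv.x | check_nondefault_libs
```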

At run time (typically in your job script), you need to repeat the environment setup steps. You can also use the ldd command in your job submission script to check that the executable picks up the correct library version. For example, a job submission script to run our dgemv.x executable with the non-default version of LibSci could look like:

#!/bin/bash\n#SBATCH --job-name=dgemv\n#SBATCH --time=0:20:0\n#SBATCH --nodes=1\n#SBATCH --ntasks-per-node=1\n#SBATCH --cpus-per-task=1\n\n# Replace the account code, partition and QoS with those you wish to use\n#SBATCH --account=t01        \n#SBATCH --partition=standard\n#SBATCH --qos=short\n#SBATCH --reservation=shortqos\n\n# Load the standard environment module\nmodule load epcc-job-env\n\n# Set up the environment to use the non-default version of LibSci\n#   We use \"module swap\" as the \"cray-libsci\" is loaded by default.\n#   This must be done after loading the \"epcc-job-env\" module\nmodule swap cray-libsci cray-libsci/20.08.1.2\nexport LD_LIBRARY_PATH=$CRAY_LD_LIBRARY_PATH:$LD_LIBRARY_PATH\n\n# Check which library versions the executable is pointing to\nldd dgemv.x\n\nexport OMP_NUM_THREADS=1\n\nsrun --hint=nomultithread --distribution=block:block dgemv.x\n

Tip

As when compiling, the order of commands matters. Setting the value of LD_LIBRARY_PATH must happen after you have finished all your module commands for it to have the correct effect.
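One way to avoid getting this order wrong is to wrap the export in a small function and call it as the last step of your setup. The bash sketch below uses a hypothetical helper name, prepend_cray_ld_path; it is not an ARCHER2 or Lmod command. It also assumes you would rather not leave an empty entry in LD_LIBRARY_PATH when that variable starts out unset, since the dynamic loader treats an empty entry as the current directory.

```shell
# Sketch: prepend CRAY_LD_LIBRARY_PATH to LD_LIBRARY_PATH.
# prepend_cray_ld_path is a hypothetical helper, not an ARCHER2 command.
# Call it only after your last module command: it copies the value of
# CRAY_LD_LIBRARY_PATH at that moment and is not re-run when modules change.
prepend_cray_ld_path() {
  if [[ -z ${CRAY_LD_LIBRARY_PATH:-} ]]; then
    echo 'prepend_cray_ld_path: CRAY_LD_LIBRARY_PATH is empty, nothing to prepend' >&2
    return 1
  fi
  if [[ -n ${LD_LIBRARY_PATH:-} ]]; then
    export LD_LIBRARY_PATH=$CRAY_LD_LIBRARY_PATH:$LD_LIBRARY_PATH
  else
    # Avoid a trailing colon: an empty entry would mean the current directory
    export LD_LIBRARY_PATH=$CRAY_LD_LIBRARY_PATH
  fi
}

# Usage:
#   module swap cray-libsci cray-libsci/20.08.1.2
#   prepend_cray_ld_path
```

In a job script, the call simply replaces the export line and stays after the final module command.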

Important

You must set up the environment at both compile time and run time, otherwise you will end up using the default version of the library.

"},{"location":"user-guide/dev-environment-4cab/#compiling-in-compute-nodes","title":"Compiling in compute nodes","text":"

Sometimes you may wish to compile in a batch job. For example, the compile process may take a long time, or compilation may be part of the research workflow and can be coupled to the production job. Note that, unlike the login nodes, the compute nodes do not have access to the /home file system.

An example job submission script for a compile job using make (assuming the Makefile is in the same directory as the job submission script) would be:

#!/bin/bash\n\n#SBATCH --job-name=compile\n#SBATCH --time=00:20:00\n#SBATCH --nodes=1\n#SBATCH --ntasks-per-node=1\n#SBATCH --cpus-per-task=1\n\n# Replace the account code, partition and QoS with those you wish to use\n#SBATCH --account=t01        \n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Load the compilation environment (cray, gnu or aocc)\nmodule restore /etc/cray-pe.d/PrgEnv-cray\n\nmake clean\n\nmake\n
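The example requests a single CPU, so make runs serially. If your build supports parallel make, one option is to request more CPUs with --cpus-per-task and size make -j from what Slurm allocated. A minimal bash sketch: Slurm sets SLURM_CPUS_PER_TASK in the job environment when --cpus-per-task is requested, and the fallback to 1 in other cases is an assumption of this sketch; make_jobs is a hypothetical helper name.

```shell
# Sketch: choose the number of parallel make jobs from the Slurm allocation.
# make_jobs is a hypothetical helper, not an ARCHER2 command.
# SLURM_CPUS_PER_TASK is set by Slurm when --cpus-per-task is requested;
# the fallback of 1 (serial make) is an assumption for any other case.
make_jobs() {
  echo ${SLURM_CPUS_PER_TASK:-1}
}

# In the compile job script, after loading the compilation environment:
#   make clean
#   make -j $(make_jobs)
```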

Warning

Do not forget to include the full path when the compilation environment is restored. For instance:

module restore /etc/cray-pe.d/PrgEnv-cray

You can also use a compute node interactively via salloc; see the section Using salloc to reserve resources for further details. Once your interactive session is ready, you can load the compilation environment and compile the code.

"},{"location":"user-guide/dev-environment-4cab/#build-instructions-for-software-on-archer2","title":"Build instructions for software on ARCHER2","text":"

The ARCHER2 CSE team at EPCC and other contributors provide build configurations and instructions for a range of research software, software libraries and tools on a variety of HPC systems (including ARCHER2) in a public GitHub repository. See:

The repository always welcomes contributions from the ARCHER2 user community.

"},{"location":"user-guide/dev-environment-4cab/#support-for-building-software-on-archer2","title":"Support for building software on ARCHER2","text":"

If you run into issues building software on ARCHER2, or the software you require is not available, please contact the ARCHER2 Service Desk with any questions you have.

"},{"location":"user-guide/dev-environment/","title":"Application development environment","text":""},{"location":"user-guide/dev-environment/#whats-available","title":"What's available","text":"

ARCHER2 runs the HPE Cray Linux Environment (a version of SUSE Linux), and provides a development environment which includes:

Access to particular software, and particular versions, is managed by an Lmod module framework. Most software is available by loading modules, including the different compiler environments.

You can see what compiler environments are available with:

auser@uan01:~> module avail PrgEnv\n\n--------------------------------------- /opt/cray/pe/lmod/modulefiles/core ----------------------------------------\n   PrgEnv-aocc/8.3.3    PrgEnv-cray/8.3.3 (L)    PrgEnv-gnu/8.3.3\n\n  Where:\n   L:  Module is loaded\n\nModule defaults are chosen based on Find First Rules due to Name/Version/Version modules found in the module tree.\nSee https://lmod.readthedocs.io/en/latest/060_locating.html for details.\n\nUse \"module spider\" to find all possible modules and extensions.\nUse \"module keyword key1 key2 ...\" to search for all possible modules matching any of the \"keys\".\n

Other software modules can be searched using the module spider command:

auser@uan01:~> module spider\n\n---------------------------------------------------------------------------------------------------------------\nThe following is a list of the modules and extensions currently available:\n---------------------------------------------------------------------------------------------------------------\n  PrgEnv-aocc: PrgEnv-aocc/8.3.3\n\n  PrgEnv-cray: PrgEnv-cray/8.3.3\n\n  PrgEnv-gnu: PrgEnv-gnu/8.3.3\n\n  amd-uprof: amd-uprof/3.6.449\n\n  aocc: aocc/3.2.0\n\n  aocc-mixed: aocc-mixed/3.2.0\n\n  aocl: aocl/3.1, aocl/4.0\n\n  forge: forge/24.0\n\n  atp: atp/3.14.16\n\n  bolt: bolt/0.7, bolt/0.8\n\n  boost: boost/1.72.0, boost/1.81.0\n\n  castep: castep/22.11\n\n  cce: cce/15.0.0\n\n...output trimmed...\n

A full discussion of the module system is available in the Software environment section.

A consistent set of modules is loaded on login to the machine (currently PrgEnv-cray, see below). Developing applications then means selecting and loading the appropriate set of modules before starting work.

This section is aimed at code developers and will concentrate on the compilation environment, building libraries and executables, specifically parallel executables. Other topics such as Python and Containers are covered in more detail in separate sections of the documentation.

Tip

If you want to get back to the login module state without having to log out and back in again, you can just use:

module restore\n
This is also handy for build scripts to ensure you are starting from a known state.

"},{"location":"user-guide/dev-environment/#compiler-environments","title":"Compiler environments","text":"

There are three different compiler environments available on ARCHER2:

The current compiler suite is selected via the PrgEnv module, while the specific compiler versions are determined by the relevant compiler module. A summary is:

Suite name | Compiler Environment Module | Compiler Version Module
CCE | PrgEnv-cray | cce
GCC | PrgEnv-gnu | gcc
AOCC | PrgEnv-aocc | aocc

For example, at login, the default set of modules is:

auser@ln03:~> module list\n\n  1) craype-x86-rome                         6) cce/15.0.0             11) PrgEnv-cray/8.3.3\n  2) libfabric/1.12.1.2.2.0.0                7) craype/2.7.19          12) bolt/0.8\n  3) craype-network-ofi                      8) cray-dsmml/0.2.2       13) epcc-setup-env\n  4) perftools-base/22.12.0                  9) cray-mpich/8.1.23      14) load-epcc-module\n  5) xpmem/2.5.2-2.4_3.30__gd0f7936.shasta  10) cray-libsci/22.12.1.1\n

From this we see that the default compiler environment is Cray (indicated by PrgEnv-cray, number 11 in the list above) and the default compiler module is cce/15.0.0 (number 6 in the list above). The compiler environment gives access to a consistent set of compiler, MPI library (via cray-mpich, number 9) and other libraries, e.g. cray-libsci (number 10 in the list above).

"},{"location":"user-guide/dev-environment/#switching-between-compiler-environments","title":"Switching between compiler environments","text":"

Switching between different compiler environments is achieved using the module load command. For example, to switch from the default HPE Cray (CCE) compiler environment to the GCC environment, you would use:

auser@ln03:~> module load PrgEnv-gnu\n\nLmod is automatically replacing \"cce/15.0.0\" with \"gcc/11.2.0\".\n\n\nLmod is automatically replacing \"PrgEnv-cray/8.3.3\" with \"PrgEnv-gnu/8.3.3\".\n\n\nDue to MODULEPATH changes, the following have been reloaded:\n  1) cray-mpich/8.1.23\n

If you then use the module list command, you will see that your environment has been changed to the GCC environment:

auser@ln03:~> module list\n\nCurrently Loaded Modules:\n  1) craype-x86-rome                         6) bolt/0.8          11) cray-dsmml/0.2.2\n  2) libfabric/1.12.1.2.2.0.0                7) epcc-setup-env    12) cray-mpich/8.1.23\n  3) craype-network-ofi                      8) load-epcc-module  13) cray-libsci/22.12.1.1\n  4) perftools-base/22.12.0                  9) gcc/11.2.0        14) PrgEnv-gnu/8.3.3\n  5) xpmem/2.5.2-2.4_3.30__gd0f7936.shasta  10) craype/2.7.19\n
"},{"location":"user-guide/dev-environment/#switching-between-compiler-versions","title":"Switching between compiler versions","text":"

Within a given compiler environment, it is possible to swap to a different compiler version by swapping the relevant compiler module. To switch to the GNU compiler environment from the default HPE Cray compiler environment and then swap the version of GCC from the 11.2.0 default to the older 10.3.0 version, you would use:

auser@ln03:~> module load PrgEnv-gnu\n\nLmod is automatically replacing \"cce/15.0.0\" with \"gcc/11.2.0\".\n\n\nLmod is automatically replacing \"PrgEnv-cray/8.3.3\" with \"PrgEnv-gnu/8.3.3\".\n\n\nDue to MODULEPATH changes, the following have been reloaded:\n  1) cray-mpich/8.1.23\n\nauser@ln03:~> module load gcc/10.3.0\n\nThe following have been reloaded with a version change:\n  1) gcc/11.2.0 => gcc/10.3.0\n

The first command moves to the GNU compiler environment and the second moves to the older version of GCC. As before, module list will show that your environment has been changed:

auser@ln03:~> module list\n\nCurrently Loaded Modules:\n  1) craype-x86-rome                         6) bolt/0.8          11) cray-libsci/22.12.1.1\n  2) libfabric/1.12.1.2.2.0.0                7) epcc-setup-env    12) PrgEnv-gnu/8.3.3\n  3) craype-network-ofi                      8) load-epcc-module  13) gcc/10.3.0\n  4) perftools-base/22.12.0                  9) craype/2.7.19     14) cray-mpich/8.1.23\n  5) xpmem/2.5.2-2.4_3.30__gd0f7936.shasta  10) cray-dsmml/0.2.2\n
"},{"location":"user-guide/dev-environment/#compiler-wrapper-scripts-cc-cc-ftn","title":"Compiler wrapper scripts: cc, CC, ftn","text":"

To ensure consistent behaviour, compilation of C, C++, and Fortran source code should then take place using the appropriate compiler wrapper: cc, CC, and ftn, respectively. The wrapper will automatically call the relevant underlying compiler and add the appropriate include directories and library locations to the invocation. This typically eliminates the need to specify this additional information explicitly in the configuration stage. To see the details of the exact compiler invocation use the -craype-verbose flag to the compiler wrapper.

The default link time behaviour is also related to the current programming environment. See the section below on Linking and libraries.

Users should not, in general, invoke specific compilers at compile/link stages. In particular, gcc, which may default to /usr/bin/gcc, should not be used. The compiler wrappers cc, CC, and ftn should be used (with the underlying compiler type and version set by the module system). Other common MPI compiler wrappers e.g., mpicc, should also be replaced by the relevant wrapper, e.g. cc (commands such as mpicc are not available on ARCHER2).

Important

Always use the compiler wrappers cc, CC, and/or ftn and not a specific compiler invocation. This will ensure consistent compile/link time behaviour.

Tip

If you are using a build system such as Make or CMake then you will need to replace all occurrences of mpicc with cc, mpicxx/mpic++ with CC and mpif90 with ftn.
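For a plain Makefile, this substitution can be scripted. The bash sketch below uses a hypothetical helper, replace_mpi_wrappers, built on sed. It is a plain text substitution, so it assumes these names only appear as compiler commands, and you should review the changed files before building. For CMake projects it is usually simpler to select the wrappers when configuring, for example by setting CC=cc, CXX=CC and FC=ftn in the environment before the first cmake run.

```shell
# Sketch: replace generic MPI compiler wrappers with the HPE Cray wrappers
# in the given build files, editing them in place.
# replace_mpi_wrappers is a hypothetical helper, not an ARCHER2 command.
# Plain text substitution: assumes the names only appear as compiler
# commands and that file names contain no spaces.
replace_mpi_wrappers() {
  local f
  for f; do
    # sed -i.bak works with both GNU and BSD sed; the backup is then removed
    sed -i.bak -e 's/mpif90/ftn/g' -e 's/mpicxx/CC/g' -e 's/mpic++/CC/g' -e 's/mpicc/cc/g' $f &&
      rm -f $f.bak
  done
}

# Usage:  replace_mpi_wrappers Makefile
```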

"},{"location":"user-guide/dev-environment/#compiler-man-pages-and-help","title":"Compiler man pages and help","text":"

Further information on both the compiler wrappers and the individual compilers themselves is available via the command line and via standard man pages. The man page for the compiler wrappers is common to all programming environments, while the man page for individual compilers depends on the currently loaded programming environment. The following table summarises options for obtaining information on the compiler and compile options:

Compiler suite | C | C++ | Fortran
Cray | man clang | man clang++ | man crayftn
GNU | man gcc | man g++ | man gfortran
Wrappers | man cc | man CC | man ftn

Tip

You can also pass the --help option to any of the compilers or wrappers to get a summary of how to use them. The Cray Fortran compiler uses ftn --craype-help to access the help options.

Tip

There are no man pages for the AOCC compilers at the moment.

Tip

Cray C/C++ is based on Clang and therefore supports similar options to clang/gcc. clang --help will produce a full summary of options with Cray-specific options marked \"Cray\". The clang man page on ARCHER2 concentrates on these Cray extensions to the clang front end and does not provide an exhaustive description of all clang options. Cray Fortran is not based on Flang and so takes different options from flang/gfortran.

"},{"location":"user-guide/dev-environment/#which-compiler-environment","title":"Which compiler environment?","text":"

If you are unsure which compiler you should choose, we suggest the starting point should be the GNU compiler collection (GCC, PrgEnv-gnu); this is perhaps the most commonly used by code developers, particularly in the open source software domain. A portable, standard-conforming code should (in principle) compile in any of the three compiler environments.

For users requiring specific compiler features, such as coarray Fortran, the recommended starting point would be Cray. The following sections provide further details of the different compiler environments.

Warning

Intel compilers are not currently available on ARCHER2.

"},{"location":"user-guide/dev-environment/#gnu-compiler-collection-gcc","title":"GNU compiler collection (GCC)","text":"

The commonly used open source GNU compiler collection is available and provides C/C++ and Fortran compilers.

Switch to the GCC compiler environment from the default CCE (cray) compiler environment via:

auser@ln03:~> module load PrgEnv-gnu\n\nLmod is automatically replacing \"cce/15.0.0\" with \"gcc/11.2.0\".\n\n\nLmod is automatically replacing \"PrgEnv-cray/8.3.3\" with \"PrgEnv-gnu/8.3.3\".\n\n\nDue to MODULEPATH changes, the following have been reloaded:\n  1) cray-mpich/8.1.23\n

Warning

If you want to use GCC version 10 or greater to compile Fortran code with the old MPI interfaces (i.e. use mpi or INCLUDE 'mpif.h'), you must add the -fallow-argument-mismatch option (or equivalent) when compiling, otherwise you will see compile errors associated with MPI functions. The reason for this is that past versions of gfortran allowed mismatched arguments to external procedures (e.g., where an explicit interface is not available). This is often the case for MPI routines using the old MPI interfaces, where arrays of different types are passed to, for example, MPI_Send(). This now generates an error because it is not standard-conforming. The -fallow-argument-mismatch option reduces the error to a warning. The same effect may be achieved via -std=legacy.

If you use the Fortran 2008 MPI interface (i.e. use mpi_f08) then you should not need to add this option.

Fortran language MPI bindings are described in more detail in the MPI Standard documentation.

"},{"location":"user-guide/dev-environment/#useful-gnu-fortran-options","title":"Useful Gnu Fortran options","text":"Option | Comment
-O<level> | Optimisation levels: -O0, -O1, -O2, -O3, -Ofast. -Ofast is not recommended without careful regression testing on numerical output.
-std=<standard> | Default is gnu
-fallow-argument-mismatch | Allow mismatched procedure arguments. This argument is required for compiling MPI Fortran code with GCC version 10 or greater if you are using the older MPI interfaces (see warning above)
-fbounds-check | Use runtime checking of array indices
-fopenmp | Compile OpenMP (default is no OpenMP)
-v | Display verbose output from compiler stages

Tip

The standard in -std may be one of f95, f2003, f2008 or f2018. The default option, -std=gnu, is the latest Fortran standard plus GNU extensions.

Warning

Past versions of gfortran allowed mismatched arguments to external procedures (e.g., where an explicit interface is not available). This is often the case for MPI routines where arrays of different types are passed to MPI_Send() and so on. This now generates an error because it is not standard-conforming. Use -fallow-argument-mismatch to reduce the error to a warning. The same effect may be achieved via -std=legacy.
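Because gfortran versions before 10 do not recognise this option, a build script that must work across GCC versions could add it conditionally. A minimal bash sketch, assuming the version string comes from gfortran -dumpversion (which may print only the major version, such as 11, on some GCC builds); gfortran_mismatch_flag is a hypothetical helper name.

```shell
# Sketch: print -fallow-argument-mismatch only for gfortran 10 or later.
# gfortran_mismatch_flag is a hypothetical helper, not an ARCHER2 command.
# Argument: a gfortran version string such as 11.2.0 or 11.
gfortran_mismatch_flag() {
  local major=${1%%.*}
  if (( major >= 10 )); then
    echo -fallow-argument-mismatch
  fi
}

# Usage (with PrgEnv-gnu loaded):
#   FFLAGS=$(gfortran_mismatch_flag $(gfortran -dumpversion))
#   ftn $FFLAGS -c mycode.f90    # mycode.f90 is a placeholder file name
```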

"},{"location":"user-guide/dev-environment/#using-gcc-12x-on-archer2","title":"Using GCC 12.x on ARCHER2","text":"

GCC 12.x compilers are available on ARCHER2 for users who wish to access newer features (particularly C++ features).

Testing by the CSE service has identified that some software regression tests produce results that differ from the reference values when the software is compiled with gfortran from GCC 12.x, so we do not recommend GCC 12.x for general use. Users should carefully check results from software built using compilers from GCC 12.x before using it for their research projects.

You can access GCC 12.x by using the commands:

module load extra-compilers\nmodule load PrgEnv-gnu\n
"},{"location":"user-guide/dev-environment/#reference-material","title":"Reference material","text":""},{"location":"user-guide/dev-environment/#cray-compiling-environment-cce","title":"Cray Compiling Environment (CCE)","text":"

The Cray Compiling Environment (CCE) is the default compiler environment at login. CCE supports C/C++ (along with Unified Parallel C, UPC) and Fortran (including coarray Fortran). Support for OpenMP parallelism is available for both C/C++ and Fortran (currently OpenMP 4.5, with a number of exceptions).

The Cray C/C++ compiler is based on a clang front end, and so compiler options are similar to those for gcc/clang. However, the Fortran compiler remains based around Cray-specific options. Be sure to separate C/C++ compiler options and Fortran compiler options (typically CFLAGS and FFLAGS) if compiling mixed C/Fortran applications.

As CCE is the default compiler environment on ARCHER2, you do not usually need to issue any commands to enable CCE.

Note

The CCE Clang compiler uses a GCC 8 toolchain so only C++ standard library features available in GCC 8 will be available in CCE Clang. You can add the compile option --gcc-toolchain=/opt/gcc/11.2.0/snos to use a more recent version of the C++ standard library if you wish.

"},{"location":"user-guide/dev-environment/#useful-cce-cc-options","title":"Useful CCE C/C++ options","text":"

When using the compiler wrappers cc or CC, some of the following options may be useful:

Language, warning, Debugging options:

Option | Comment
-std=<standard> | Default is -std=gnu11 (gnu++14 for C++) [1]
--gcc-toolchain=/opt/cray/pe/gcc/12.2.0/snos | Use the GCC 12.2.0 toolchain instead of the default 11.2.0 version packaged with CCE

Performance options:

Option | Comment
-Ofast | Optimisation levels: -O0, -O1, -O2, -O3, -Ofast. -Ofast is not recommended without careful regression testing on numerical output.
-ffp=level | Floating point maths optimisations levels 0-4 [2]
-flto | Link time optimisation

Miscellaneous options:

Option | Comment
-fopenmp | Compile OpenMP (default is off)
-v | Display verbose output from compiler stages

Notes

  1. Option -std=gnu11 gives c11 plus GNU extensions (likewise c++14 plus GNU extensions). See https://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/C-Extensions.html
  2. Option -ffp=3 is implied by -Ofast or -ffast-math
"},{"location":"user-guide/dev-environment/#useful-cce-fortran-options","title":"Useful CCE Fortran options","text":"

Language, Warning, Debugging options:

Option | Comment
-m <level> | Message level (default -m 3 errors and warnings)

Performance options:

Option | Comment
-O <level> | Optimisation levels: -O0 to -O3 (default -O2)
-h fp<level> | Floating point maths optimisations levels 0-3
-h ipa | Inter-procedural analysis

Miscellaneous options:

Option | Comment
-h omp | Compile OpenMP (default is -hnoomp)
-v | Display verbose output from compiler stages"},{"location":"user-guide/dev-environment/#cce-reference-documentation","title":"CCE Reference Documentation","text":""},{"location":"user-guide/dev-environment/#amd-optimizing-compiler-collection-aocc","title":"AMD Optimizing Compiler Collection (AOCC)","text":"

The AMD Optimizing Compiler Collection (AOCC) is a clang-based optimising compiler. AOCC also includes a flang-based Fortran compiler.

Load the AOCC compiler environment from the default CCE (cray) compiler environment via:

auser@ln03:~> module load PrgEnv-aocc\n\nLmod is automatically replacing \"cce/15.0.0\" with \"aocc/3.2.0\".\n\n\nLmod is automatically replacing \"PrgEnv-cray/8.3.3\" with \"PrgEnv-aocc/8.3.3\".\n\n\nDue to MODULEPATH changes, the following have been reloaded:\n  1) cray-mpich/8.1.23\n
"},{"location":"user-guide/dev-environment/#aocc-reference-material","title":"AOCC reference material","text":""},{"location":"user-guide/dev-environment/#message-passing-interface-mpi","title":"Message passing interface (MPI)","text":""},{"location":"user-guide/dev-environment/#hpe-cray-mpich","title":"HPE Cray MPICH","text":"

HPE Cray provide, as standard, an MPICH implementation of the message passing interface which is specifically optimised for the ARCHER2 interconnect. The current implementation supports MPI standard version 3.1.

The HPE Cray MPICH implementation is linked into software by default when compiling using the standard wrapper scripts: cc, CC and ftn.

You do not need to do anything to make HPE Cray MPICH available when you log into ARCHER2, it is available by default to all users.

"},{"location":"user-guide/dev-environment/#switching-to-alternative-ucx-mpi-implementation","title":"Switching to alternative UCX MPI implementation","text":"

HPE Cray MPICH can use two different low-level protocols to transfer data across the network. The default is the Open Fabrics Interface (OFI), but you can switch to the UCX protocol from Mellanox.

Which performs better will be application-dependent, but our experience is that UCX is often faster for programs that send a lot of data collectively between many processes, e.g. all-to-all communications patterns such as occur in parallel FFTs.

Note

You do not need to recompile your program - you simply load different modules in your Slurm script.

module load craype-network-ucx \nmodule load cray-mpich-ucx \n

Important

If your software was compiled using a compiler environment other than CCE, you will also need to load that compiler environment as well as the UCX modules. For example, if you compiled using PrgEnv-gnu, you would use:

module load PrgEnv-gnu\nmodule load craype-network-ucx \nmodule load cray-mpich-ucx \n

The performance benefits will also vary depending on the number of processes, so it is important to benchmark your application at the scale used in full production runs.

"},{"location":"user-guide/dev-environment/#mpi-reference-material","title":"MPI reference material","text":"

MPI standard documents: https://www.mpi-forum.org/docs/

"},{"location":"user-guide/dev-environment/#linking-and-libraries","title":"Linking and libraries","text":"

Linking to libraries is performed dynamically on ARCHER2.

Important

Static linking is not supported on ARCHER2. If you attempt to link statically, you will see errors similar to:

/usr/bin/ld: cannot find -lpmi\n/usr/bin/ld: cannot find -lpmi2\ncollect2: error: ld returned 1 exit status\n

One can use the -craype-verbose flag to the compiler wrapper to check exactly what linker arguments are invoked. The compiler wrapper scripts encode the paths to the programming environment system libraries using RUNPATH. This ensures that the executable can find the correct runtime libraries without the matching software modules loaded.

The library RUNPATH associated with an executable can be inspected via, e.g.,

$ readelf -d ./a.out\n

(swap a.out for the name of the executable you are querying).
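To list just the RUNPATH directories, one entry per line, the readelf output can be filtered. A minimal bash sketch, assuming the GNU readelf layout in which the entry appears as (RUNPATH) ... Library runpath: [dir1:dir2]; show_runpath is a hypothetical helper name. An executable carrying an older RPATH entry instead of RUNPATH would print nothing.

```shell
# Sketch: print each RUNPATH directory of an executable on its own line.
# show_runpath is a hypothetical helper, not an ARCHER2 command.
# Reads readelf -d output on standard input, assuming the GNU readelf form
#   0x...1d (RUNPATH)  Library runpath: [dir1:dir2]
show_runpath() {
  # Keep the text between [ and ] on the RUNPATH line, then split on colons
  sed -n '/(RUNPATH)/{s/.*runpath: [[]//;s/[]].*//;p;}' |
    awk -F: '{ for (i = 1; i <= NF; i++) print $i }'
}

# Usage:  readelf -d ./a.out | show_runpath
```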

"},{"location":"user-guide/dev-environment/#commonly-used-libraries","title":"Commonly used libraries","text":"

Modules with names prefixed by cray- are provided by HPE Cray and work with any of the compiler environments. These modules should be the first choice for access to software libraries, if available.

Tip

More information on the different software libraries on ARCHER2 can be found in the Software libraries section of the user guide.

"},{"location":"user-guide/dev-environment/#hpe-cray-programming-environment-cpe-releases","title":"HPE Cray Programming Environment (CPE) releases","text":""},{"location":"user-guide/dev-environment/#available-hpe-cray-programming-environment-cpe-releases","title":"Available HPE Cray Programming Environment (CPE) releases","text":"

ARCHER2 currently has the following HPE Cray Programming Environment (CPE) releases available:

You can find information, notes, and lists of changes for current and upcoming ARCHER2 HPE Cray programming environments in the HPE Cray Programming Environment GitHub repository.

Tip

We recommend that users use the most recent version of the PE available to get the latest improvements and bug fixes.

Later PE releases may sometimes be available in containerised form. This allows developers to check that their code compiles and runs using CPE releases that have not yet been installed on ARCHER2.

CPE 23.12 is currently available as a Singularity container, see Using Containerised HPE Cray Programming Environments for further details.

"},{"location":"user-guide/dev-environment/#switching-to-a-different-hpe-cray-programming-environment-cpe-release","title":"Switching to a different HPE Cray Programming Environment (CPE) release","text":"

Important

See the section below on using non-default versions of HPE Cray libraries as this process will generally need to be followed when using software from non-default PE installs.

Access to non-default PE environments is controlled by the use of the cpe modules. Loading a cpe module will do the following:

For example, if you have a code that uses the GNU compiler environment, FFTW and NetCDF parallel libraries and you want to compile in the (non-default) 23.09 programming environment, you would do the following:

First, load the cpe/23.09 module to switch all the defaults to the versions from the 23.09 PE. Then, switch to the GNU compiler environment and load the required library modules (FFTW, hdf5-parallel and NetCDF HDF5 parallel). The loaded module list shows they are the versions from the 23.09 PE:

module load cpe/23.09\n

Output:

The following have been reloaded with a version change:\n  1) PrgEnv-cray/8.3.3 => PrgEnv-cray/8.4.0             4) cray-mpich/8.1.23 => cray-mpich/8.1.27\n  2) cce/15.0.0 => cce/16.0.1                           5) craype/2.7.19 => craype/2.7.23\n  3) cray-libsci/22.12.1.1 => cray-libsci/23.09.1.1     6) perftools-base/22.12.0 => perftools-base/23.09.0\n

module load PrgEnv-gnu\n
Output:
Lmod is automatically replacing \"cce/16.0.1\" with \"gcc/11.2.0\".\n\n\nLmod is automatically replacing \"PrgEnv-cray/8.4.0\" with \"PrgEnv-gnu/8.4.0\".\n\n\nDue to MODULEPATH changes, the following have been reloaded:\n  1) cray-mpich/8.1.27\n

module load cray-fftw\nmodule load cray-hdf5-parallel\nmodule load cray-netcdf-hdf5parallel\nmodule list\n

Output:

Currently Loaded Modules:\n  1) craype-x86-rome                         6) epcc-setup-env          11) craype/2.7.23          16) cray-fftw/3.3.10.5\n  2) libfabric/1.12.1.2.2.0.0                7) load-epcc-module        12) cray-dsmml/0.2.2       17) cray-hdf5-parallel/1.12.2.7\n  3) craype-network-ofi                      8) perftools-base/23.09.0  13) cray-mpich/8.1.27      18) cray-netcdf-hdf5parallel/4.9.0.7\n  4) xpmem/2.5.2-2.4_3.30__gd0f7936.shasta   9) cpe/23.09               14) cray-libsci/23.09.1.1\n  5) bolt/0.8                               10) gcc/11.2.0              15) PrgEnv-gnu/8.4.0\n

Now you can go ahead and compile your software with the new programming environment.

Important

The cpe modules only change the versions of software modules provided as part of the HPE Cray programming environments. Any modules provided by the ARCHER2 service will need to be loaded manually after you have completed the process described above.

Note

Unloading the cpe module does not restore the original programming environment release. To restore the default programming environment release you should log out and then log back in to ARCHER2.

"},{"location":"user-guide/dev-environment/#using-non-default-versions-of-hpe-cray-libraries","title":"Using non-default versions of HPE Cray libraries","text":"

If you wish to make use of non-default versions of libraries provided by HPE Cray (usually because they are part of a non-default PE release: either old or new) then you need to make changes at both compile and runtime. In summary, you need to load the correct module and also make changes to the LD_LIBRARY_PATH environment variable.

At compile time, you need to load the required version of the library module before you compile, and set the LD_LIBRARY_PATH environment variable to include the contents of $CRAY_LD_LIBRARY_PATH as its first entry. For example, to use the non-default 23.09.1.1 version of HPE Cray LibSci in the default programming environment (Cray Compiling Environment, CCE), you would first set up the environment to compile with:

module load cray-libsci/23.09.1.1\nexport LD_LIBRARY_PATH=$CRAY_LD_LIBRARY_PATH:$LD_LIBRARY_PATH\n

The order is important here: every time you change a module, you will need to reset the value of LD_LIBRARY_PATH for the process to work (it will not be updated automatically).
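A quick way to catch a stale value is to check, after your last module command, that LD_LIBRARY_PATH still begins with the current CRAY_LD_LIBRARY_PATH. The bash sketch below is a hypothetical helper, check_ld_order, and assumes neither path contains shell glob characters such as * or ?.

```shell
# Sketch: check that LD_LIBRARY_PATH still begins with the current
# CRAY_LD_LIBRARY_PATH, e.g. after further module commands.
# check_ld_order is a hypothetical helper, not an ARCHER2 command.
# Assumes neither variable contains shell glob characters such as * or ?.
check_ld_order() {
  if [[ -z ${CRAY_LD_LIBRARY_PATH:-} ]]; then
    echo 'check_ld_order: CRAY_LD_LIBRARY_PATH is empty' >&2
    return 1
  fi
  # Accept an exact match, or CRAY_LD_LIBRARY_PATH followed by more entries
  case ${LD_LIBRARY_PATH:-} in
    $CRAY_LD_LIBRARY_PATH | $CRAY_LD_LIBRARY_PATH:*) return 0 ;;
  esac
  echo 'check_ld_order: re-run the LD_LIBRARY_PATH export after your last module command' >&2
  return 1
}

# Usage:  check_ld_order || exit 1
```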

Now you can compile your code. You can check that the executable is using the correct version of LibSci with the ldd command. Look for the line beginning libsci_cray.so.5: you should see the version in the path to the library file:

ldd dgemv.x \n

Output:

    linux-vdso.so.1 (0x00007ffc7fff5000)\n    libm.so.6 => /lib64/libm.so.6 (0x00007fd6a6361000)\n    libsci_cray.so.5 => /opt/cray/pe/libsci/23.09.1.1/CRAY/12.0/x86_64/lib/libsci_cray.so.5 (0x00007fd6a2419000)\n    libdl.so.2 => /lib64/libdl.so.2 (0x00007fd6a2215000)\n    libxpmem.so.0 => /opt/cray/xpmem/default/lib64/libxpmem.so.0 (0x00007fd6a68b3000)\n    libquadmath.so.0 => /opt/cray/pe/gcc-libs/libquadmath.so.0 (0x00007fd6a1fce000)\n    libmodules.so.1 => /opt/cray/pe/cce/15.0.0/cce/x86_64/lib/libmodules.so.1 (0x00007fd6a689a000)\n    libfi.so.1 => /opt/cray/pe/cce/15.0.0/cce/x86_64/lib/libfi.so.1 (0x00007fd6a1a29000)\n    libcraymath.so.1 => /opt/cray/pe/cce/15.0.0/cce/x86_64/lib/libcraymath.so.1 (0x00007fd6a67b3000)\n    libf.so.1 => /opt/cray/pe/cce/15.0.0/cce/x86_64/lib/libf.so.1 (0x00007fd6a6720000)\n    libu.so.1 => /opt/cray/pe/cce/15.0.0/cce/x86_64/lib/libu.so.1 (0x00007fd6a1920000)\n    libcsup.so.1 => /opt/cray/pe/cce/15.0.0/cce/x86_64/lib/libcsup.so.1 (0x00007fd6a6715000)\n    libc.so.6 => /lib64/libc.so.6 (0x00007fd6a152b000)\n    /lib64/ld-linux-x86-64.so.2 (0x00007fd6a66ac000)\n    libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fd6a1308000)\n    librt.so.1 => /lib64/librt.so.1 (0x00007fd6a10ff000)\n    libgfortran.so.5 => /opt/cray/pe/gcc-libs/libgfortran.so.5 (0x00007fd6a0c53000)\n    libstdc++.so.6 => /opt/cray/pe/gcc-libs/libstdc++.so.6 (0x00007fd6a0841000)\n    libgcc_s.so.1 => /opt/cray/pe/gcc-libs/libgcc_s.so.1 (0x00007fd6a0628000)\n

Tip

If any of the libraries point to versions in the /opt/cray/pe/lib64 directory, then these are using the default versions of the libraries rather than the specific versions. This happens at compile time if you have forgotten to load the right module and set $LD_LIBRARY_PATH afterwards.
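The check described in this tip can be scripted. The function below is a minimal sketch, not part of the ARCHER2 tooling: the name check_libsci is hypothetical, and it assumes ldd prints lines in the `name => path (address)` layout shown in the output above. It prints the path that libsci_cray resolves to and returns a non-zero status if that path is in the default /opt/cray/pe/lib64 directory.

```shell
# check_libsci: read ldd output on stdin, print the path libsci_cray resolves to,
# and return 1 if it is the default library in /opt/cray/pe/lib64
# (return 2 if libsci_cray does not appear in the output at all).
check_libsci() {
    local path
    # Field 1 is the library name, field 2 is "=>", field 3 is the resolved path
    path=$(awk '$1 ~ /^libsci_cray\.so/ { print $3; exit }')
    if [ -z "$path" ]; then
        echo "libsci_cray not found in ldd output" >&2
        return 2
    fi
    echo "$path"
    case "$path" in
        /opt/cray/pe/lib64/*)
            echo "warning: the default LibSci version is being used" >&2
            return 1
            ;;
    esac
}
```

For example, `ldd dgemv.x | check_libsci` could be run at compile time or in a job script before srun; a non-zero exit status indicates that the module load and LD_LIBRARY_PATH steps need to be repeated in the right order.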

At run time (typically in your job script) you need to repeat the environment setup steps (you can also use the ldd command in your job submission script to check the library is pointing to the correct version). For example, a job submission script to run our dgemv.x executable with the non-default version of LibSci could look like:

#!/bin/bash\n#SBATCH --job-name=dgemv\n#SBATCH --time=0:20:0\n#SBATCH --nodes=1\n#SBATCH --ntasks-per-node=1\n#SBATCH --cpus-per-task=1\n\n# Replace the account code, partition and QoS with those you wish to use\n#SBATCH --account=t01\n#SBATCH --partition=standard\n#SBATCH --qos=short\n#SBATCH --reservation=shortqos\n\n# Set up the environment to use the non-default version of LibSci\nmodule load cray-libsci/23.09.1.1\nexport LD_LIBRARY_PATH=$CRAY_LD_LIBRARY_PATH:$LD_LIBRARY_PATH\n\n# Check which library versions the executable is pointing to\nldd dgemv.x\n\nexport OMP_NUM_THREADS=1\n\nsrun --hint=nomultithread --distribution=block:block dgemv.x\n

Tip

As when compiling, the order of commands matters. Setting the value of LD_LIBRARY_PATH must happen after you have finished all your module commands for it to have the correct effect.

Important

You must set up the environment at both compile time and run time, otherwise you will end up using the default version of the library.

"},{"location":"user-guide/dev-environment/#compiling-on-compute-nodes","title":"Compiling on compute nodes","text":"

Sometimes you may wish to compile in a batch job: for example, the compile process may take a long time, or it may be part of the research workflow and can be coupled to the production job. Note that, unlike on the login nodes, the /home file system is not available on the compute nodes.

An example job submission script for a compile job using make (assuming the Makefile is in the same directory as the job submission script) would be:

#!/bin/bash\n\n#SBATCH --job-name=compile\n#SBATCH --time=00:20:00\n#SBATCH --nodes=1\n#SBATCH --ntasks-per-node=1\n#SBATCH --cpus-per-task=1\n\n# Replace the account code, partition and QoS with those you wish to use\n#SBATCH --account=t01        \n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n\nmake clean\n\nmake\n

Note

If you want to use a compiler environment other than the default, you will need to add the module load command before the make command. For example, to use the GCC compiler environment:

module load PrgEnv-gnu\n

You can also use a compute node interactively using salloc. Please see the section Using salloc to reserve resources for further details. Once your interactive session is ready, you can load the compilation environment and compile the code.

"},{"location":"user-guide/dev-environment/#using-the-compiler-wrappers-for-serial-compilations","title":"Using the compiler wrappers for serial compilations","text":"

The compiler wrappers link with a number of HPE-provided libraries automatically. It is possible to compile codes in serial with the compiler wrappers to take advantage of the HPE libraries.

To set up your environment for serial compilation, you will need to run:

  module load craype-network-none\n  module remove cray-mpich\n

Once this is done, you can use the compiler wrappers (cc for C, CC for C++, and ftn for Fortran) to compile your code in serial.

"},{"location":"user-guide/dev-environment/#managing-development","title":"Managing development","text":"

ARCHER2 supports common revision control software such as git.

Standard GNU autoconf tools are available, along with make (GNU Make) and several versions of cmake.

Tip

Some of these tools are part of the system software, and typically reside in /usr/bin, while others are provided as part of the module system. Some tools may be available in different versions via both /usr/bin and via the module system. If you find the default version is too old, then look in the module system for a more recent version.

"},{"location":"user-guide/dev-environment/#build-instructions-for-software-on-archer2","title":"Build instructions for software on ARCHER2","text":"

The ARCHER2 CSE team at EPCC and other contributors provide build configurations and instructions for a range of research software, software libraries and tools on a variety of HPC systems (including ARCHER2) in a public GitHub repository. See:

The repository always welcomes contributions from the ARCHER2 user community.

"},{"location":"user-guide/dev-environment/#support-for-building-software-on-archer2","title":"Support for building software on ARCHER2","text":"

If you run into issues building software on ARCHER2 or the software you require is not available then please contact the ARCHER2 Service Desk with any questions you have.

"},{"location":"user-guide/energy/","title":"Energy use","text":"

This section covers how to monitor energy use for your jobs on ARCHER2 and how to control the CPU frequency, which gives you some control over how much energy your jobs consume.

Important

The default CPU frequency cap on ARCHER2 compute nodes for jobs launched using srun is currently set to 2.0 GHz. Information below describes how to control the CPU frequency cap using Slurm.

"},{"location":"user-guide/energy/#monitoring-energy-use","title":"Monitoring energy use","text":"

The Slurm accounting database stores the total energy consumed by a job and you can also directly access the counters on compute nodes which capture instantaneous power and energy data broken down by different hardware components.

"},{"location":"user-guide/energy/#using-sacct-to-get-energy-usage-for-individual-jobs","title":"Using sacct to get energy usage for individual jobs","text":"

Energy usage for a particular job may be obtained using the sacct command. For instance

sacct -j 2658300 --format=JobID,Elapsed,ReqCPUFreq,ConsumedEnergy\n

will provide the elapsed time and consumed energy in joules for the job(s) specified with -j. The output of this command is:

JobID           Elapsed ReqCPUFreq ConsumedEnergy \n------------ ---------- ---------- -------------- \n2658300        02:19:48    Unknown          4.58M \n2658300.bat+   02:19:48          0          4.58M \n2658300.ext+   02:19:48          0          4.58M \n2658300.0      02:19:09    Unknown          4.57M \n

In this case we can see that the job consumed 4.58 MJ for a run lasting 2 hours, 19 minutes and 48 seconds with the CPU frequency unset. To convert the energy to kWh, multiply the energy in joules by 2.78e-7 (equivalently, divide by 3.6e6, since 1 kWh = 3.6 MJ), which in this case gives 1.27 kWh.

The Slurm database may be cleaned without notice so you should gather any data you want as soon as possible after the job completes - you can even add the sacct command to the end of your job script to ensure this data is captured.
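The conversion and the "capture it in the job script" advice can be combined as in the sketch below. The helper name joules_to_kwh is hypothetical. The commented job-script lines assume your Slurm version provides the ConsumedEnergyRaw field (which reports plain joules rather than a value with an M suffix; check sacct --helpformat), and when run inside the job itself the totals may not yet cover the very end of the run.

```shell
# joules_to_kwh JOULES: convert an energy in joules to kWh (1 kWh = 3.6e6 J),
# printed to two decimal places.
joules_to_kwh() {
    awk -v j="$1" 'BEGIN { printf "%.2f\n", j / 3.6e6 }'
}

# Possible use at the end of a job script (SLURM_JOB_ID is set by Slurm):
# energy=$(sacct -j "$SLURM_JOB_ID" --noheader --parsable2 --format=ConsumedEnergyRaw | head -n 1)
# echo "Energy used so far: $(joules_to_kwh "$energy") kWh"
```

With the figure from the example above, `joules_to_kwh 4580000` prints 1.27.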

In addition to energy statistics, sacct provides a number of other fields that can be passed to the --format option; the full list can be viewed with

sacct --helpformat\n

or using the man pages.

"},{"location":"user-guide/energy/#accessing-the-node-energypower-counters","title":"Accessing the node energy/power counters","text":"

Note

The counters are available on each compute node and record data only for that compute node. If you are running multi-node jobs, you will need to combine data from multiple nodes to get data for the whole job.

On compute nodes, the raw energy counters and instantaneous power draw data are available at:

/sys/cray/pm_counters\n

There are a number of files in this directory; all the counter files include the current value and a timestamp.
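As an illustration of reading these counters from a job script, the sketch below makes two assumptions that you should verify on a compute node first: that the first whitespace-separated field of each counter file is the value (with the unit and timestamp following it), and that a counter file named energy exists. The helper name read_counter is hypothetical. Because the counters are per node, this only measures the node on which the batch script itself runs.

```shell
# read_counter NAME [DIR]: print the first field (assumed to be the counter
# value) of a pm_counters file. DIR defaults to /sys/cray/pm_counters.
read_counter() {
    local dir="${2:-/sys/cray/pm_counters}"
    awk '{ print $1; exit }' "$dir/$1"
}

# Possible use in a job script (first node of the job only):
# before=$(read_counter energy)
# srun ...usual srun options and arguments...
# after=$(read_counter energy)
# echo "Node energy for this step: $((after - before)) J"
```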

This documentation is from the official HPE documentation:

Tip

The overall power and energy counters include all on-node systems. The major components are the CPU (processor), memory and Slingshot network interface controller (NIC).

Note

There exists an MPI-based wrapper library that can gather the pm counter values at runtime via a simple set of function calls. See the link below for details.

"},{"location":"user-guide/energy/#controlling-cpu-frequency","title":"Controlling CPU frequency","text":"

You can request specific CPU frequency caps (in kHz) for compute nodes through srun options or environment variables. The available frequency caps on the ARCHER2 processors, along with the corresponding srun options and environment variables, are:

| Frequency | srun option | Slurm environment variable | Turbo boost enabled? |
|---|---|---|---|
| 2.25 GHz | --cpu-freq=2250000 | export SLURM_CPU_FREQ_REQ=2250000 | Yes |
| 2.00 GHz | --cpu-freq=2000000 | export SLURM_CPU_FREQ_REQ=2000000 | No |
| 1.50 GHz | --cpu-freq=1500000 | export SLURM_CPU_FREQ_REQ=1500000 | No |

The only frequency caps available on the ARCHER2 processors are 1.5 GHz, 2.0 GHz and 2.25 GHz (with turbo boost).

Important

Setting the CPU frequency cap in this way sets the maximum frequency that the processors can use. In practice, the individual cores may select different frequencies up to the value you have set depending on the workload on the processor.

Important

When you select the highest frequency value (2.25 GHz), you also enable turbo boost and so the processor is free to set the CPU frequency to values above 2.25 GHz if possible within the power and thermal limits of the processor. We see that, with turbo boost enabled, the processors typically boost to around 2.8 GHz even when performing compute-intensive work.

For example, you can add the following option to srun commands in your job submission scripts to set the CPU frequency to 2.25 GHz (and also enable turbo boost):

srun --cpu-freq=2250000 ...usual srun options and arguments...\n

Alternatively, you could add the following line to your job submission script before you use srun to launch the application:

export SLURM_CPU_FREQ_REQ=2250000\n
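Because Slurm expects the cap in kHz and only three caps are available, it can help to convert and validate the value in one place. The function below is a hypothetical helper, sketched from the frequency table above; it rejects any value that is not one of the three ARCHER2 caps.

```shell
# cpu_freq_khz GHZ: print the Slurm CPU frequency value (in kHz) for one of the
# caps available on ARCHER2, or fail for any other value.
cpu_freq_khz() {
    case "$1" in
        2.25)     echo 2250000 ;;  # also enables turbo boost
        2.0|2.00) echo 2000000 ;;
        1.5|1.50) echo 1500000 ;;
        *)
            echo "unsupported frequency cap: $1 GHz" >&2
            return 1
            ;;
    esac
}

# Possible use in a job script, before srun:
# freq=$(cpu_freq_khz 2.25) || exit 1
# export SLURM_CPU_FREQ_REQ="$freq"
```

Assigning to freq first, rather than writing `export SLURM_CPU_FREQ_REQ=$(cpu_freq_khz 2.25)`, means a typo in the frequency stops the job instead of silently exporting an empty value.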

Tip

Testing by the ARCHER2 CSE team has shown that most software is most energy efficient when 2.0 GHz is selected as the CPU frequency cap.

Important

The CPU frequency settings only affect applications launched using the srun command.

Priority of frequency settings:

Tip

Adding the --cpu-freq=<freq in kHz> option to sbatch (e.g. using #SBATCH --cpu-freq=<freq in kHz>) will not change the CPU frequency of srun commands used in the job, as the default setting for ARCHER2 will override the sbatch option when the script runs.

"},{"location":"user-guide/energy/#default-cpu-frequency","title":"Default CPU frequency","text":"

If you do not specify a CPU frequency, you will get the default setting for the ARCHER2 service when you launch an application using srun. The table below lists the history of default CPU frequency settings on the ARCHER2 service:

| Date range | Default CPU frequency |
|---|---|
| 12 Dec 2022 - current date | 2.0 GHz |
| Nov 2021 - 11 Dec 2022 | Unspecified - defaults to 2.25 GHz |

"},{"location":"user-guide/energy/#slurm-cpu-frequency-settings-for-centrally-installed-software","title":"Slurm CPU frequency settings for centrally-installed software","text":"

Most centrally installed research software (available via module load commands) uses the same default Slurm CPU frequency as set globally for all ARCHER2 users (see above for this value). However, a small number of packages perform significantly worse at lower frequency settings, so the modules for these packages reset the CPU frequency to the highest value (2.25 GHz). The packages that currently do this are:

Important

If you specify the Slurm CPU frequency in your job scripts using one of the mechanisms described above after you have loaded the module, you will override the setting from the module.

"},{"location":"user-guide/functional-accounts/","title":"Functional accounts on ARCHER2","text":"

Functional accounts are used to run persistent services on ARCHER2 that are controlled by users: for example, a licence server that allows jobs on compute nodes to check out a licence for restricted software.

There are a number of steps involved in setting up functional accounts:

  1. Submit a request to service desk for review and award of functional account entitlement
  2. Creation of the functional account and associating authorisation for your standard ARCHER2 account to access it
  3. Test that you can access the persistent service node (dvn04) and the functional account
  4. Setup of the persistent service on the persistent service node (dvn04)

We cover these steps in detail below, using the concrete example of setting up a licence server with the FlexLM software, but the process should generalise to other persistent services.

Note

If you have any questions about functional accounts and persistent services on ARCHER2 please contact the ARCHER2 Service Desk.

"},{"location":"user-guide/functional-accounts/#submit-a-request-to-service-desk","title":"Submit a request to service desk","text":"

If you wish to have access to a functional account for persistent services on ARCHER2 you should email the ARCHER2 Service Desk with a case for why you want to have this functionality. You should include the following information in your email:

"},{"location":"user-guide/functional-accounts/#creation-of-the-functional-account","title":"Creation of the functional account","text":"

If your request for a functional account is approved, the ARCHER2 user administration team will set up the account and enable access for the standard user accounts named in the application. They will then inform you of the functional account name.

"},{"location":"user-guide/functional-accounts/#test-access-to-functional-account","title":"Test access to functional account","text":"

The process for accessing the functional account is:

  1. Log into ARCHER2 using your normal user account
  2. Set up an SSH key pair for login access to the persistent service node (dvn04)
  3. Log into the persistent service node (dvn04)
  4. Use sudo to access the functional account
"},{"location":"user-guide/functional-accounts/#login-to-archer2","title":"Login to ARCHER2","text":"

Log into ARCHER2 in the usual way using a normal user account that has been given access to manage the functional account.

"},{"location":"user-guide/functional-accounts/#setup-ssh-key-pair-for-dvn04-access","title":"Setup SSH key pair for dvn04 access","text":"

You can create a passphrase-less SSH key pair to use for access to the persistent service node using the ssh-keygen command. As long as you place the public and private key parts in the default location, you will not need any additional SSH options to access dvn04 from the ARCHER2 login nodes. Just hit enter when prompted for a passphrase to create a key with no passphrase.

Once the key pair has been created, you add the public part to the $HOME/.ssh/authorized_keys file on ARCHER2 to make it valid for login to dvn04 using the command cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys.

Example commands to setup SSH key pair:

auser@ln04:~> ssh-keygen -t rsa\n\nGenerating public/private rsa key pair.\nEnter file in which to save the key (/home/t01/t01/auser/.ssh/id_rsa): \nEnter passphrase (empty for no passphrase): \nEnter same passphrase again: \nYour identification has been saved in /home/t01/t01/auser/.ssh/id_rsa\nYour public key has been saved in /home/t01/t01/auser/.ssh/id_rsa.pub\nThe key fingerprint is:\nSHA256:wX2bgNElbsPaT8HXKIflNmqnjSfg7a8BPM1R56b4/60 auser@ln02\nThe key's randomart image is:\n+---[RSA 3072]----+\n|        ..... o .|\n|       . *.o = = |\n|        + B B B +|\n|         * * % + |\n|        S * X o  |\n|         . O *   |\n|          . B +  |\n|           . + ..|\n|            ooE.=|\n+----[SHA256]-----+\n\nauser@ln04:~> cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys\n
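The final cat command appends the public key every time it is run, so repeating the setup adds duplicate entries. A small sketch of an idempotent alternative follows; the function name add_authorized_key is hypothetical, and the chmod 600 reflects the common OpenSSH expectation that authorized_keys is not writable by other users.

```shell
# add_authorized_key PUBKEY_FILE AUTH_KEYS_FILE: append the public key to the
# authorized_keys file only if that exact line is not already present.
add_authorized_key() {
    local pub="$1" auth="$2" key
    key="$(cat "$pub")" || return 1
    mkdir -p "$(dirname "$auth")"
    touch "$auth"
    chmod 600 "$auth"
    # -q: quiet, -x: match whole lines, -F: fixed string, -e: key may start with "-"
    if ! grep -qxF -e "$key" "$auth"; then
        printf '%s\n' "$key" >> "$auth"
    fi
}
```

On an ARCHER2 login node this would be called as `add_authorized_key "$HOME/.ssh/id_rsa.pub" "$HOME/.ssh/authorized_keys"`.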
"},{"location":"user-guide/functional-accounts/#login-to-the-persistent-service-node-dvn04","title":"Login to the persistent service node (dvn04)","text":"

Once you are logged into an ARCHER2 login node, and assuming the SSH key is in the default location, you can log in to dvn04:

auser@ln04:~> ssh dvn04\n

Note

You will need to enter the TOTP for your ARCHER2 account to log in to dvn04 unless you have logged in to the node recently.

"},{"location":"user-guide/functional-accounts/#access-the-functional-account","title":"Access the functional account","text":"

Once you are logged into dvn04, you use sudo to access the functional account.

Important

You must use the password for your normal user account with the sudo command. This password was set on your first ever login to ARCHER2 (and not used subsequently). If you have forgotten this password, you can reset it in SAFE.

For example, if the functional account is called testlm, you would access it (on dvn04) with:

auser@dvn04:~> sudo -iu testlm\n

To exit the functional account, you use the exit command which will return you to your normal user account on dvn04.

"},{"location":"user-guide/functional-accounts/#setup-the-persistent-service","title":"Setup the persistent service","text":"

You should use systemctl to manage your persistent service on dvn04. In order to use the systemctl command, you need to add the following lines to the ~/.bashrc for the functional account:

export XDG_RUNTIME_DIR=/run/user/$UID\nexport DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/$UID/bus\n

Next, create a service definition file for the persistent service and save it to a plain text file. Here is the example used for the QChem licence server:

[Unit]\nDescription=Licence manger for QChem\nAfter=network.target\nConditionHost=dvn04\n\n[Service]\nType=forking\nExecStart=/work/y07/shared/apps/core/qchem/6.1/bin/flexnet/lmgrd -l +/work/y07/shared/apps/core/qchem/6.1/var/log/qchemlm.log -c /work/y07/shared/apps/core/qchem/6.1/etc/flexnet/\nExecStop=/work/y07/shared/apps/core/qchem/6.1/bin/flexnet/lmutil lmdown -all -c /work/y07/shared/apps/core/qchem/6.1/etc/flexnet/\nSuccessExitStatus=15\nRestart=always\nRestartSec=30\n\n[Install]\nWantedBy=default.target\n

Enable the licence server service, e.g. for the QChem licence server service:

testlm@dvn04:~> systemctl --user enable /work/y07/shared/apps/core/qchem/6.1/etc/flexnet/qchem-lm.service\n\nCreated symlink /home/y07/y07/testlm/.config/systemd/user/default.target.wants/qchem-lm.service \u2192 /work/y07/shared/apps/core/qchem/6.1/etc/flexnet/qchem-lm.service.\nCreated symlink /home/y07/y07/testlm/.config/systemd/user/qchem-lm.service \u2192 /work/y07/shared/apps/core/qchem/6.1/etc/flexnet/qchem-lm.service.\n

Once it has been enabled, you can start the licence server service, e.g. for the QChem licence server service:

testlm@dvn04:~> systemctl --user start qchem-lm.service\n

Check the status to make sure it is running:

testlm@dvn04:~> systemctl --user status qchem-lm\n\u25cf qchem-lm.service - Licence manger for QChem\n     Loaded: loaded (/home/y07/y07/testlm/.config/systemd/user/qchem-lm.service; enabled; vendor preset: disabled)\n     Active: active (running) since Thu 2024-05-16 15:33:59 BST; 8s ago\n    Process: 174248 ExecStart=/work/y07/shared/apps/core/qchem/6.1/bin/flexnet/lmgrd -l +/work/y07/shared/apps/core/qchem/6.1/var/log/qchemlm.log -c /work/y07/shared/apps/core/qchem/6.1/etc/flexnet/ (code=exited, status=0/SUCCESS)\n   Main PID: 174249 (lmgrd)\n      Tasks: 8 (limit: 39321)\n     Memory: 5.6M\n        CPU: 18ms\n     CGroup: /user.slice/user-35153.slice/user@35153.service/app.slice/qchem-lm.service\n             \u251c\u2500 174249 /work/y07/shared/apps/core/qchem/6.1/bin/flexnet/lmgrd -l +/work/y07/shared/apps/core/qchem/6.1/var/log/qchemlm.log -c /work/y07/shared/apps/core/qchem/6.1/etc/flexnet/\n             \u2514\u2500 174253 qchemlm -T 10.252.1.77 11.19 10 -c :/work/y07/shared/apps/core/qchem/6.1/etc/flexnet/: -lmgrd_port 6979 -srv mdSVdgushTnAjHX1s1PTj0ppCjHJw1Uk9ylvs1j13zkaUzhDBFlbv4thnqEIAXV --lmgrd_start 66461957 -vdrestart 0 -l /work/y07/shar>\n
"},{"location":"user-guide/gpu/","title":"AMD GPU Development Platform","text":"

In early 2024, ARCHER2 users gained access to a small GPU system integrated into ARCHER2, which is designed to allow users to test and develop software using AMD GPUs.

Important

The GPU component is very small and so is aimed at software development and testing rather than production research.

"},{"location":"user-guide/gpu/#hardware-available","title":"Hardware available","text":"

The GPU Development Platform consists of 4 compute nodes each with:

"},{"location":"user-guide/gpu/#accessing-the-gpu-compute-nodes","title":"Accessing the GPU compute nodes","text":"

The GPU nodes can be accessed through the Slurm job submission system from the standard ARCHER2 login nodes. Details of the scheduler limits and configuration and example job submission scripts are provided below.

"},{"location":"user-guide/gpu/#compiling-software-for-the-gpu-compute-nodes","title":"Compiling software for the GPU compute nodes","text":""},{"location":"user-guide/gpu/#overview","title":"Overview","text":"

As a quick summary, the recommended procedure for compiling code that offloads to the AMD GPUs is as follows:

For details and alternative approaches, see below.

"},{"location":"user-guide/gpu/#programming-environments","title":"Programming Environments","text":"

The following programming environments and compilers are available to compile code for the AMD GPUs on ARCHER2 using the usual compiler wrappers (ftn, cc, CC), which is the recommended approach:

| Programming Environment | Description | Actual compilers called by ftn, cc, CC |
|---|---|---|
| PrgEnv-amd | AMD LLVM compilers | amdflang, amdclang, amdclang++ |
| PrgEnv-cray | Cray compilers | crayftn, craycc, crayCC |
| PrgEnv-gnu | GNU compilers | gfortran, gcc, g++ |
| PrgEnv-gnu-amd | hybrid | gfortran, amdclang, amdclang++ |
| PrgEnv-cray-amd | hybrid | crayftn, amdclang, amdclang++ |

To decide which compiler(s) to use to compile offload code for the AMD GPUs, you may find it useful to consult the Compilation Strategies for GPU Offloading section below.

The hybrid environments PrgEnv-gnu-amd and PrgEnv-cray-amd are provided as a convenient way to work around the less mature OpenMP offload support in the AMD LLVM Fortran compiler: in these hybrid environments, ftn calls gfortran or crayftn instead of amdflang.

Details about the underlying compiler being called by a compiler wrapper can be checked using the --version flag, for example:

> module load PrgEnv-amd\n> cc --version\nAMD clang version 14.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-5.2.3 22324 d6c88e5a78066d5d7a1e8db6c5e3e9884c6ad10e)\nTarget: x86_64-unknown-linux-gnu\nThread model: posix\nInstalledDir: /opt/rocm-5.2.3/llvm/bin\n
"},{"location":"user-guide/gpu/#rocm","title":"ROCm","text":"

Access to AMD's ROCm software stack is provided through the rocm module:

module load rocm\n

With the rocm module loaded, the AMD LLVM compilers amdflang, amdclang, and amdclang++ become available to use directly or through AMD's compiler driver utility hipcc. Neither approach is recommended as a first choice for most users, as considerable care needs to be taken to pass suitable flags to the compiler or to hipcc. With PrgEnv-amd loaded, the compiler wrappers ftn, cc and CC bypass hipcc, call amdflang, amdclang or amdclang++ directly, and take care of passing suitable compilation flags. This is why using these wrappers is the recommended approach for most users, at least initially.

Note: the rocm module should be loaded whenever you are compiling for the AMD GPUs, even if you are not using the AMD LLVM compilers (amdflang, amdclang, amdclang++).

The rocm module also provides access to other AMD tools, such as HIPIFY (hipify-clang or hipify-perl command), which enables translation of CUDA to HIP code. See also the section below on HIPIFY.

"},{"location":"user-guide/gpu/#gpu-target","title":"GPU target","text":"

Regardless of what approach you use, you will need to tell the underlying GPU compiler which GPU hardware to target. When using the compiler wrappers ftn, cc, or CC, as recommended, this can be done by ensuring the appropriate GPU target module is loaded:

module load craype-accel-amd-gfx90a\n
"},{"location":"user-guide/gpu/#cpu-target","title":"CPU target","text":"

The AMD GPU nodes are equipped with AMD EPYC Milan CPUs instead of the AMD EPYC Rome CPUs present on the regular CPU-only ARCHER2 compute nodes. Although the difference between these processors is small, you should load the appropriate CPU target module when using the compiler wrappers ftn, cc, or CC, as recommended:

module load craype-x86-milan\n
"},{"location":"user-guide/gpu/#compilation-strategies-for-gpu-offloading","title":"Compilation Strategies for GPU Offloading","text":"

Compiler support on ARCHER2 for various programming models that enable offloading to AMD GPUs can be summarised at a glance in the following table:

| PrgEnv | Actual compiler | OpenMP Offload | HIP | OpenACC |
|---|---|---|---|---|
| PrgEnv-amd | amdflang | Yes | No | No |
| PrgEnv-amd | amdclang | Yes | No | No |
| PrgEnv-amd | amdclang++ | Yes | Yes | No |
| PrgEnv-cray | crayftn | Yes | No | Yes |
| PrgEnv-cray | craycc | Yes | No | No |
| PrgEnv-cray | crayCC | Yes | Yes | No |
| PrgEnv-gnu | gfortran | No | No | No |
| PrgEnv-gnu | gcc | No | No | No |
| PrgEnv-gnu | g++ | No | No | No |

It is generally recommended to do the following:

module load PrgEnv-xxx\nmodule load rocm\nmodule load craype-accel-amd-gfx90a\nmodule load craype-x86-milan\n

And then to use the ftn, cc and/or CC wrapper to compile as appropriate for the programming model in question. Specific guidance on how to do this for different programming models is provided in the subsections below.

When deviating from this procedure and using the underlying compilers directly, or when debugging a problematic build that uses the wrappers, it may be useful to check which flags the compiler wrappers pass to the underlying compiler. You can do this by adding the -craype-verbose option to a wrapper when compiling a file. Piping the resulting output through tr \" \" \"\\n\" puts each flag on its own line, which makes the output easier to read. For example:

> CC -craype-verbose source.cpp | tr \" \" \"\\n\"\n
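When the verbose output is long, it can be narrowed to the flags of interest. The function below is a hypothetical convenience wrapper around the same tr pipeline, adding a grep filter; the pattern in the usage example is only an illustration of flags you might look for when debugging offload builds.

```shell
# show_flags PATTERN: split compiler-wrapper verbose output (read on stdin) into
# one flag per line and print only the flags matching the extended regex PATTERN.
show_flags() {
    tr ' ' '\n' | grep -E -- "$1"
}
```

For example, `CC -craype-verbose source.cpp | show_flags 'offload|gfx|openmp'` would list only the offload-related flags the wrapper passes on.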
"},{"location":"user-guide/gpu/#openmp-offload","title":"OpenMP Offload","text":"

To use the compiler wrappers to compile code that offloads to GPU with OpenMP directives, first load the desired PrgEnv module and other necessary modules:

module load PrgEnv-xxx\nmodule load rocm\nmodule load craype-accel-amd-gfx90a\nmodule load craype-x86-milan\n

Then use the appropriate compiler wrapper and pass the -fopenmp option to the wrapper when compiling. For example:

ftn -fopenmp source.f90\n

This should work under PrgEnv-amd and PrgEnv-cray, but not under PrgEnv-gnu as GCC 11.2.0 is the most recent version of GCC available on ARCHER2 and OpenMP offload to AMD MI200 series GPUs is only supported by GCC 13 and later.

You may find that offload directives introduced in more recent versions of the OpenMP standard, e.g. versions later than OpenMP 4.5, fail to compile with some compilers. Under PrgEnv-cray an explicit description of supported OpenMP features can be viewed using the command man intro_openmp.

"},{"location":"user-guide/gpu/#hip","title":"HIP","text":"

To compile C or C++ code that uses HIP written specifically to offload to AMD GPUs, first load the desired PrgEnv module (either PrgEnv-amd or PrgEnv-cray) and other necessary modules:

module load PrgEnv-xxx\nmodule load rocm\nmodule load craype-accel-amd-gfx90a\nmodule load craype-x86-milan\n

Then compile using the CC compiler wrapper as follows:

CC -x hip -std=c++11 -D__HIP_ROCclr__ --rocm-path=${ROCM_PATH} source.cpp\n

Alternatively, you may use hipcc to drive the AMD LLVM compiler amdclang(++) to compile HIP code. In that case you will need to take care to explicitly pass all required offload flags to hipcc, such as:

-D__HIP_PLATFORM_AMD__ --offload-arch=gfx90a\n

To see what hipcc passes to the compiler, you can pass the --verbose option. If you are compiling MPI-parallel HIP code with hipcc, please see additional guidance under HIPCC and MPI.

hipcc can compile both HIP code for device (GPU) execution and non-HIP code for host (CPU) execution, and will default to using the AMD LLVM compiler amdclang(++) to do so. If your software consists of separate compilation units (typically separate files) containing HIP code and non-HIP code, it is possible to use a compiler other than hipcc to compile the non-HIP code. To do this:

"},{"location":"user-guide/gpu/#openacc","title":"OpenACC","text":"

Offloading using OpenACC directives on ARCHER2 is only supported by the Cray Fortran compiler. You should therefore load the following:

module load PrgEnv-cray\nmodule load rocm\nmodule load craype-accel-amd-gfx90a\nmodule load craype-x86-milan\n

OpenACC Fortran code can then be compiled using the -hacc flag, as follows:

ftn -hacc source.f90\n

Details on what OpenACC standard and features are supported under PrgEnv-cray can be viewed using the command man intro_openacc.

"},{"location":"user-guide/gpu/#advanced-compilation","title":"Advanced Compilation","text":""},{"location":"user-guide/gpu/#openmp-offload-openmp-cpu-threading","title":"OpenMP Offload + OpenMP CPU threading","text":"

Code may use OpenMP for multithreaded execution on the host CPU in combination with target directives to offload work to GPU. Both uses of OpenMP can coexist in a single compilation unit, which should be compiled using the relevant compiler wrapper and the -fopenmp flag.

"},{"location":"user-guide/gpu/#hip-openmp-offload","title":"HIP + OpenMP Offload","text":"

Using both OpenMP and HIP to offload to GPU is possible, but only if the two programming models are not mixed in the same compilation unit. Two or more separate compilation units - typically separate source files - should be compiled as recommended individually for HIP and OpenMP offload code in the respective sections above. The resulting code objects (.o files) should then be linked together using a compiler wrapper with the -fopenmp flag, but without the -x hip flag.

"},{"location":"user-guide/gpu/#hip-openmp-cpu-threading","title":"HIP + OpenMP CPU threading","text":"

Code in a single compilation unit, such as a single source file, can use HIP to offload to GPU as well as OpenMP for multithreaded execution on the host CPU. Compilation should be done using the relevant compiler wrapper and the flags -fopenmp and -x hip (in that order), as well as the flags for HIP compilation specified above:

CC -fopenmp -x hip -std=c++11 -D__HIP_ROCclr__ --rocm-path=${ROCM_PATH} source.cpp\n
"},{"location":"user-guide/gpu/#hipcc-and-mpi","title":"HIPCC and MPI","text":"

When compiling an MPI-parallel code with hipcc instead of a compiler wrapper, the path to the Cray MPI library include directory should be passed explicitly, or set as part of the CXXFLAGS environment variable, as:

-I${CRAY_MPICH_DIR}/include\n

MPI library directories should also be passed to hipcc, or set as part of the LDFLAGS environment variable prior to compiling, as:

-L${CRAY_MPICH_DIR}/lib ${PE_MPICH_GTL_DIR_amd_gfx90a}\n

Finally the MPI library should be linked explicitly, or set as part of the LIBS environment variable prior to linking, as:

-lmpi ${PE_MPICH_GTL_LIBS_amd_gfx90a}\n
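Putting these pieces together with the hipcc offload flags from the HIP section gives a single compile-and-link line. The function below is a sketch under stated assumptions: build_mpi_hip is a hypothetical name, CRAY_MPICH_DIR and the PE_MPICH_GTL_* variables are assumed to be defined by the loaded cray-mpich module, and the HIPCC variable is an illustrative override (for example, set HIPCC=echo to print the command instead of running it).

```shell
# build_mpi_hip SRC OUT: compile and link an MPI-parallel HIP source file with
# hipcc, passing the Cray MPICH include, library and GTL flags explicitly.
build_mpi_hip() {
    local src="$1" out="$2"
    : "${CRAY_MPICH_DIR:?load the cray-mpich module first}"
    local hipcc="${HIPCC:-hipcc}"
    local cxxflags="-D__HIP_PLATFORM_AMD__ --offload-arch=gfx90a -I${CRAY_MPICH_DIR}/include"
    local ldflags="-L${CRAY_MPICH_DIR}/lib ${PE_MPICH_GTL_DIR_amd_gfx90a}"
    local libs="-lmpi ${PE_MPICH_GTL_LIBS_amd_gfx90a}"
    # The unquoted expansions are deliberate: each variable holds several flags.
    $hipcc $cxxflags "$src" -o "$out" $ldflags $libs
}
```

Keeping the include, library-path and library flags in separate variables mirrors the CXXFLAGS, LDFLAGS and LIBS split described above, so the same values can be exported for a Makefile-based build instead.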
"},{"location":"user-guide/gpu/#cmake","title":"Cmake","text":"

Documentation about integrating ROCm with CMake can be found in the AMD ROCm documentation.

"},{"location":"user-guide/gpu/#gpu-aware-mpi","title":"GPU-aware MPI","text":"

You need to set an environment variable to enable GPU support in cray-mpich:

export MPICH_GPU_SUPPORT_ENABLED=1

No additional or alternative MPI modules need to be loaded: the default cray-mpich module is used.

This supports GPU-GPU transfers:

Be aware that each of these nodes has only two PCIe network cards, and these may not be in the same memory region as a given GPU. NUMA effects are therefore to be expected in multi-node communication. More detail on this is provided below.

"},{"location":"user-guide/gpu/#libraries","title":"Libraries","text":"

In order to access the GPU-accelerated version of Cray's LibSci maths libraries, a new module has been provided:

cray-libsci_acc

With this module loaded, documentation can be viewed using the command man intro_libsci_acc.

Additionally, a number of libraries are provided as part of the rocm module.

"},{"location":"user-guide/gpu/#python-environment","title":"Python Environment","text":"

The cray-python module can be used as normal on the GPU partition, with the mpi4py package installed by default. mpi4py uses cray-mpich under the hood, in the same way as on the CPU compute nodes.

However, unless they have been specifically compiled for GPU-GPU communication, Python packages and frameworks that try to take advantage of the fast links between GPUs by calling MPI on GPU pointers may have issues. To set up the environment correctly for a given Python program, the following snippet can be added to load the required libmpi_gtl_hsa library:

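from os import environ\n\n# Preload Cray's GPU Transport Layer (GTL) library before MPI is initialised\n# so that MPI calls can operate on GPU-resident buffers\nif environ.get(\"MPICH_GPU_SUPPORT_ENABLED\") == \"1\":\n    from ctypes import CDLL, RTLD_GLOBAL\n    CDLL(f\"{environ.get('CRAY_MPICH_ROOTDIR')}/gtl/lib/libmpi_gtl_hsa.so\", mode=RTLD_GLOBAL)\n\nfrom mpi4py import MPI\n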
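The same preload logic can be packaged as small functions, which makes it easier to reuse across scripts and to check without a GPU node. This is a minimal sketch: the function names gtl_library_path and preload_gtl are hypothetical, the library location is the one used in the snippet above, and the only behavioural addition is an explicit error when CRAY_MPICH_ROOTDIR is unset.

```python
import os
from ctypes import CDLL, RTLD_GLOBAL


def gtl_library_path(env=None):
    """Return the path of Cray MPICH's libmpi_gtl_hsa.so if GPU-aware MPI
    is enabled (MPICH_GPU_SUPPORT_ENABLED=1), otherwise None."""
    env = os.environ if env is None else env
    if env.get("MPICH_GPU_SUPPORT_ENABLED") != "1":
        return None
    root = env.get("CRAY_MPICH_ROOTDIR")
    if not root:
        raise RuntimeError("MPICH_GPU_SUPPORT_ENABLED=1 but CRAY_MPICH_ROOTDIR is not set")
    return os.path.join(root, "gtl", "lib", "libmpi_gtl_hsa.so")


def preload_gtl(env=None):
    """Load the GTL library with global symbol visibility.
    Must be called before mpi4py is imported."""
    path = gtl_library_path(env)
    if path is not None:
        CDLL(path, mode=RTLD_GLOBAL)
    return path
```

In a program this would be used as preload_gtl() followed by from mpi4py import MPI.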
"},{"location":"user-guide/gpu/#supported-software","title":"Supported software","text":"

The ARCHER2 GPU development platform is intended for code development, testing and experimentation, and will not have centrally installed, supported versions of codes as is the case for the standard ARCHER2 CPU compute nodes. However, some builds are being made available to users by members of the CSE team on a best-effort basis to support the community.

Codes that have modules targeting GPUs are:

Note

Will be filled out as applications are compiled and made available.

"},{"location":"user-guide/gpu/#running-jobs-on-the-gpu-nodes","title":"Running jobs on the GPU nodes","text":"

To run a GPU job, you must specify a GPU partition and a quality of service (QoS) as well as the number of GPUs required. You specify the number of GPU cards you want per node using the --gpus=N option, where N is typically 1, 2 or 4.

Note

As there are 4 GPUs per node, each GPU is associated with 1/4 of the resources of the node, i.e., 8 of 32 physical cores and roughly 128 GiB of the total 512 GiB host memory.

Allocations of host resources are made pro-rata. For example, if 2 GPUs are requested, sbatch will allocate 16 cores and around 256 GiB of host memory (in addition to 2 GPUs). Any attempt to use more than the allocated resources will result in an error.

This automatic allocation by Slurm for GPU jobs means that the #SBATCH options in the submission script should not include options such as --ntasks and --cpus-per-task; job submissions that specify them will be rejected. See below for some examples of how to use host resources and how to launch MPI applications.
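The pro-rata rule is simple arithmetic, and a sketch like the following can be used to work out the host resources a given GPU request receives. The figures are those quoted for a GPU node (4 GPUs, 32 physical cores, 512 GiB host memory), the memory share is approximate as noted above, and the function name host_share is hypothetical.

```python
GPUS_PER_NODE = 4
CORES_PER_NODE = 32          # physical cores on a GPU node
HOST_MEM_GIB_PER_NODE = 512  # total host memory; per-GPU shares are approximate


def host_share(gpus):
    """Return (physical cores, approx. host memory in GiB) allocated
    pro rata for a request of `gpus` GPUs on one GPU node."""
    if not 1 <= gpus <= GPUS_PER_NODE:
        raise ValueError(f"gpus must be between 1 and {GPUS_PER_NODE}")
    cores = CORES_PER_NODE // GPUS_PER_NODE * gpus
    mem_gib = HOST_MEM_GIB_PER_NODE // GPUS_PER_NODE * gpus
    return cores, mem_gib


if __name__ == "__main__":
    for n in (1, 2, 4):
        cores, mem = host_share(n)
        print(f"--gpus={n}: {cores} cores, ~{mem} GiB host memory")
```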

Warning

In order to run jobs on the GPU nodes your ARCHER2 budget must have positive CU hours associated with it. However, your budget will not be charged for any GPU jobs you run.

"},{"location":"user-guide/gpu/#slurm-partitions","title":"Slurm Partitions","text":"

Your job script must specify a partition. The following table has a list of relevant GPU partition(s) on ARCHER2.

| Partition | Description | Max nodes available |
| --- | --- | --- |
| gpu | GPU nodes with AMD EPYC 32-core processor, 512 GB memory, 4 × AMD Instinct MI210 GPU | 4 |
"},{"location":"user-guide/gpu/#slurm-quality-of-service-qos","title":"Slurm Quality of Service (QoS)","text":"

Your job script must specify a QoS relevant for the GPU nodes. Available QoS specifications are as follows.

| QoS | Max Nodes Per Job | Max Walltime | Jobs Queued | Jobs Running | Partition(s) | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| gpu-shd | 1 | 12 hr | 2 | 1 | gpu | Nodes potentially shared with other users |
| gpu-exc | 2 | 12 hr | 2 | 1 | gpu | Exclusive node access |
"},{"location":"user-guide/gpu/#example-job-submission-scripts","title":"Example job submission scripts","text":"

Here are a series of example jobs for various patterns of running on the ARCHER2 GPU nodes. They cover the following scenarios:

"},{"location":"user-guide/gpu/#single-gpu","title":"Single GPU","text":"

This example requests a single GPU on a potentially shared node and launches the program using a single CPU process with offload to a single GPU.

#!/bin/bash\n\n#SBATCH --job-name=single-GPU\n#SBATCH --gpus=1\n#SBATCH --time=00:20:00\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=gpu\n#SBATCH --qos=gpu-shd\n\n# Check assigned GPU\nsrun --ntasks=1 rocm-smi\n\nsrun --ntasks=1 --cpus-per-task=1 ./my_gpu_program.x\n
"},{"location":"user-guide/gpu/#multiple-gpu-on-a-single-node-shared-node-access-max-2-gpu","title":"Multiple GPU on a single node - shared node access (max. 2 GPU)","text":"

This example requests two GPUs on a potentially shared node and launches the program using two MPI processes (one per GPU) with one MPI process per CPU NUMA region.

We use the --cpus-per-task=8 option to srun to set the stride between the two MPI processes to 8 physical cores. This places the MPI processes on separate NUMA regions to ensure they are associated with the correct GPU that is closest to them on the compute node architecture.

#!/bin/bash\n\n#SBATCH --job-name=multi-GPU\n#SBATCH --gpus=2\n#SBATCH --time=00:20:00\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=gpu\n#SBATCH --qos=gpu-shd\n\n# Enable GPU-aware MPI\nexport MPICH_GPU_SUPPORT_ENABLED=1\n\n# Check assigned GPU\nsrun --ntasks=1 rocm-smi\n\n# Check process/thread pinning\nmodule load xthi\nsrun --ntasks=2 --cpus-per-task=8 \\\n     --hint=nomultithread --distribution=block:block \\\n     xthi\n\nsrun --ntasks=2 --cpus-per-task=8 \\\n     --hint=nomultithread --distribution=block:block \\\n     ./my_gpu_program.x\n
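To sanity-check the xthi output from scripts like this, it helps to know what placement to expect. The sketch below assumes the allocation starts at core 0 (as on an exclusive node), that the 32 physical cores are numbered contiguously within each 8-core NUMA region, and, following the rocm-smi --showtopo output in the Node Topology section, that GPU i is attached to NUMA region i. The function name expected_placement is hypothetical.

```python
CORES_PER_NUMA = 8  # 32 physical cores / 4 NUMA regions (assumed contiguous numbering)


def expected_placement(rank, cpus_per_task=8, first_core=0):
    """Return (first core, last core, NUMA region, nearest GPU) for an MPI rank
    launched with srun --cpus-per-task=8 --hint=nomultithread
    --distribution=block:block."""
    first = first_core + rank * cpus_per_task
    last = first + cpus_per_task - 1
    numa = first // CORES_PER_NUMA
    if last // CORES_PER_NUMA != numa:
        raise ValueError("this rank's cores would span more than one NUMA region")
    # rocm-smi --showtopo reports GPU i attached to NUMA node i on these nodes
    return first, last, numa, numa


if __name__ == "__main__":
    for rank in range(4):
        first, last, numa, gpu = expected_placement(rank)
        print(f"rank {rank}: cores {first}-{last}, NUMA region {numa}, nearest GPU {gpu}")
```

For rank 0 this gives cores 0-7, which is the affinity reported by xthi in the salloc example further down this page.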
"},{"location":"user-guide/gpu/#multiple-gpu-on-a-single-node-exclusive-node-access-max-4-gpu","title":"Multiple GPU on a single node - exclusive node access (max. 4 GPU)","text":"

This example requests four GPUs on a single node and launches the program using four MPI processes (one per GPU) with one MPI process per CPU NUMA region.

We use the --cpus-per-task=8 option to srun to set the stride between the MPI processes to 8 physical cores. This places the MPI processes on separate NUMA regions to ensure they are associated with the correct GPU that is closest to them on the compute node architecture.

#!/bin/bash\n\n#SBATCH --job-name=multi-GPU\n#SBATCH --gpus=4\n#SBATCH --nodes=1\n#SBATCH --exclusive\n#SBATCH --time=00:20:00\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=gpu\n#SBATCH --qos=gpu-exc\n\n# Check assigned GPU\nsrun --ntasks=1 rocm-smi\n\n# Check process/thread pinning\nmodule load xthi\nsrun --ntasks=4 --cpus-per-task=8 \\\n     --hint=nomultithread --distribution=block:block \\\n     xthi\n\n# Enable GPU-aware MPI\nexport MPICH_GPU_SUPPORT_ENABLED=1\n\nsrun --ntasks=4 --cpus-per-task=8 \\\n     --hint=nomultithread --distribution=block:block \\\n     ./my_gpu_program.x\n

Note

When you use the --qos=gpu-exc QoS you must also add the --exclusive flag and then specify the number of nodes you want with --nodes=1.

"},{"location":"user-guide/gpu/#multiple-gpu-on-multiple-nodes-exclusive-node-access-max-8-gpu","title":"Multiple GPU on multiple nodes - exclusive node access (max. 8 GPU)","text":"

This example requests eight GPUs across two nodes and launches the program using eight MPI processes (one per GPU) with one MPI process per CPU NUMA region.

We use the --cpus-per-task=8 option to srun to set the stride between the MPI processes to 8 physical cores. This places the MPI processes on separate NUMA regions to ensure they are associated with the correct GPU that is closest to them on the compute node architecture.

#!/bin/bash\n\n#SBATCH --job-name=multi-GPU\n#SBATCH --gpus=4\n#SBATCH --nodes=2\n#SBATCH --exclusive\n#SBATCH --time=00:20:00\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=gpu\n#SBATCH --qos=gpu-exc\n\n# Check assigned GPU\nnodelist=$(scontrol show hostname $SLURM_JOB_NODELIST)\nfor nodeid in $nodelist\ndo\n   echo $nodeid\n   srun --ntasks=1 --gpus=4 --nodes=1 --ntasks-per-node=1 --nodelist=$nodeid rocm-smi\ndone\n\n# Check process/thread pinning\nmodule load xthi\nsrun --ntasks-per-node=4 --cpus-per-task=8 \\\n     --hint=nomultithread --distribution=block:block \\\n     xthi\n\n# Enable GPU-aware MPI\nexport MPICH_GPU_SUPPORT_ENABLED=1\n\nsrun --ntasks-per-node=4 --cpus-per-task=8 \\\n     --hint=nomultithread --distribution=block:block \\\n     ./my_gpu_program.x\n

Note

When you use the --qos=gpu-exc QoS you must also add the --exclusive flag and then specify the number of nodes you want with, for example, --nodes=2.
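The example scripts share a common structure, and the QoS table constrains which combinations are valid. The following sketch generates a script in the same pattern and rejects requests that exceed the stated limits (nodes per job, 12 hr walltime, and at most 2 GPUs on a shared node). It is illustrative only: the function and parameter names are hypothetical, walltimes are assumed to be in HH:MM:SS form, and Slurm itself remains the authority on what is accepted.

```python
# Limits taken from the QoS table and the shared-node examples.
QOS_LIMITS = {
    "gpu-shd": {"max_nodes": 1, "max_gpus_per_node": 2, "max_hours": 12, "exclusive": False},
    "gpu-exc": {"max_nodes": 2, "max_gpus_per_node": 4, "max_hours": 12, "exclusive": True},
}


def walltime_seconds(walltime):
    """Convert an HH:MM:SS walltime to seconds (other Slurm formats not handled)."""
    hours, minutes, seconds = (int(part) for part in walltime.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def gpu_job_script(account, qos, gpus_per_node, nodes=1, walltime="00:20:00",
                   program="./my_gpu_program.x", job_name="gpu-job"):
    """Return the text of a batch script following the pattern of the examples."""
    limits = QOS_LIMITS[qos]
    if nodes > limits["max_nodes"]:
        raise ValueError(f"{qos} allows at most {limits['max_nodes']} node(s) per job")
    if not 1 <= gpus_per_node <= limits["max_gpus_per_node"]:
        raise ValueError(f"{qos} allows 1-{limits['max_gpus_per_node']} GPUs per node")
    if walltime_seconds(walltime) > limits["max_hours"] * 3600:
        raise ValueError(f"{qos} allows at most {limits['max_hours']} hours of walltime")

    # --gpus mirrors the examples, which use --gpus=4 for one- and two-node exclusive jobs
    lines = ["#!/bin/bash", "",
             f"#SBATCH --job-name={job_name}",
             f"#SBATCH --gpus={gpus_per_node}"]
    if limits["exclusive"]:
        # gpu-exc needs --exclusive plus an explicit node count
        lines += [f"#SBATCH --nodes={nodes}", "#SBATCH --exclusive"]
    lines += [f"#SBATCH --time={walltime}",
              f"#SBATCH --account={account}",
              "#SBATCH --partition=gpu",
              f"#SBATCH --qos={qos}", "",
              "# Enable GPU-aware MPI",
              "export MPICH_GPU_SUPPORT_ENABLED=1", ""]
    # One MPI process per GPU, each on its own 8-core NUMA region
    tasks = (f"--ntasks-per-node={gpus_per_node}" if limits["exclusive"]
             else f"--ntasks={gpus_per_node}")
    lines += [f"srun {tasks} --cpus-per-task=8 \\",
              "     --hint=nomultithread --distribution=block:block \\",
              f"     {program}"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(gpu_job_script("t01", "gpu-exc", gpus_per_node=4, nodes=2))
```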

"},{"location":"user-guide/gpu/#interactive-jobs","title":"Interactive jobs","text":""},{"location":"user-guide/gpu/#using-salloc","title":"Using salloc","text":"

Tip

This method does not give you an interactive shell on a GPU compute node. If you want an interactive shell on the GPU compute nodes, see the srun method described below.

If you wish to have a terminal to perform interactive testing, you can use the salloc command to reserve the resources so you can use srun commands interactively. For example, to request 1 GPU for 20 minutes you would use (remember to replace t01 with your budget code):

auser@ln04:/work/t01/t01/auser> salloc --gpus=1 --time=00:20:00 --partition=gpu --qos=gpu-shd --account=t01\nsalloc: Pending job allocation 5335731\nsalloc: job 5335731 queued and waiting for resources\nsalloc: job 5335731 has been allocated resources\nsalloc: Granted job allocation 5335731\nsalloc: Waiting for resource configuration\nsalloc: Nodes nid200001 are ready for job\n\nauser@ln04:/work/t01/t01/auser> export OMP_NUM_THREADS=1\nauser@ln04:/work/t01/t01/auser> srun rocm-smi\n\n\n======================= ROCm System Management Interface =======================\n================================= Concise Info =================================\nGPU  Temp   AvgPwr  SCLK    MCLK     Fan  Perf  PwrCap  VRAM%  GPU%\n0    31.0c  43.0W   800Mhz  1600Mhz  0%   auto  300.0W    0%   0%\n================================================================================\n============================= End of ROCm SMI Log ==============================\n\n\nsrun: error: nid200001: tasks 0: Exited with exit code 2\nsrun: launch/slurm: _step_signal: Terminating StepId=5335731.0\n\nauser@ln04:/work/t01/t01/auser> module load xthi\nauser@ln04:/work/t01/t01/auser> srun --ntasks=1 --cpus-per-task=8 --hint=nomultithread xthi\nNode summary for    1 nodes:\nNode    0, hostname nid200001, mpi   1, omp   1, executable xthi\nMPI summary: 1 ranks\nNode    0, rank    0, thread   0, (affinity =  0-7)\n
"},{"location":"user-guide/gpu/#using-srun","title":"Using srun","text":"

If you want an interactive terminal on a GPU node then you can use the srun command to achieve this. For example, to request 1 GPU for 20 minutes with an interactive terminal on a GPU compute node you would use (remember to replace t01 with your budget code):

auser@ln04:/work/t01/t01/auser> srun --gpus=1 --time=00:20:00 --partition=gpu --qos=gpu-shd --account=t01 --pty /bin/bash\nsrun: job 5335771 queued and waiting for resources\nsrun: job 5335771 has been allocated resources\nauser@nid200001:/work/t01/t01/auser>\n

Note that the command prompt has changed to indicate we are now on a GPU compute node. You can now directly run commands that interact with the GPU devices, e.g.:

auser@nid200001:/work/t01/t01/auser> rocm-smi\n\n======================= ROCm System Management Interface =======================\n================================= Concise Info =================================\nGPU  Temp   AvgPwr  SCLK    MCLK     Fan  Perf  PwrCap  VRAM%  GPU%\n0    29.0c  43.0W   800Mhz  1600Mhz  0%   auto  300.0W    0%   0%\n================================================================================\n============================= End of ROCm SMI Log ==============================\n

Warning

Launching parallel jobs on GPU nodes from an interactive shell on a GPU node is not straightforward, so you should either use job submission scripts or the salloc method of interactive use described above.

"},{"location":"user-guide/gpu/#environment-variables","title":"Environment variables","text":""},{"location":"user-guide/gpu/#rocr_visible_devices","title":"ROCR_VISIBLE_DEVICES","text":"

A list of device indices or UUIDs that will be exposed to applications.

Runtime: ROCm Platform Runtime. Applies to all applications using the user-mode ROCm software stack.

export ROCR_VISIBLE_DEVICES=\"0,GPU-DEADBEEFDEADBEEF\"

"},{"location":"user-guide/gpu/#hip-environment-variables","title":"HIP Environment variables","text":"

https://rocm.docs.amd.com/projects/HIP/en/docs-5.2.3/how_to_guides/debugging.html#summary-of-environment-variables-in-hip

"},{"location":"user-guide/gpu/#amd_log_level","title":"AMD_LOG_LEVEL","text":"

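Enables HIP logging at different verbosity levels, with higher values giving more verbose output (0: disabled, 1: errors, 2: warnings, 3: information, 4: debug).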

export AMD_LOG_LEVEL=1

"},{"location":"user-guide/gpu/#amd_log_mask","title":"AMD_LOG_MASK","text":"

Selects which categories of HIP log messages are printed, as a bit mask. It is used together with AMD_LOG_LEVEL.

export AMD_LOG_MASK=0x1

Default: 0x7FFFFFFF\n\n0x1: Log API calls.\n0x02: Kernel and Copy Commands and Barriers.\n0x4: Synchronization and waiting for commands to finish.\n0x8: Enable log on information and below levels.\n0x20: Queue commands and queue contents.\n0x40: Signal creation, allocation, pool.\n0x80: Locks and thread-safety code.\n0x100: Copy debug.\n0x200: Detailed copy debug.\n0x400: Resource allocation, performance-impacting events.\n0x800: Initialization and shutdown.\n0x1000: Misc debug, not yet classified.\n0x2000: Show raw bytes of AQL packet.\n0x4000: Show code creation debug.\n0x8000: More detailed command info, including barrier commands.\n0x10000: Log message location.\n0xFFFFFFFF: Log always even mask flag is zero.\n
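Because AMD_LOG_MASK is a bit mask, several categories are enabled by OR-ing their values. The sketch below does this with the values listed above; the short category names are hypothetical labels chosen for this example (they are not recognised by the HIP runtime), and the table should be checked against the documentation for the ROCm version in use.

```python
# Bit values as listed above; the keys are illustrative labels only.
LOG_MASK_BITS = {
    "api": 0x1,          # API calls
    "cmd": 0x2,          # kernel and copy commands and barriers
    "wait": 0x4,         # synchronization and waiting for commands
    "info": 0x8,
    "queue": 0x20,
    "signal": 0x40,
    "lock": 0x80,
    "copy": 0x100,
    "copy_detail": 0x200,
    "resource": 0x400,
    "init": 0x800,
    "misc": 0x1000,
    "aql_raw": 0x2000,
    "code": 0x4000,
    "cmd_detail": 0x8000,
    "location": 0x10000,
}


def log_mask(*categories):
    """Combine categories into a hex string suitable for AMD_LOG_MASK."""
    mask = 0
    for name in categories:
        mask |= LOG_MASK_BITS[name]
    return hex(mask)


def decode_log_mask(value):
    """List the categories enabled in a mask given as an int or hex string."""
    mask = int(value, 16) if isinstance(value, str) else value
    return [name for name, bit in LOG_MASK_BITS.items() if mask & bit]


if __name__ == "__main__":
    print(f"export AMD_LOG_MASK={log_mask('api', 'cmd', 'wait')}")  # 0x7
```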
"},{"location":"user-guide/gpu/#hip_visible_devices","title":"HIP_VISIBLE_DEVICES:","text":"

On systems with multiple devices, you can make only certain devices visible to HIP by setting the environment variable HIP_VISIBLE_DEVICES (or CUDA_VISIBLE_DEVICES on NVIDIA platforms). Only devices whose index is present in the sequence are visible to HIP.

Runtime: HIP Runtime. Applies only to applications using HIP on the AMD platform.

export HIP_VISIBLE_DEVICES=0,1
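A common pattern is to give each MPI rank on a node its own GPU. The sketch below assumes, as with CUDA_VISIBLE_DEVICES, that the devices listed in HIP_VISIBLE_DEVICES are renumbered from 0 in the order given, so the application's device 0 is the first listed physical device. Only numeric indices are handled (not the UUIDs that ROCR_VISIBLE_DEVICES also accepts), and the function names are hypothetical.

```python
def visible_device_map(value):
    """Map the device indices an application sees (0, 1, ...) to the physical
    device indices listed in a HIP_VISIBLE_DEVICES-style string such as "2,3"."""
    physical = [int(item) for item in value.split(",") if item.strip()]
    return dict(enumerate(physical))


def device_for_local_rank(local_rank, value):
    """Pick one physical device per rank on a node, cycling round if there
    are more ranks than visible devices."""
    mapping = visible_device_map(value)
    if not mapping:
        raise ValueError("no visible devices listed")
    return mapping[local_rank % len(mapping)]


if __name__ == "__main__":
    print(visible_device_map("2,3"))
    for rank in range(4):
        print(rank, device_for_local_rank(rank, "0,1,2,3"))
```

In a per-rank wrapper, the result could be exported as HIP_VISIBLE_DEVICES before the application starts, with the local rank taken from Slurm's SLURM_LOCALID variable.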

"},{"location":"user-guide/gpu/#amd_serialize_kernel","title":"AMD_SERIALIZE_KERNEL","text":"

To serialize kernel enqueuing, set:

export AMD_SERIALIZE_KERNEL=1

"},{"location":"user-guide/gpu/#amd_serialize_copy","title":"AMD_SERIALIZE_COPY","text":"

To serialize memory copies, set:

export AMD_SERIALIZE_COPY=1

"},{"location":"user-guide/gpu/#hip_host_coherent","title":"HIP_HOST_COHERENT","text":"

Sets whether memory allocated with hipHostMalloc is coherent:

export HIP_HOST_COHERENT=1

If the value is 1, memory is coherent with host; if 0, memory is not coherent between host and GPU.

"},{"location":"user-guide/gpu/#openmp-environment-variables","title":"OpenMP Environment variables","text":"

https://rocm.docs.amd.com/en/docs-5.2.3/reference/openmp/openmp.html#environment-variables

"},{"location":"user-guide/gpu/#omp_default_device","title":"OMP_DEFAULT_DEVICE","text":"

Default device used for OpenMP target offloading.

Runtime: OpenMP Runtime. Applies only to applications using OpenMP offloading.

export OMP_DEFAULT_DEVICE=\"2\"

sets the default device to the third device on the node, as device numbering starts at 0.

"},{"location":"user-guide/gpu/#omp_num_teams","title":"OMP_NUM_TEAMS","text":"

Users can choose the number of teams used for kernel launch by setting:

export OMP_NUM_TEAMS=N

where N is the number of teams. This can be tuned to optimise performance.

"},{"location":"user-guide/gpu/#gpu_max_hw_queues","title":"GPU_MAX_HW_QUEUES","text":"

To set the number of HSA queues used in the OpenMP runtime, set:

export GPU_MAX_HW_QUEUES=N

where N is the number of queues.

"},{"location":"user-guide/gpu/#mpi-environment-variables","title":"MPI Environment variables","text":""},{"location":"user-guide/gpu/#mpich_gpu_support_enabled","title":"MPICH_GPU_SUPPORT_ENABLED","text":"

Activates GPU aware MPI in Cray MPICH:

export MPICH_GPU_SUPPORT_ENABLED=1

If this is not set, MPI calls that attempt to send messages from buffers in GPU-attached memory will crash or hang.

"},{"location":"user-guide/gpu/#hsa_enable_sdma","title":"HSA_ENABLE_SDMA","text":"

export HSA_ENABLE_SDMA=0

Forces host-to-device and device-to-host copies to use compute shader blit kernels rather than the dedicated DMA copy engines.

This reduces copy bandwidth, but is recommended when isolating issues with the hardware copy engines.

"},{"location":"user-guide/gpu/#mpich_ofi_nic_policy","title":"MPICH_OFI_NIC_POLICY","text":"

For GPU-enabled parallel applications whose MPI operations access application arrays resident in GPU-attached memory, users can set:

export MPICH_OFI_NIC_POLICY=GPU

In this case, for each MPI process, Cray MPI aims to select a NIC device that is closest to the GPU device being used.

"},{"location":"user-guide/gpu/#mpich_ofi_nic_verbose","title":"MPICH_OFI_NIC_VERBOSE","text":"

To display information about NIC selection, set:

export MPICH_OFI_NIC_VERBOSE=2

"},{"location":"user-guide/gpu/#debugging","title":"Debugging","text":"

Note

Work in progress

Documentation for rocgdb can be found in the following locations:

https://rocm.docs.amd.com/projects/ROCgdb/en/docs-5.2.3/index.html

https://docs.amd.com/projects/HIP/en/docs-5.2.3/how_to_guides/debugging.html#using-rocgdb

"},{"location":"user-guide/gpu/#profiling","title":"Profiling","text":"

An initial profiling capability is provided via rocprof which is part of the rocm module.

For example, in an interactive session where resources have already been allocated, you can call:

srun -n 2 --exclusive --nodes=1 --time=00:20:00 --partition=gpu --qos=gpu-exc --gpus=2 rocprof --stats ./myprog_exe\n

to profile your application. More detail on the use of rocprof can be found here.

"},{"location":"user-guide/gpu/#performance-tuning","title":"Performance tuning","text":"

AMD provides some documentation on performance tuning here. Not all options will be available to users, so be aware that mileage may vary.

"},{"location":"user-guide/gpu/#hardware-details","title":"Hardware details","text":"

The specifications of the GPU hardware can be found here.

Additionally you can use the command,

rocminfo

in a job on a GPU node to print information about the GPUs and CPU on the node. This command is provided as part of the rocm module.

"},{"location":"user-guide/gpu/#node-topology","title":"Node Topology","text":"

Using rocm-smi --showtopo, we can learn about the connections between the GPUs in a node and how memory regions between the GPU and CPU are connected.

======================= ROCm System Management Interface =======================\n=========================== Weight between two GPUs ============================\n       GPU0         GPU1         GPU2         GPU3\nGPU0   0            15           15           15\nGPU1   15           0            15           15\nGPU2   15           15           0            15\nGPU3   15           15           15           0\n\n============================ Hops between two GPUs =============================\n       GPU0         GPU1         GPU2         GPU3\nGPU0   0            1            1            1\nGPU1   1            0            1            1\nGPU2   1            1            0            1\nGPU3   1            1            1            0\n\n========================== Link Type between two GPUs ==========================\n       GPU0         GPU1         GPU2         GPU3\nGPU0   0            XGMI         XGMI         XGMI\nGPU1   XGMI         0            XGMI         XGMI\nGPU2   XGMI         XGMI         0            XGMI\nGPU3   XGMI         XGMI         XGMI         0\n\n================================== Numa Nodes ==================================\nGPU 0          : (Topology) Numa Node: 0\nGPU 0          : (Topology) Numa Affinity: 0\nGPU 1          : (Topology) Numa Node: 1\nGPU 1          : (Topology) Numa Affinity: 1\nGPU 2          : (Topology) Numa Node: 2\nGPU 2          : (Topology) Numa Affinity: 2\nGPU 3          : (Topology) Numa Node: 3\nGPU 3          : (Topology) Numa Affinity: 3\n============================= End of ROCm SMI Log ==============================\n

To quote the rocm documentation:

- The first block of the output shows the distance between the GPUs similar to what the numactl command outputs for the NUMA domains of a system. The weight is a qualitative measure for the \u201cdistance\u201d data must travel to reach one GPU from another one. While the values do not carry a special (physical) meaning, the higher the value the more hops are needed to reach the destination from the source GPU.\n\n- The second block has a matrix named \u201cHops between two GPUs\u201d, where 1 means the two GPUs are directly connected with XGMI, 2 means both GPUs are linked to the same CPU socket and GPU communications will go through the CPU, and 3 means both GPUs are linked to different CPU sockets so communications will go through both CPU sockets. This number is one for all GPUs in this case since they are all connected to each other through the Infinity Fabric links.\n\n- The third block outputs the link types between the GPUs. This can either be \u201cXGMI\u201d for AMD Infinity Fabric links or \u201cPCIE\u201d for PCIe Gen4 links.\n\n- The fourth block reveals the localization of a GPU with respect to the NUMA organization of the shared memory of the AMD EPYC processors.\n
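When checking placement from within a job, the Numa Nodes block of rocm-smi --showtopo can be parsed rather than read by eye. The sketch below assumes the output format shown above, which may change between ROCm versions; the function name gpu_numa_nodes is hypothetical, and rocm-smi is only called if it is found on the PATH.

```python
import re
import shutil
import subprocess

# Matches lines such as "GPU 0          : (Topology) Numa Node: 0"
NUMA_LINE = re.compile(r"GPU\s+(\d+)\s*:\s*\(Topology\)\s*Numa Node:\s*(-?\d+)")


def gpu_numa_nodes(showtopo_output):
    """Return {gpu_index: numa_node} parsed from rocm-smi --showtopo output."""
    return {int(gpu): int(node) for gpu, node in NUMA_LINE.findall(showtopo_output)}


if __name__ == "__main__":
    if shutil.which("rocm-smi"):
        out = subprocess.run(["rocm-smi", "--showtopo"], capture_output=True,
                             text=True, check=True).stdout
        for gpu, node in sorted(gpu_numa_nodes(out).items()):
            print(f"GPU {gpu} -> NUMA node {node}")
    else:
        print("rocm-smi not found; load the rocm module on a GPU node")
```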
"},{"location":"user-guide/gpu/#rocm-bandwidth-test","title":"rocm-bandwidth-test","text":"

The rocm module also provides rocm-bandwidth-test, which can be used to measure the performance of communication between the hardware components in a node.

Together with rocm-smi, this bandwidth test can be useful in understanding the composition and performance limitations of a GPU node. Here is example output from a GPU node on ARCHER2.

Device: 0,  AMD EPYC 7543P 32-Core Processor\nDevice: 1,  AMD EPYC 7543P 32-Core Processor\nDevice: 2,  AMD EPYC 7543P 32-Core Processor\nDevice: 3,  AMD EPYC 7543P 32-Core Processor\nDevice: 4,  ,  GPU-ab43b63dec8adaf3,  c9:0.0\nDevice: 5,  ,  GPU-0b953cf8e6d4184a,  87:0.0\nDevice: 6,  ,  GPU-b0266df54d0dd2e1,  49:0.0\nDevice: 7,  ,  GPU-790a09bfbf673859,  09:0.0\n\nInter-Device Access\n\nD/D       0         1         2         3         4         5         6         7\n\n0         1         1         1         1         1         1         1         1\n\n1         1         1         1         1         1         1         1         1\n\n2         1         1         1         1         1         1         1         1\n\n3         1         1         1         1         1         1         1         1\n\n4         1         1         1         1         1         1         1         1\n\n5         1         1         1         1         1         1         1         1\n\n6         1         1         1         1         1         1         1         1\n\n7         1         1         1         1         1         1         1         1\n\n\nInter-Device Numa Distance\n\nD/D       0         1         2         3         4         5         6         7\n\n0         0         12        12        12        20        32        32        32\n\n1         12        0         12        12        32        20        32        32\n\n2         12        12        0         12        32        32        20        32\n\n3         12        12        12        0         32        32        32        20\n\n4         20        32        32        32        0         15        15        15\n\n5         32        20        32        32        15        0         15        15\n\n6         32        32        20        32        15        15        0         15\n\n7         32        32        32        20        15        15        15        0\n\n\nUnidirectional copy peak bandwidth 
GB/s\n\nD/D       0           1           2           3           4           5           6           7\n\n0         N/A         N/A         N/A         N/A         26.977      26.977      26.977      26.977\n\n1         N/A         N/A         N/A         N/A         26.977      26.975      26.975      26.975\n\n2         N/A         N/A         N/A         N/A         26.977      26.977      26.975      26.975\n\n3         N/A         N/A         N/A         N/A         26.975      26.977      26.975      26.977\n\n4         28.169      28.171      28.169      28.169      1033.080    42.239      42.112      42.264\n\n5         28.169      28.169      28.169      28.169      42.243      1033.088    42.294      42.286\n\n6         28.169      28.171      28.167      28.169      42.158      42.281      1043.367    42.277\n\n7         28.171      28.169      28.169      28.169      42.226      42.264      42.264      1051.212\n\n\nBidirectional copy peak bandwidth GB/s\n\nD/D       0           1           2           3           4           5           6           7\n\n0         N/A         N/A         N/A         N/A         40.480      42.528      42.059      42.173\n\n1         N/A         N/A         N/A         N/A         41.604      41.826      41.903      41.417\n\n2         N/A         N/A         N/A         N/A         41.008      41.499      41.258      41.338\n\n3         N/A         N/A         N/A         N/A         40.968      41.273      40.982      41.450\n\n4         40.480      41.604      41.008      40.968      N/A         80.946      80.631      80.888\n\n5         42.528      41.826      41.499      41.273      80.946      N/A         80.944      80.940\n\n6         42.059      41.903      41.258      40.982      80.631      80.944      N/A         80.896\n\n7         42.173      41.417      41.338      41.450      80.888      80.940      80.896      N/A\n
"},{"location":"user-guide/gpu/#tools","title":"Tools","text":""},{"location":"user-guide/gpu/#rocm-smi","title":"rocm-smi","text":"

If you load the rocm module on the system, you will have access to the rocm-smi utility. This utility reports information about the GPUs on a node and can be very useful for understanding the setup of the hardware you are working with and for monitoring GPU metrics during job execution.

Here are some useful commands to get you started:

Device status:

rocm-smi --alldevices

======================= ROCm System Management Interface =======================\n================================= Concise Info =================================\nGPU  Temp   AvgPwr  SCLK    MCLK     Fan  Perf  PwrCap  VRAM%  GPU%\n0    28.0c  43.0W   800Mhz  1600Mhz  0%   auto  300.0W    0%   0%\n1    30.0c  43.0W   800Mhz  1600Mhz  0%   auto  300.0W    0%   0%\n2    33.0c  43.0W   800Mhz  1600Mhz  0%   auto  300.0W    0%   0%\n3    33.0c  41.0W   800Mhz  1600Mhz  0%   auto  300.0W    0%   0%\n================================================================================\n============================= End of ROCm SMI Log ==============================\n
This shows you the current state of the hardware while an application is running.

Focusing on the GPU activity can be useful to understand when your code is active on the GPUs:

GPU activity:

rocm-smi --showuse

======================= ROCm System Management Interface =======================\n============================== % time GPU is busy ==============================\nGPU[0]          : GPU use (%): 0\nGPU[0]          : GFX Activity: 705759841\nGPU[1]          : GPU use (%): 0\nGPU[1]          : GFX Activity: 664257322\nGPU[2]          : GPU use (%): 0\nGPU[2]          : GFX Activity: 660987914\nGPU[3]          : GPU use (%): 0\nGPU[3]          : GFX Activity: 665049119\n================================================================================\n============================= End of ROCm SMI Log ==============================\n

Additionally you can focus on the memory use of the GPUs:

GPU memory currently consumed:

rocm-smi --showmemuse

======================= ROCm System Management Interface =======================\n============================== Current Memory Use ==============================\nGPU[0]          : GPU memory use (%): 0\nGPU[0]          : Memory Activity: 323631375\nGPU[1]          : GPU memory use (%): 0\nGPU[1]          : Memory Activity: 319196585\nGPU[2]          : GPU memory use (%): 0\nGPU[2]          : Memory Activity: 318641690\nGPU[3]          : GPU memory use (%): 0\nGPU[3]          : Memory Activity: 319854295\n================================================================================\n============================= End of ROCm SMI Log ==============================\n

More commands and options for probing the GPUs can be listed by running the following, which also works on the login nodes:

rocm-smi --help

More detail can be found here.
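For monitoring during a job, the percentage lines from rocm-smi --showuse and --showmemuse can be collected into a dictionary. The sketch below assumes the output format shown above; the function name parse_use is hypothetical, the GFX and memory activity counters are deliberately ignored, and rocm-smi is only called if it is found on the PATH.

```python
import re
import shutil
import subprocess

# Matches lines such as "GPU[0]          : GPU use (%): 0"
# and "GPU[0]          : GPU memory use (%): 0"
USE_LINE = re.compile(r"GPU\[(\d+)\]\s*:\s*(GPU use|GPU memory use) \(%\):\s*(\d+)")


def parse_use(output):
    """Return {gpu_index: {"GPU use": pct, "GPU memory use": pct}} from the
    output of rocm-smi --showuse and/or --showmemuse."""
    usage = {}
    for gpu, metric, value in USE_LINE.findall(output):
        usage.setdefault(int(gpu), {})[metric] = int(value)
    return usage


if __name__ == "__main__":
    if shutil.which("rocm-smi"):
        out = "".join(
            subprocess.run(["rocm-smi", flag], capture_output=True,
                           text=True, check=True).stdout
            for flag in ("--showuse", "--showmemuse"))
        for gpu, metrics in sorted(parse_use(out).items()):
            print(gpu, metrics)
    else:
        print("rocm-smi not found; load the rocm module")
```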

"},{"location":"user-guide/gpu/#hipify","title":"HIPIFY","text":"

HIPIFY is a CUDA to HIP source translator tool that can allow CUDA source code to be translated into HIP source code, easing the transition between the two hardware targets.

The tool is available on ARCHER2 by loading the rocm module.

The github repository for HIPIFY can be found here.

The documentation for HIPIFY is found here.

"},{"location":"user-guide/gpu/#notes-and-useful-links","title":"Notes and useful links","text":"

You should expect the software development environment to be similar to that available on the Frontier exascale system:

"},{"location":"user-guide/hardware/","title":"ARCHER2 hardware","text":"

Note

Some of the material in this section is closely based on information provided by NASA as part of the documentation for the Aitken HPC system.

"},{"location":"user-guide/hardware/#system-overview","title":"System overview","text":"

ARCHER2 is an HPE Cray EX supercomputing system with a total of 5,860 compute nodes. Each compute node has 128 cores (dual AMD EPYC 7742 64-core 2.25 GHz processors), giving a total of 750,080 cores. Compute nodes are connected together by an HPE Slingshot interconnect.

There are additional User Access Nodes (UAN, also called login nodes), which provide access to the system, and data-analysis nodes, which are well-suited for preparation of job inputs and analysis of job outputs.

Compute nodes are only accessible via the Slurm job scheduling system.

There are two storage types: home and work. Home is available on login nodes and data-analysis nodes. Work is available on login, data-analysis and compute nodes (see I/O and file systems).

This is shown in the ARCHER2 architecture diagram:

The home file system is provided by dual NetApp FAS8200A systems (one primary and one disaster recovery) with a capacity of 1 PB each.

The work file system consists of four separate HPE Cray L300 storage systems, each with a capacity of 3.6 PB. The interconnect uses a dragonfly topology, and has a bandwidth of 100 Gbps.

The system also includes 1.1 PB burst buffer NVMe storage, provided by an HPE Cray E1000.

"},{"location":"user-guide/hardware/#compute-node-overview","title":"Compute node overview","text":"

The compute nodes each have 128 cores. They are dual socket nodes with two 64-core AMD EPYC 7742 processors. There are 5,276 standard memory nodes and 584 high memory nodes.

Note

Due to Simultaneous Multi-Threading (SMT), each core has 2 threads, so a node has 128 cores / 256 threads. Most users will not want to use SMT; see Launching parallel jobs.

| Component | Details |
| --- | --- |
| Processor | 2x AMD Zen2 (Rome) EPYC 7742, 64-core, 2.25 GHz |
| Cores per node | 128 |
| NUMA structure | 8 NUMA regions per node (16 cores per NUMA region) |
| Memory per node | 256 GB (standard), 512 GB (high memory) |
| Memory per core | 2 GB (standard), 4 GB (high memory) |
| L1 cache | 32 kB/core |
| L2 cache | 512 kB/core |
| L3 cache | 16 MB/4 cores |
| Vector support | AVX2 |
| Network connection | 2x 100 Gb/s injection ports per node |

Each socket contains eight Core Complex Dies (CCDs) and one I/O die (IOD). Each CCD contains two Core Complexes (CCXs). Each CCX has 4 cores and 16 MB of L3 cache. Thus, there are 64 cores per socket and 128 cores per node.
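The core counts above follow from multiplying out the hierarchy. As a quick cross-check, the sketch below derives cores per socket and per node, L3 per socket and cores per NUMA region. It assumes the NPS4 setting described under Memory subsystem, and the constant names are our own.

```python
# Node topology arithmetic for an ARCHER2 compute node, using the figures
# quoted above. NUMA_REGIONS_PER_SOCKET assumes the NPS4 configuration.
SOCKETS_PER_NODE = 2
CCDS_PER_SOCKET = 8
CCXS_PER_CCD = 2
CORES_PER_CCX = 4
L3_MB_PER_CCX = 16
NUMA_REGIONS_PER_SOCKET = 4


def node_topology():
    cores_per_socket = CCDS_PER_SOCKET * CCXS_PER_CCD * CORES_PER_CCX
    cores_per_node = SOCKETS_PER_NODE * cores_per_socket
    ccxs_per_socket = CCDS_PER_SOCKET * CCXS_PER_CCD
    numa_regions = SOCKETS_PER_NODE * NUMA_REGIONS_PER_SOCKET
    return {
        'cores_per_socket': cores_per_socket,
        'cores_per_node': cores_per_node,
        'l3_mb_per_socket': ccxs_per_socket * L3_MB_PER_CCX,
        'numa_regions_per_node': numa_regions,
        'cores_per_numa_region': cores_per_node // numa_regions,
    }


if __name__ == '__main__':
    for key, value in node_topology().items():
        print(f'{key}: {value}')
```

The 16 cores per NUMA region it reports matches the NUMA structure row of the table.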

More information on the architecture of the AMD EPYC Zen2 processors:

"},{"location":"user-guide/hardware/#amd-zen2-microarchitecture","title":"AMD Zen2 microarchitecture","text":"

The AMD EPYC 7742 Rome processor has a base CPU clock of 2.25 GHz and a maximum boost clock of 3.4 GHz. There are eight processor dies (CCDs) with a total of 64 cores per socket.

Tip

The processors can only access their boost frequencies if the CPU frequency is set to 2.25 GHz. See the documentation on setting CPU frequency for information on how to select the correct CPU frequency.

Note

When all 128 compute cores on a node are loaded with computationally intensive work, we typically see the processor clock frequency boost to around 2.8 GHz.

Hybrid multi-die design:

Within each socket, the eight processor dies are fabricated on a 7 nanometer (nm) process, while the I/O die is fabricated on a 14 nm process. This design decision was made because the processor dies need the leading-edge (and more expensive) 7 nm technology to reduce the power and space needed to double the number of cores and to add more cache, compared with the first-generation EPYC processors. The I/O die retains the less expensive, older 14 nm technology.

2nd-generation Infinity Fabric technology:

Infinity Fabric technology is used for communication among different components throughout the node: within cores, between cores, between core complexes (CCX) in a core complex die (CCD), among CCDs in a socket, to the main memory and PCIe, and between the two sockets. The Rome processors are the first x86 systems to support 4th-generation PCIe, which delivers twice the I/O performance (to the Slingshot interconnect, storage, NVMe SSD, etc.) compared to 3rd-generation PCIe.

"},{"location":"user-guide/hardware/#processor-hierarchy","title":"Processor hierarchy","text":"

The Zen2 processor hierarchy is as follows:

CPU core

AMD 7742 is a 64-bit x86 server microprocessor. A partial list of instructions and features supported in Rome includes SSE, SSE2, SSE3, SSSE3, SSE4a, SSE4.1, SSE4.2, AES, FMA, AVX, AVX2 (256 bit), Integrated x87 FPU (FPU), Multi-Precision Add-Carry (ADX), 16-bit Floating Point Conversion (F16C), and No-eXecute (NX). For a complete list, run cat /proc/cpuinfo on the ARCHER2 login nodes.
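To check for particular instruction-set extensions from a script rather than reading /proc/cpuinfo by eye, you can parse the flags line with a minimal Python sketch. It assumes the standard Linux /proc/cpuinfo layout, where each line has the form key : value and SSE3 is reported as pni. The helper names are illustrative.

```python
from pathlib import Path


def parse_cpu_flags(cpuinfo_text):
    # Return the set of flags on the first 'flags' line (all cores report the same set).
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(':')
        if sep and key.strip() == 'flags':
            return set(value.split())
    return set()


def has_features(flags, *features):
    # True only if every requested feature flag is present.
    return all(feature in flags for feature in features)


if __name__ == '__main__':
    cpuinfo = Path('/proc/cpuinfo')
    if cpuinfo.exists():  # Linux only
        flags = parse_cpu_flags(cpuinfo.read_text())
        print('AVX2 and FMA:', has_features(flags, 'avx2', 'fma'))
        print('AVX-512F:', has_features(flags, 'avx512f'))
```

On a Zen2 node you would expect the AVX2/FMA check to succeed and the AVX-512 check to fail, since Rome supports 256-bit AVX2 only.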

Each core:

"},{"location":"user-guide/hardware/#cache-hierarchy","title":"Cache hierarchy","text":"

The cache hierarchy is as follows:

Note

With the write-back policy, data is updated in the current level cache first. The update in the next level storage is done later when the cache line is ready to be replaced.
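The toy model below makes the write-back behaviour concrete. It is a direct-mapped cache that marks written lines dirty and updates the next level only when a dirty line is evicted or the cache is flushed. It is deliberately simple and does not model the real set-associative Zen2 caches. The class and method names are hypothetical.

```python
class WriteBackCache:
    # Toy direct-mapped write-back cache in front of a 'next level' dict.

    def __init__(self, num_lines, next_level):
        self.num_lines = num_lines
        self.next_level = next_level  # address -> value
        self.lines = {}               # line index -> [address, value, dirty]
        self.writebacks = 0

    def _evict_if_needed(self, index, address):
        line = self.lines.get(index)
        if line is not None and line[0] != address:
            if line[2]:  # dirty: write back to the next level before replacing
                self.next_level[line[0]] = line[1]
                self.writebacks += 1
            del self.lines[index]

    def write(self, address, value):
        index = address % self.num_lines
        self._evict_if_needed(index, address)
        self.lines[index] = [address, value, True]  # update cache only, mark dirty

    def read(self, address):
        index = address % self.num_lines
        self._evict_if_needed(index, address)
        if index not in self.lines:  # miss: fill a clean line from the next level
            self.lines[index] = [address, self.next_level.get(address), False]
        return self.lines[index][1]

    def flush(self):
        for line in self.lines.values():
            if line[2]:
                self.next_level[line[0]] = line[1]
                line[2] = False
                self.writebacks += 1
```

Two writes to the same address leave the next level untouched. Only when another address maps to the same line, or on flush, is the latest value written back.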

Note

If a core misses in its local L2 and also in the L3, the shadow tags are consulted. If the shadow tag indicates that the data resides in another L2 within the CCX, a cache-to-cache transfer is initiated. The L3 cache provides 1 x 256 bits/cycle load bandwidth to the L2 of each core and 1 x 256 bits/cycle store bandwidth from the L2 of each core; it uses a write-back policy and is populated by L2 victims.

"},{"location":"user-guide/hardware/#intra-socket-interconnect","title":"Intra-socket interconnect","text":"

The Infinity Fabric, evolved from AMD's previous generation HyperTransport interconnect, is a software-defined, scalable, coherent, and high-performance fabric. It uses sensors embedded in each die to scale control (Scalable Control Fabric, or SCF) and data flow (Scalable Data Fabric, or SDF).

"},{"location":"user-guide/hardware/#inter-socket-interconnect","title":"Inter-socket interconnect","text":"

Two EPYC 7742 SoCs are interconnected via Socket to Socket Global Memory Interconnect (xGMI) links, part of the Infinity Fabric that connects all the components of the SoC together. On ARCHER2 compute nodes there are 3 xGMI links using a total of 48 PCIe lanes. With the xGMI link speed set at 16 GT/s, the theoretical throughput for each direction is 96 GB/s (3 links x 16 GT/s x 2 bytes/transfer). This figure does not account for xGMI encoding overhead, as AMD has not published details of it. In practice, the expected efficiencies are 66–75%, so the sustained bandwidth per direction will be 63.5–72 GB/s. xGMI Dynamic Link Width Management saves power during periods of low socket-to-socket data traffic by reducing the number of active xGMI lanes per link from 16 to 8.
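The xGMI figures can be reproduced with a few lines of arithmetic. This sketch uses only the numbers quoted above. Note that 66% of 96 GB/s is about 63.4 GB/s, slightly below the 63.5 GB/s lower bound quoted, so treat both values as approximate.

```python
def xgmi_bandwidth_gbs(links=3, gt_per_s=16, bytes_per_transfer=2):
    # links x GT/s x bytes/transfer -> GB/s per direction, ignoring encoding overhead
    return links * gt_per_s * bytes_per_transfer


def sustained_range(theoretical, low_eff=0.66, high_eff=0.75):
    # Apply the expected efficiency range to the theoretical peak.
    return theoretical * low_eff, theoretical * high_eff


theoretical = xgmi_bandwidth_gbs()
low, high = sustained_range(theoretical)
print(f'theoretical: {theoretical} GB/s per direction')
print(f'sustained:   {low:.1f}-{high:.1f} GB/s per direction')
```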

"},{"location":"user-guide/hardware/#memory-subsystem","title":"Memory subsystem","text":"

The Zen 2 microarchitecture places eight unified memory controllers in the centralized I/O die. The memory channels can be split into one, two, or four Non-Uniform Memory Access (NUMA) Nodes per Socket (NPS1, NPS2, and NPS4). ARCHER2 compute nodes are configured as NPS4, which is the highest memory bandwidth configuration geared toward HPC applications.

With eight DDR4-3200 memory channels per socket (3,200 MT/s each), and an 8-byte read or write per transfer on each channel, the maximum total memory bandwidth is 204.8 GB/s per socket.
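The peak bandwidth is the product of channel count, transfer rate and transfer width. The sketch below reproduces it. Under the same assumptions, it also gives the per-node peak and an even per-core share when all 64 cores of a socket stream from memory. These are theoretical upper bounds, not measured figures.

```python
CHANNELS_PER_SOCKET = 8
TRANSFERS_PER_S = 3_200_000_000  # DDR4-3200: 3,200 MT/s per channel
BYTES_PER_TRANSFER = 8           # 64-bit wide channel
CORES_PER_SOCKET = 64


def peak_bandwidth_gbs(channels=CHANNELS_PER_SOCKET):
    return channels * TRANSFERS_PER_S * BYTES_PER_TRANSFER / 1e9


per_socket = peak_bandwidth_gbs()
per_node = 2 * per_socket
per_core = per_socket / CORES_PER_SOCKET
print(f'{per_socket:.1f} GB/s per socket, {per_node:.1f} GB/s per node, '
      f'{per_core:.1f} GB/s per core')
```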

Each memory channel can be connected with up to two Double Data Rate (DDR) fourth-generation Dual In-line Memory Modules (DIMMs). On ARCHER2 standard memory nodes, each channel is connected to a single 16 GB DDR4 registered DIMM (RDIMM) with error correcting code (ECC) support leading to 128 GB per socket and 256 GB per node. For the high memory nodes, each channel is connected to a single 32 GB DDR4 registered DIMM (RDIMM) with error correcting code (ECC) support leading to 256 GB per socket and 512 GB per node.
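Similarly, the per-socket and per-node capacities, and the per-core figures in the table above, follow from one DIMM per channel. The function name is illustrative.

```python
CHANNELS_PER_SOCKET = 8
SOCKETS_PER_NODE = 2
CORES_PER_NODE = 128


def node_memory(dimm_gb, dimms_per_channel=1):
    # Returns (GB per socket, GB per node, GB per core).
    per_socket = CHANNELS_PER_SOCKET * dimms_per_channel * dimm_gb
    per_node = SOCKETS_PER_NODE * per_socket
    return per_socket, per_node, per_node / CORES_PER_NODE


for label, dimm_gb in (('standard', 16), ('high memory', 32)):
    per_socket, per_node, per_core = node_memory(dimm_gb)
    print(f'{label}: {per_socket} GB/socket, {per_node} GB/node, {per_core:g} GB/core')
```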

"},{"location":"user-guide/hardware/#interconnect-details","title":"Interconnect details","text":"

ARCHER2 has an HPE Slingshot interconnect with 200 Gb/s signalling per node. It uses a dragonfly topology:

"},{"location":"user-guide/hardware/#storage-details","title":"Storage details","text":"

Information on the ARCHER2 parallel Lustre file systems and how to get the best performance is available in the I/O section.

"},{"location":"user-guide/io/","title":"I/O performance and tuning","text":"

This section describes common I/O patterns, best practice for I/O, and how to get good performance on the ARCHER2 storage.

Information on the file systems, directory layouts, quotas, archiving and transferring data can be found in the Data management and transfer section.

The advice here is targeted at the parallel file systems available on the ARCHER2 compute nodes (i.e. not the home and RDFaaS file systems).

"},{"location":"user-guide/io/#common-io-patterns","title":"Common I/O patterns","text":"

There are a number of I/O patterns that are frequently used in parallel applications:

"},{"location":"user-guide/io/#single-file-single-writer-serial-io","title":"Single file, single writer (Serial I/O)","text":"

A common approach is to funnel all the I/O through one controller process (e.g. rank 0 in an MPI program). Although this has the advantage of producing a single file, the fact that only one client is doing all the I/O means that it gains little benefit from the parallel file system. In practice this severely limits the I/O rates, e.g. when writing large files the speed is not likely to significantly exceed 1 GB/s.

"},{"location":"user-guide/io/#file-per-process-fpp","title":"File-per-process (FPP)","text":"

One of the first parallel strategies people use for I/O is for each parallel process to write to its own file. This is a simple scheme to implement and understand and can achieve high bandwidth as, with many I/O clients active at once, it benefits from the parallel Lustre filesystem. However, it has the distinct disadvantage that the data is spread across many different files and may therefore be very difficult to use for further analysis without a data reconstruction stage to recombine potentially thousands of small files.

In addition, having thousands of files open at once can overload the filesystem and lead to poor performance.

Tip

The ARCHER2 solid state file system can give very high performance when using this model of I/O.

The ADIOS 2 I/O library uses an approach similar to file-per-process and so can achieve very good performance on modern parallel file systems.

"},{"location":"user-guide/io/#file-per-node-fpn","title":"File-per-node (FPN)","text":"

A simple way to reduce the sheer number of files is to write a file per node rather than a file per process; as ARCHER2 has 128 CPU-cores per node, this can reduce the number of files by more than a factor of 100 and should not significantly affect the I/O rates. However, it still produces multiple files which can be hard to work with in practice.
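Writing one file per node usually means mapping each rank to its node. The sketch below assumes ranks are placed in blocks of 128 consecutive ranks per node (as with --distribution=block:block) and uses hypothetical file names. In a real MPI code it is more robust to derive the node from a shared-memory communicator (MPI_Comm_split_type with MPI_COMM_TYPE_SHARED) than from the rank number.

```python
RANKS_PER_NODE = 128  # ARCHER2 compute nodes: 128 cores, one rank per core


def fpp_filename(rank, prefix='output'):
    # File-per-process: one file per MPI rank (hypothetical naming scheme).
    return f'{prefix}.rank{rank:06d}.dat'


def fpn_filename(rank, prefix='output', ranks_per_node=RANKS_PER_NODE):
    # File-per-node: assumes block placement, so ranks 0-127 share node 0, etc.
    return f'{prefix}.node{rank // ranks_per_node:05d}.dat'


def count_files(nranks, name_fn):
    return len({name_fn(rank) for rank in range(nranks)})


if __name__ == '__main__':
    nranks = 4 * RANKS_PER_NODE  # a 4-node job
    print('FPP files:', count_files(nranks, fpp_filename))  # 512
    print('FPN files:', count_files(nranks, fpn_filename))  # 4
```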

"},{"location":"user-guide/io/#single-file-multiple-writers-without-collective-operations","title":"Single file, multiple writers without collective operations","text":"

All aspects of data management are simpler if your parallel program produces a single file in the same format as a serial code, e.g. analysis or program restart are much more straightforward.

There are a number of ways to achieve this. For example, many processes can open the same file but access different parts by skipping some initial offset, although this is problematic when writing as locking may be needed to ensure consistency. Parallel I/O libraries such as MPI-IO, HDF5 and NetCDF allow for this form of access and will implement locking automatically.
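The offset-based approach can be illustrated with plain POSIX positional writes. In this sketch each rank writes a fixed-size block at offset rank x block size. The loop is a serial stand-in for separate processes, and the block size and file name are hypothetical. In an MPI code, the equivalent independent call would be MPI_File_write_at. os.pwrite is only available on Unix.

```python
import os
import tempfile

BLOCK_SIZE = 1 << 20  # hypothetical per-rank block size (1 MiB)


def write_block(path, rank, data, block_size=BLOCK_SIZE):
    # Independent write: this rank touches only [rank*block_size, (rank+1)*block_size).
    if len(data) != block_size:
        raise ValueError('each rank must write exactly one block')
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        written = os.pwrite(fd, data, rank * block_size)
        if written != block_size:
            raise OSError('short write')
    finally:
        os.close(fd)


if __name__ == '__main__':
    nranks = 4
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'shared.dat')
        # Serial stand-in for nranks processes; the write order does not matter.
        for rank in reversed(range(nranks)):
            write_block(path, rank, bytes([rank]) * BLOCK_SIZE)
        print(os.path.getsize(path))  # 4194304
```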

The problem is that, with many clients all individually accessing the same file, there can be a lot of contention for file system resources, leading to poor I/O rates. When writing, file locking can effectively serialise the access and there is no benefit from the parallel filesystem.

"},{"location":"user-guide/io/#single-shared-file-with-collective-writes-ssf","title":"Single Shared File with collective writes (SSF)","text":"

The problem with having many clients performing I/O at the same time is that the I/O library may have to restrict access to one client at a time by locking. However, if I/O is done collectively, where the library knows that all clients are doing I/O at the same time, then reads and writes can be explicitly coordinated to avoid clashes and no locking is required.

It is only through collective I/O that the full bandwidth of the file system can be realised while accessing a single file. Whatever I/O library you are using, it is essential to use collective forms of the read and write calls to achieve good performance.
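MPI-IO implementations such as ROMIO commonly implement collective writes with two-phase I/O, also called collective buffering. In the exchange phase, data is redistributed so that each of a small set of aggregator processes owns one contiguous region (file domain) of the file. In the I/O phase, each aggregator issues large contiguous writes. The pure-Python sketch below models only the data redistribution, with no real communication. The function name and the even split into file domains are simplifying assumptions.

```python
def two_phase_write(pieces, num_aggregators, file_size):
    # pieces: (rank, offset, data) requests that together lie within [0, file_size).
    # Returns one (aggregator, offset, data) contiguous write per non-empty domain.
    domain = -(-file_size // num_aggregators)  # ceiling division
    buffers = [bytearray(max(0, min(domain, file_size - a * domain)))
               for a in range(num_aggregators)]

    # Exchange phase: route each byte range to the aggregator owning that domain.
    for _rank, offset, data in pieces:
        view = memoryview(data)
        pos = offset
        while len(view) > 0:
            agg = pos // domain
            start = pos - agg * domain
            n = min(len(view), domain - start)
            buffers[agg][start:start + n] = view[:n]
            pos += n
            view = view[n:]

    # I/O phase: each aggregator would issue a single large contiguous write.
    return [(agg, agg * domain, bytes(buf))
            for agg, buf in enumerate(buffers) if len(buf) > 0]
```

For example, four ranks each owning interleaved 2-byte pieces of a 16-byte file end up as just two contiguous 8-byte writes when there are two aggregators. This is the kind of coordination that avoids lock contention on shared file regions.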

"},{"location":"user-guide/io/#achieving-efficient-io","title":"Achieving efficient I/O","text":"

This section provides information on getting the best performance out of the parallel /work file systems on ARCHER2 when writing data, particularly using parallel I/O patterns.

"},{"location":"user-guide/io/#lustre-technology","title":"Lustre technology","text":"

The ARCHER2 /work file systems use Lustre as a parallel file system technology. It has many disk units (called Object Storage Targets or OSTs), all under the control of a single Meta Data Server (MDS) so that it appears to the user as a single file system. The Lustre file system provides POSIX semantics (changes on one node are immediately visible on other nodes) and can support very high data rates for appropriate I/O patterns.

To achieve good performance on the ARCHER2 Lustre file systems, you need to make sure your I/O is configured correctly for the type of I/O you want to do. The following sections describe how to do this.

"},{"location":"user-guide/io/#summary-achieving-best-io-performance","title":"Summary: achieving best I/O performance","text":"

The configuration you should use depends on the type of I/O you are performing. Here, we summarise the settings for two of the I/O patterns described above: File-Per-Process (FPP, including using ADIOS2) and Single Shared File with collective writes (SSF).

Following sections describe the settings in more detail.

"},{"location":"user-guide/io/#file-per-process-fpp_1","title":"File-Per-Process (FPP)","text":""},{"location":"user-guide/io/#single-shared-file-with-collective-writes-ssf_1","title":"Single Shared File with collective writes (SSF)","text":""},{"location":"user-guide/io/#summary-typical-io-performance-on-archer2","title":"Summary: typical I/O performance on ARCHER2","text":""},{"location":"user-guide/io/#file-per-process-fpp_2","title":"File-Per-Process (FPP)","text":"

We regularly run tests of FPP write performance on the ARCHER2 /work Lustre file systems using the benchio software in the following configuration:

Typical write performance:

"},{"location":"user-guide/io/#single-shared-file-with-collective-writes-ssf_2","title":"Single Shared File with collective writes (SSF)","text":"

We regularly run tests of SSF write performance on the ARCHER2 /work Lustre file systems using the benchio software in the following configuration:

Typical write performance:

"},{"location":"user-guide/io/#striping","title":"Striping","text":"

One of the main factors leading to the high performance of Lustre file systems is the ability to store data on multiple OSTs. For many small files, this is achieved by storing different files on different OSTs; large files must be striped across multiple OSTs to benefit from the parallel nature of Lustre.

When a file is striped, it is split into chunks that are stored across multiple OSTs in a round-robin fashion. Striping can improve I/O performance because it increases the available bandwidth: multiple processes can read and write the same file simultaneously by accessing different OSTs. However, striping can also increase overhead, so choosing the right striping configuration is key to obtaining high performance.
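Round-robin striping is a simple mapping from a byte offset to an OST. The sketch below uses a hypothetical list of OSTs allocated to the file (in practice Lustre chooses them). It computes which OST holds a given byte and where that byte sits within the OST's object.

```python
MIB = 1 << 20


def locate(offset, stripe_size, osts):
    # RAID-0 style layout: chunk i of the file lives on osts[i % stripe_count].
    stripe_count = len(osts)
    chunk = offset // stripe_size
    ost = osts[chunk % stripe_count]
    # Within each OST object, that OST's chunks are stored back to back.
    object_offset = (chunk // stripe_count) * stripe_size + offset % stripe_size
    return ost, object_offset


if __name__ == '__main__':
    osts = [3, 7, 1, 9]  # hypothetical OSTs for a file with stripe_count=4
    for off in (0, 1 * MIB, 3 * MIB, 4 * MIB + 5):
        print(off, locate(off, MIB, osts))
```

With a stripe count of 1, every offset maps to the same OST. This is why a single unstriped large file is limited to the bandwidth of one OST.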

Users have control of a number of striping settings on Lustre file systems. Although these parameters can be set on a per-file basis they are usually set on the directory where your output files will be written so that all output files inherit the same settings.

"},{"location":"user-guide/io/#default-configuration","title":"Default configuration","text":"

The /work file systems on ARCHER2 have the same default stripe settings:

These settings have been chosen to provide a good compromise for the wide variety of I/O patterns that are seen on the system but are unlikely to be optimal for any one particular scenario. The Lustre command to query the stripe settings for a directory (or file) is lfs getstripe. For example, to query the stripe settings of an already created directory resdir:

auser@ln03:~> lfs getstripe resdir/\nresdir\nstripe_count:   1 stripe_size:    1048576 stripe_offset:  -1\n
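To query stripe settings from a script, a small parser for the directory output shown above is enough. This sketch assumes the key: value format in the example. It runs lfs getstripe -d, which restricts the output to the directory's own layout rather than the layouts of files inside it. The helper names are illustrative.

```python
import re
import subprocess

_FIELD = re.compile(r'(stripe_count|stripe_size|stripe_offset):\s*(-?\d+)')


def parse_getstripe(output):
    # Extract the default layout fields from 'lfs getstripe' directory output.
    return {key: int(value) for key, value in _FIELD.findall(output)}


def get_directory_stripe(path):
    # Requires the lfs client tool, i.e. a Lustre file system such as /work.
    result = subprocess.run(['lfs', 'getstripe', '-d', path],
                            capture_output=True, text=True, check=True)
    return parse_getstripe(result.stdout)
```

On the example above, this yields a stripe count of 1, a stripe size of 1048576 bytes (1 MiB) and a stripe offset of -1, meaning Lustre chooses the starting OST.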
"},{"location":"user-guide/io/#setting-custom-striping-configurations","title":"Setting custom striping configurations","text":"

Users can set stripe settings for a directory (or file) using the lfs setstripe command. The options for lfs setstripe are:

For example, to set a stripe size of 4 MiB on the existing directory resdir, and to stripe across all available OSTs (a stripe count of -1), you would use:

auser@ln03:~> lfs setstripe -S 4m -c -1 resdir/\n
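When setting stripes from a script, it can help to validate the size before calling lfs. Lustre requires the stripe size to be a multiple of 64 KiB, and lfs accepts k, m and g suffixes. The helper names below are illustrative, and the sketch only builds the command list for the example above. You could pass that list to subprocess.run to execute it.

```python
_SUFFIX = {'k': 1 << 10, 'm': 1 << 20, 'g': 1 << 30}
_ALIGN = 64 * 1024  # Lustre stripe sizes must be a multiple of 64 KiB


def stripe_size_bytes(spec):
    # Convert e.g. '4m' to bytes, rejecting sizes Lustre would not accept.
    spec = spec.strip().lower()
    if spec and spec[-1] in _SUFFIX:
        size = int(spec[:-1]) * _SUFFIX[spec[-1]]
    else:
        size = int(spec)
    if size <= 0 or size % _ALIGN:
        raise ValueError(f'invalid stripe size {spec!r}: must be a positive multiple of 64 KiB')
    return size


def setstripe_command(path, stripe_size='4m', stripe_count=-1):
    stripe_size_bytes(stripe_size)  # validate before handing to lfs
    return ['lfs', 'setstripe', '-S', stripe_size, '-c', str(stripe_count), path]
```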
"},{"location":"user-guide/io/#environment-variables","title":"Environment variables","text":"

The following environment variables typically only have an impact when you are using Single Shared Files with collective communications. As mentioned above, it is very important to use collective calls when doing parallel I/O to a single shared file.

However, with the default settings, parallel I/O on multiple nodes can currently give poor performance. We recommend always setting these environment variables in your Slurm batch script when you are using the SSF I/O pattern:

export FI_OFI_RXM_SAR_LIMIT=64K\nexport MPICH_MPIIO_HINTS=\"*:cray_cb_write_lock_mode=2,*:cray_cb_nodes_multiplier=4\"\n
"},{"location":"user-guide/io/#mpi-transport-protocol","title":"MPI transport protocol","text":"

Setting the environment variables described above can improve the performance of MPI collectives when handling large amounts of data, which in turn can improve collective file I/O. An alternative is to switch from the default OFI implementation of the MPI library to the non-default UCX implementation.

To switch library version see the Application Development Environment section of the User Guide.

Note

This will affect all your MPI calls, not just those related to I/O, so you should check the overall performance of your program before and after the switch. It is possible that other functions may run slower even if the I/O performance improves.

"},{"location":"user-guide/io/#io-profiling","title":"I/O profiling","text":"

If you are concerned about your I/O performance, you should quantify your transfer rates in terms of GB/s of data read or written to disk. Small files can achieve very high I/O rates due to data caching in Lustre. However, for large files you should be able to achieve a maximum of around 1 GB/s for an unstriped file, or up to 10 GB/s for a fully striped file (across all 12 OSTs).
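A simple way to quantify your own single-client write rate in GB/s is to time a large sequential write. The timing includes an fsync so that data still cached in memory is not counted as written. This sketch measures one client only, and its results will vary with system load. The function name and the IO_TEST_MIB environment variable are hypothetical.

```python
import os
import tempfile
import time


def measure_write_gbs(path, total_bytes, block_size=1 << 20):
    # Sequentially write total_bytes in block_size chunks; return (GB/s, bytes written).
    block = bytes(block_size)
    written = 0
    start = time.perf_counter()
    with open(path, 'wb') as f:
        while written < total_bytes:
            n = min(block_size, total_bytes - written)
            f.write(block[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())  # count time to reach storage, not just the page cache
    elapsed = time.perf_counter() - start
    return written / elapsed / 1e9, written


if __name__ == '__main__':
    size_mib = int(os.environ.get('IO_TEST_MIB', '64'))  # use several GiB for real tests
    with tempfile.TemporaryDirectory(dir='.') as tmp:
        rate, nbytes = measure_write_gbs(os.path.join(tmp, 'bench.dat'), size_mib << 20)
        print(f'wrote {nbytes} bytes at {rate:.2f} GB/s')
```

Run it from a directory on /work, because the temporary directory is created in the current directory. Then compare the result with the figures quoted above for unstriped and fully striped files.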

Warning

You share /work with all other users so I/O rates can be very variable, especially if the machine is heavily loaded.

If your I/O rates are poor, you can get useful summary information about how the parallel libraries are performing by setting this variable in your Slurm script:

export MPICH_MPIIO_STATS=1\n

Amongst other things, this will give you information on how many independent and collective I/O operations were issued. If you see a large number of independent operations compared to collectives, this indicates that you have inefficient I/O patterns and you should check that you are calling your parallel I/O library correctly.

Although this information comes from the MPI library, it is still useful for users of higher-level libraries such as HDF5 as they all call MPI-IO at the lowest level.

"},{"location":"user-guide/io/#tips-and-advice-for-io","title":"Tips and advice for I/O","text":""},{"location":"user-guide/io/#set-an-optimum-blocksize-when-untaring-data","title":"Set an optimum blocksize when untar'ing data","text":"

When you are expanding a large tar archive file to the Lustre file systems you should specify the -b 2048 option to ensure that tar writes out data in blocks of 1 MiB. This will improve the performance of your tar command and reduce the impact of writing the data to Lustre on other users.
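The 1 MiB figure comes from tar's 512-byte block: a blocking factor of 2048 gives records of 2048 x 512 bytes. If you drive tar from a script, a minimal sketch looks like this. The archive and destination paths are placeholders.

```python
import subprocess

TAR_BLOCK_BYTES = 512   # tar's fixed block size
BLOCKING_FACTOR = 2048  # passed as tar -b; 2048 x 512 B = 1 MiB records


def untar_command(archive, dest, blocking_factor=BLOCKING_FACTOR):
    return ['tar', '-x', '-b', str(blocking_factor), '-f', archive, '-C', dest]


def untar(archive, dest):
    # Extract archive into dest using 1 MiB records.
    subprocess.run(untar_command(archive, dest), check=True)


if __name__ == '__main__':
    print(BLOCKING_FACTOR * TAR_BLOCK_BYTES)  # 1048576
```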

"},{"location":"user-guide/machine-learning/","title":"Machine Learning","text":"

Two Machine Learning (ML) frameworks are supported on ARCHER2: PyTorch and TensorFlow.

For each framework, we'll show how to run a particular MLCommons HPC benchmark. We start with PyTorch.

"},{"location":"user-guide/machine-learning/#pytorch","title":"PyTorch","text":"

On ARCHER2, PyTorch is supported for use on both the CPU and GPU nodes.

We'll demonstrate the use of PyTorch with DeepCam, a deep learning climate segmentation benchmark. It involves training a neural network to recognise large-scale weather phenomena (e.g., tropical cyclones, atmospheric rivers) in the output generated by ensembles of weather simulations, see link below for more details.

Exascale Deep Learning for Climate Analytics

There are two DeepCam training datasets available on ARCHER2. A 62 GB mini dataset (/work/z19/shared/mlperf-hpc/deepcam/mini), and a much larger 8.9 TB dataset (/work/z19/shared/mlperf-hpc/deepcam/full).

"},{"location":"user-guide/machine-learning/#deepcam-on-gpu","title":"DeepCam on GPU","text":"

A binary build of PyTorch 1.13.1 for ROCm 5.2.3 has been installed according to the instructions linked below.

https://github.com/hpc-uk/build-instructions/blob/main/pyenvs/pytorch/build_pytorch_1.13.1_archer2_gpu.md

This install can be accessed by loading the pytorch/1.13.1-gpu module.

As DeepCam is an MLPerf benchmark, you may wish to base a local Python environment on pytorch/1.13.1-gpu so that you can install additional Python packages that support MLPerf logging, as well as extra features pertinent to DeepCam (e.g., dynamic learning rates).

The following instructions show how to create such an environment.

#!/bin/bash\n\nmodule -q load pytorch/1.13.1-gpu\n\nPYTHON_TAG=python`echo ${CRAY_PYTHON_LEVEL} | cut -d. -f1-2`\n\nPRFX=${HOME/home/work}/pyenvs\nPYVENV_ROOT=${PRFX}/mlperf-pt-gpu\nPYVENV_SITEPKGS=${PYVENV_ROOT}/lib/${PYTHON_TAG}/site-packages\n\nmkdir -p ${PYVENV_ROOT}\ncd ${PYVENV_ROOT}\n\n\npython -m venv --system-site-packages ${PYVENV_ROOT}\n\nextend-venv-activate ${PYVENV_ROOT}\n\nsource ${PYVENV_ROOT}/bin/activate\n\n\nmkdir -p ${PYVENV_ROOT}/repos\ncd ${PYVENV_ROOT}/repos\n\ngit clone -b hpc-1.0-branch https://github.com/mlcommons/logging mlperf-logging\npython -m pip install -e mlperf-logging\n\nrm ${PYVENV_SITEPKGS}/mlperf-logging.egg-link\nmv ./mlperf-logging/mlperf_logging ${PYVENV_SITEPKGS}/\nmv ./mlperf-logging/mlperf_logging.egg-info ${PYVENV_SITEPKGS}/\n\npython -m pip install git+https://github.com/ildoonet/pytorch-gradual-warmup-lr.git\n\ndeactivate\n

In order to run a DeepCam training job, you must first clone the MLCommons HPC github repo.

mkdir ${HOME/home/work}/tests\ncd ${HOME/home/work}/tests\n\ngit clone https://github.com/mlcommons/hpc.git mlperf-hpc\n\ncd ./mlperf-hpc/deepcam/src/deepCam\n

You are now ready to run the following DeepCam submission script via the sbatch command.

#!/bin/bash\n\n#SBATCH --job-name=deepcam\n#SBATCH --account=[budget code]\n#SBATCH --partition=gpu\n#SBATCH --qos=gpu-exc\n#SBATCH --nodes=2\n#SBATCH --gpus=8\n#SBATCH --time=01:00:00\n#SBATCH --exclusive\n\n\nJOB_OUTPUT_PATH=./results/${SLURM_JOB_ID}\nmkdir -p ${JOB_OUTPUT_PATH}/logs\n\nsource ${HOME/home/work}/pyenvs/mlperf-pt-gpu/bin/activate\n\nexport OMP_NUM_THREADS=1\nexport HOME=${HOME/home/work}\n\nsrun --ntasks=8 --tasks-per-node=4 \\\n     --cpu-bind=verbose,map_cpu:0,8,16,24 --hint=nomultithread \\\n     python train.py \\\n         --run_tag test \\\n         --data_dir_prefix /work/z19/shared/mlperf-hpc/deepcam/mini \\\n         --output_dir ${JOB_OUTPUT_PATH} \\\n         --wireup_method nccl-slurm \\\n         --max_epochs 64 \\\n         --local_batch_size 1\n\nmv slurm-${SLURM_JOB_ID}.out ${JOB_OUTPUT_PATH}/slurm.out\n

The job submission script activates the Python environment that was set up earlier, but that particular command (source ${HOME/home/work}/pyenvs/mlperf-pt-gpu/bin/activate) could be replaced by module -q load pytorch/1.13.1-gpu if you are not running DeepCam and have no need for additional Python packages such as mlperf-logging and warmup-scheduler.

In the script above, we specify four tasks per node, one for each GPU. These tasks are evenly spaced across the node so as to maximise the communications bandwidth between the host and the GPU devices. Note that PyTorch does not use Cray MPICH for inter-task communications; these are instead handled by the ROCm Collective Communications Library (RCCL), hence the --wireup_method nccl-slurm option (nccl-slurm works as an alias for rccl-slurm in this context).
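You can derive the evenly spaced CPU list used with --cpu-bind from the core count and the number of tasks per node. This sketch assumes 32 CPU cores per GPU node, which is consistent with the stride of 8 in map_cpu:0,8,16,24 for four tasks. Check the actual core count for the nodes you use. The function names are illustrative.

```python
def evenly_spaced_cpus(cores_per_node, tasks_per_node):
    # First core of each equal-sized slice of the node.
    stride = cores_per_node // tasks_per_node
    return [task * stride for task in range(tasks_per_node)]


def map_cpu_option(cpus):
    # Value for srun --cpu-bind=map_cpu:<list>.
    return 'map_cpu:' + ','.join(str(cpu) for cpu in cpus)


if __name__ == '__main__':
    GPU_NODE_CORES = 32  # assumption, see the text above
    print(map_cpu_option(evenly_spaced_cpus(GPU_NODE_CORES, 4)))  # map_cpu:0,8,16,24
```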

The above job should converge, reaching an Intersection over Union (IoU) of 0.82, after around 35 epochs. The runtime should be around 20-30 minutes.

We can also modify the DeepCam train.py script so that the accuracy and loss are logged using TensorBoard.

The following lines must be added to the DeepCam train.py script.

import os\n...\n\nfrom torch.utils.tensorboard import SummaryWriter\n\n...\n\ndef main(pargs):\n\n    #init distributed training\n    comm_local_group = comm.init(pargs.wireup_method, pargs.batchnorm_group_size)\n    comm_rank = comm.get_rank()\n    ...\n\n    #set up logging\n    pargs.logging_frequency = max([pargs.logging_frequency, 0])\n    log_file = os.path.normpath(os.path.join(pargs.output_dir, \"logs\", pargs.run_tag + \".log\"))\n    ...\n\n    writer = SummaryWriter()\n\n    #set seed\n    ...\n\n    ...\n\n    #training loop\n    while True:\n        ...\n\n        #training\n        step = train_epoch(pargs, comm_rank, comm_size,\n                           ...\n                           logger, writer)\n\n        ...\n

The train_epoch function is defined in ./driver/trainer.py and so that file must be amended like so.

...\n\ndef train_epoch(pargs, comm_rank, comm_size,\n                ...,\n                logger, writer):\n\n    ...\n\n    writer.add_scalar(\"Accuracy/train\", iou_avg_train, epoch+1)\n    writer.add_scalar(\"Loss/train\", loss_avg_train, epoch+1)\n\n    return step\n
"},{"location":"user-guide/machine-learning/#deepcam-on-cpu","title":"DeepCam on CPU","text":"

PyTorch can also be run on the ARCHER2 CPU nodes. However, since DeepCam uses the torch.distributed module, we cannot use Horovod to handle (via MPI) inter-task communications. We must instead build PyTorch from source so that we can link torch.distributed to the correct Cray MPICH libraries.

The instructions for doing such a build can be found here, https://github.com/hpc-uk/build-instructions/blob/main/pyenvs/pytorch/build_pytorch_1.13.0a0_from_source_archer2_cpu.md.

This install can be accessed by loading the pytorch/1.13.0a0 module. Please note, PyTorch source version 1.13.0a0 corresponds to PyTorch package version 1.13.1.

Once again, as we are running the DeepCam benchmark, we'll need to set up a local Python environment for installing the MLPerf logging package. This time the local environment is based on the pytorch/1.13.0a0 module.

#!/bin/bash\n\nmodule -q load pytorch/1.13.0a0\n\nPYTHON_TAG=python`echo ${CRAY_PYTHON_LEVEL} | cut -d. -f1-2`\n\nPRFX=${HOME/home/work}/pyenvs\nPYVENV_ROOT=${PRFX}/mlperf-pt\nPYVENV_SITEPKGS=${PYVENV_ROOT}/lib/${PYTHON_TAG}/site-packages\n\nmkdir -p ${PYVENV_ROOT}\ncd ${PYVENV_ROOT}\n\n\npython -m venv --system-site-packages ${PYVENV_ROOT}\n\nextend-venv-activate ${PYVENV_ROOT}\n\nsource ${PYVENV_ROOT}/bin/activate\n\n\nmkdir -p ${PYVENV_ROOT}/repos\ncd ${PYVENV_ROOT}/repos\n\ngit clone -b hpc-1.0-branch https://github.com/mlcommons/logging mlperf-logging\npython -m pip install -e mlperf-logging\n\nrm ${PYVENV_SITEPKGS}/mlperf-logging.egg-link\nmv ./mlperf-logging/mlperf_logging ${PYVENV_SITEPKGS}/\nmv ./mlperf-logging/mlperf_logging.egg-info ${PYVENV_SITEPKGS}/\n\npython -m pip install git+https://github.com/ildoonet/pytorch-gradual-warmup-lr.git\n\ndeactivate\n

DeepCam can now be run on the CPU nodes using a submission script like the one below.

#!/bin/bash\n\n#SBATCH --job-name=deepcam\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n#SBATCH --nodes=32\n#SBATCH --ntasks-per-node=1\n#SBATCH --cpus-per-task=128\n#SBATCH --time=10:00:00\n#SBATCH --exclusive\n\n\nJOB_OUTPUT_PATH=./results/${SLURM_JOB_ID}\nmkdir -p ${JOB_OUTPUT_PATH}/logs\n\nsource ${HOME/home/work}/pyenvs/mlperf-pt/bin/activate\n\nexport SRUN_CPUS_PER_TASK=${SLURM_CPUS_PER_TASK}\nexport OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}\n\nsrun --hint=nomultithread \\\n     python train.py \\\n         --run_tag test \\\n         --data_dir_prefix /work/z19/shared/mlperf-hpc/deepcam/mini \\\n         --output_dir ${JOB_OUTPUT_PATH} \\\n         --wireup_method mpi \\\n         --max_inter_threads ${SLURM_CPUS_PER_TASK} \\\n         --max_epochs 64 \\\n         --local_batch_size 1\n\nmv slurm-${SLURM_JOB_ID}.out ${JOB_OUTPUT_PATH}/slurm.out\n

The script above activates the local Python environment so that the mlperf-logging package is available; this is needed by the logger object declared in the DeepCam train.py script. Notice also that the --wireup_method parameter is now set to mpi and that a new parameter has been added, --max_inter_threads, for specifying the maximum number of concurrent readers.

DeepCam performance on the CPU nodes is much slower than GPU. Running on 32 CPU nodes, as shown above, will take around 6 hours to complete 35 epochs. This assumes you're using the default hyperparameter settings for DeepCam.

"},{"location":"user-guide/machine-learning/#tensorflow","title":"TensorFlow","text":"

On ARCHER2, TensorFlow is supported for use on the CPU nodes only.

We'll demonstrate the use of TensorFlow with the CosmoFlow benchmark. It involves training a neural network to recognise cosmological parameter values from the output generated by 3D dark matter simulations, see link below for more details.

CosmoFlow: using deep learning to learn the universe at scale

There are two CosmoFlow training datasets available on ARCHER2. A 5.6 GB mini dataset (/work/z19/shared/mlperf-hpc/cosmoflow/mini), and a much larger 1.7 TB dataset (/work/z19/shared/mlperf-hpc/cosmoflow/full).

"},{"location":"user-guide/machine-learning/#cosmoflow-on-cpu","title":"CosmoFlow on CPU","text":"

In order to run a CosmoFlow training job, you must first clone the MLCommons HPC github repo.

mkdir ${HOME/home/work}/tests\ncd ${HOME/home/work}/tests\n\ngit clone https://github.com/mlcommons/hpc.git mlperf-hpc\n\ncd ./mlperf-hpc/cosmoflow\n

You are now ready to run the following CosmoFlow submission script via the sbatch command.

#!/bin/bash\n\n#SBATCH --job-name=cosmoflow\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n#SBATCH --nodes=32\n#SBATCH --ntasks-per-node=8\n#SBATCH --cpus-per-task=16\n#SBATCH --time=01:00:00\n#SBATCH --exclusive\n\nmodule -q load tensorflow/2.13.0\n\nexport UCX_MEMTYPE_CACHE=n\nexport SRUN_CPUS_PER_TASK=${SLURM_CPUS_PER_TASK}\nexport MPICH_DPM_DIR=${SLURM_SUBMIT_DIR}/dpmdir\n\nexport OMP_NUM_THREADS=16\nexport TF_ENABLE_ONEDNN_OPTS=1\n\nsrun  --hint=nomultithread --distribution=block:block --cpu-freq=2250000 \\\n    python train.py \\\n        --distributed --omp-num-threads ${OMP_NUM_THREADS} \\\n        --inter-threads 0 --intra-threads 0 \\\n        --n-epochs 2048 --n-train 1024 --n-valid 1024 \\\n        --data-dir /work/z19/shared/mlperf-hpc/cosmoflow/mini/cosmoUniverse_2019_05_4parE_tf_v2_mini\n

The CosmoFlow job runs eight MPI tasks per node (one per NUMA region) with sixteen threads per task, and so, each node is fully populated. The TF_ENABLE_ONEDNN_OPTS variable refers to Intel's oneAPI Deep Neural Network library. Within the TensorFlow source there are #ifdef guards that are activated when oneDNN is enabled. It turns out that having TF_ENABLE_ONEDNN_OPTS=1 also improves performance (by a factor of 12) on AMD processors.
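The sketch below checks whether a given combination of tasks per node and threads per task fully populates a node, and how the tasks map onto NUMA regions. It uses the 128-core, 8-NUMA-region figures from the hardware section, and the function name is illustrative.

```python
CORES_PER_NODE = 128
NUMA_REGIONS_PER_NODE = 8


def describe_layout(nodes, tasks_per_node, cpus_per_task):
    cores_used = tasks_per_node * cpus_per_task
    return {
        'total_tasks': nodes * tasks_per_node,
        'cores_used_per_node': cores_used,
        'fully_populated': cores_used == CORES_PER_NODE,
        'tasks_per_numa_region': tasks_per_node / NUMA_REGIONS_PER_NODE,
        'OMP_NUM_THREADS': cpus_per_task,  # one thread per allocated core
    }


if __name__ == '__main__':
    print(describe_layout(nodes=32, tasks_per_node=8, cpus_per_task=16))
```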

The inter/intra thread training parameters allow one to exploit any parallelism implied by the TensorFlow (TF) DNN graph. For example, if a node in the TF graph can be parallelised, the number of threads assigned will be the value of --intra-threads; and, if there are separate nodes in the TF graph that can be run concurrently, the available thread count for such an activity is the value of --inter-threads. Of course, the optimum values for these parameters will depend on the DNN graph. The job script above tells TensorFlow to choose the values by setting both parameters to zero.

You will note that only a few hyperparameters are specified for the CosmoFlow training job (e.g., --n-epochs, --n-train and --n-valid). Those settings in fact override the values assigned to those same parameters within the ./configs/cosmo.yaml file. However, that file contains settings for many other hyperparameters that are not overwritten.
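The override behaviour described above follows a common pattern. Values from a configuration file act as defaults, and only options given explicitly on the command line replace them. The sketch below illustrates the pattern with argparse. FILE_CONFIG is a hypothetical stand-in for the contents of cosmo.yaml, and this is not the actual CosmoFlow train.py code.

```python
import argparse

# Hypothetical stand-in for values loaded from ./configs/cosmo.yaml.
FILE_CONFIG = {
    'n_epochs': 100,
    'n_train': 10000,
    'n_valid': 1000,
    'learning_rate': 0.001,
}


def build_config(argv, file_config=FILE_CONFIG):
    parser = argparse.ArgumentParser()
    parser.add_argument('--n-epochs', type=int)  # stored as n_epochs
    parser.add_argument('--n-train', type=int)
    parser.add_argument('--n-valid', type=int)
    args = parser.parse_args(argv)
    config = dict(file_config)
    for key, value in vars(args).items():
        if value is not None:  # only options given explicitly override the file
            config[key] = value
    return config
```

Options left unset default to None and therefore leave the file values, such as learning_rate here, untouched.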

The CosmoFlow job specified above should take around 140 minutes to complete 2048 epochs, which should be sufficient to achieve a mean absolute error of 0.23.

"},{"location":"user-guide/profile/","title":"Profiling","text":"

There are a number of different ways to access profiling data on ARCHER2. In this section, we discuss the HPE Cray profiling tools, CrayPat-lite and CrayPat. We also show how to get usage data on currently running jobs from the Slurm batch system.

You can also use the Linaro Forge tool to profile applications on ARCHER2.

If you are specifically interested in profiling IO, then you may want to look at the Darshan IO profiling tool.

"},{"location":"user-guide/profile/#craypat-lite","title":"CrayPat-lite","text":"

CrayPat-lite is a simplified and easy-to-use version of the Cray Performance Measurement and Analysis Tool (CrayPat). CrayPat-lite provides basic performance analysis information automatically, with a minimum of user interaction, and yet offers information useful to users wishing to explore a program's behaviour further using the full CrayPat suite.

"},{"location":"user-guide/profile/#how-to-use-craypat-lite","title":"How to use CrayPat-lite","text":"
  1. Ensure the perftools-base module is loaded.

    module list

  2. Load the perftools-lite module.

    module load perftools-lite

  3. Compile your application normally. An informational message from CrayPat-lite will appear indicating that the executable has been instrumented.

    cc -h std=c99 -o myapplication.x myapplication.c
    INFO: creating the CrayPat-instrumented executable 'myapplication.x' (lite-samples) ...OK
  4. Run the generated executable normally by submitting a job.

    #!/bin/bash

    #SBATCH --job-name=CrayPat_test
    #SBATCH --nodes=4
    #SBATCH --ntasks-per-node=128
    #SBATCH --cpus-per-task=1
    #SBATCH --time=00:20:00

    #SBATCH --account=[budget code]
    #SBATCH --partition=standard
    #SBATCH --qos=standard

    export OMP_NUM_THREADS=1
    export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK

    # Launch the CrayPat-lite instrumented executable built in step 3
    srun --hint=nomultithread --distribution=block:block ./myapplication.x
  5. Analyse the data.

    After the job finishes executing, CrayPat-lite output should be printed to stdout (i.e. at the end of the job's output file). A new directory will also be created containing .rpt and .ap2 files. The .rpt files are text files that contain the same information printed in the job's output file and the .ap2 files can be used to obtain more detailed information, which can be visualized using the Cray Apprentice2 tool.

"},{"location":"user-guide/profile/#further-help","title":"Further help","text":""},{"location":"user-guide/profile/#craypat","title":"CrayPat","text":"

The Cray Performance Analysis Tool (CrayPat) is a powerful framework for analysing a parallel application's performance on Cray supercomputers. It can provide very detailed information about the timing and performance of individual application procedures.

CrayPat can perform two types of performance analysis: sampling experiments and tracing experiments. A sampling experiment probes the code at a predefined interval and produces a report based on the data collected. A tracing experiment explicitly monitors the code performance within named routines. A tracing experiment typically has a higher overhead than a sampling experiment, but it provides much more detailed information. The key to getting useful data out of a sampling experiment is to run your profiling for a representative length of time.

"},{"location":"user-guide/profile/#sampling-analysis","title":"Sampling analysis","text":"
  1. Ensure the perftools-base module is loaded.

    module list

  2. Load the perftools module.

    module load perftools

  3. Compile your code in the standard way, always using the Cray compiler wrappers (ftn, cc and CC). CrayPat needs access to the object files to build an instrumented executable correctly for profiling or tracing, so the compile and link stages should be separated by using the -c compile flag.

    auser@ln01:/work/t01/t01/auser> cc -h std=c99 -c jacobi.c
    auser@ln01:/work/t01/t01/auser> cc jacobi.o -o jacobi

  4. To instrument the binary, run the pat_build command. This will generate a new binary with +pat appended to its name (e.g. jacobi+pat).

    auser@ln01:/work/t01/t01/auser> pat_build jacobi

  5. Run the new executable with +pat appended as you would with the regular executable. Each run will produce its own 'experiment directory' containing the performance data as .xf files inside a subdirectory called xf-files (e.g. running the jacobi+pat instrumented executable might produce jacobi+pat+12265-1573s/xf-files).

  6. Generate report data with pat_report.

The .xf files contain the raw sampling data from the run and need to be post-processed to produce useful results. This is done using the pat_report tool which converts all the raw data into a summarised and readable form. You should provide the name of the experiment directory as the argument to pat_report.

auser@ln01:/work/t01/t01/auser> pat_report jacobi+pat+12265-1573s

Table 1:  Profile by Function (limited entries shown)

  Samp% |  Samp |  Imb. |  Imb. | Group
        |       |  Samp | Samp% |  Function
        |       |       |       |   PE=HIDE
 100.0% | 849.5 |    -- |    -- | Total
|--------------------------------------------------
|  56.7% | 481.4 |    -- |    -- | MPI
||-------------------------------------------------
||  48.7% | 414.1 |  50.9 | 11.0% | MPI_Allreduce
||   4.4% |  37.5 | 118.5 | 76.6% | MPI_Waitall
||   3.0% |  25.2 |  44.8 | 64.5% | MPI_Isend
||=================================================
|  29.9% | 253.9 |  55.1 | 18.0% | USER
||-------------------------------------------------
||  29.9% | 253.9 |  55.1 | 18.0% | main
||=================================================
|  13.4% | 114.1 |    -- |    -- | ETC
||-------------------------------------------------
||  13.4% | 113.9 |  26.1 | 18.8% | __cray_memcpy_SNB
|==================================================

Running pat_report also generates files with the extension .ap2 in the experiment directory. These hold the same data as the .xf files but in post-processed form. Another file produced has an .apa extension; this is a text file with a suggested configuration for generating a traced experiment.

The .ap2 files generated are used to view performance data graphically with the Cray Apprentice2 tool.
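Every run of an instrumented executable creates a new experiment directory with a generated suffix, so when scripting it can be handy to pick the most recent one automatically. The function below is a minimal sketch and not part of CrayPat: latest_experiment_dir is a hypothetical name, and it assumes that experiment directories follow the <executable>+<tag>+<suffix> naming used in this section (such as jacobi+pat+12265-1573s, or jacobi+apa+13955-1573t for an APA tracing run) and sit in the current directory.

```bash
# Hypothetical helper (not part of CrayPat): print the newest experiment
# directory for an executable. Assumes directories named
# <exe>+<tag>+<suffix> in the current directory, e.g. jacobi+pat+12265-1573s.
latest_experiment_dir() {
    local exe="$1" tag="${2:-pat}" newest="" d
    for d in "${exe}+${tag}+"*/; do
        [ -d "$d" ] || continue                # the glob matched nothing
        if [ -z "$newest" ] || [ "$d" -nt "$newest" ]; then
            newest="$d"                        # keep the most recently modified
        fi
    done
    [ -n "$newest" ] || return 1               # no experiment directory found
    printf '%s\n' "${newest%/}"                # drop the trailing slash
}
```

With this defined, pat_report "$(latest_experiment_dir jacobi)" reports on the newest sampling experiment, and latest_experiment_dir jacobi apa selects the newest APA run. Comparing modification times with the shell's -nt test avoids parsing the output of ls.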

The pat_report command is able to produce many different profile reports from the profiling data. You can select a predefined report with the -O flag to pat_report. A selection of the most generally useful predefined report types is listed below.

Example output:

auser@ln01:/work/t01/t01/auser> pat_report -O ca+src,load_balance jacobi+pat+12265-1573s

Table 1:  Profile by Function and Callers, with Line Numbers (limited entries shown)

  Samp% |  Samp |  Imb. |  Imb. | Group
        |       |  Samp | Samp% |  Function
        |       |       |       |   PE=HIDE
 100.0% | 849.5 |    -- |    -- | Total
|--------------------------------------------------
|--------------------------------------
|  56.7% | 481.4 | MPI
||-------------------------------------
||  48.7% | 414.1 | MPI_Allreduce
3|        |       |  main:jacobi.c:line.80
||   4.4% |  37.5 | MPI_Waitall
3|        |       |  main:jacobi.c:line.73
||   3.0% |  25.2 | MPI_Isend
|||------------------------------------
3||   1.6% |  13.2 | main:jacobi.c:line.65
3||   1.4% |  12.0 | main:jacobi.c:line.69
||=====================================
|  29.9% | 253.9 | USER
||-------------------------------------
||  29.9% | 253.9 | main
|||------------------------------------
3||  18.7% | 159.0 | main:jacobi.c:line.76
3||   9.1% |  76.9 | main:jacobi.c:line.84
|||====================================
||=====================================
|  13.4% | 114.1 | ETC
||-------------------------------------
||  13.4% | 113.9 | __cray_memcpy_SNB
3|        |       |  __cray_memcpy_SNB
|======================================
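If you regularly look at the same predefined views, a small loop can save each one to its own text file. This is a hedged sketch: save_reports is a hypothetical helper, PAT_REPORT is a made-up override variable (it exists only so the loop can be tried without CrayPat installed), and the report names you pass are assumed to be valid -O values such as the ca+src and load_balance reports used above.

```bash
# Hypothetical helper: run pat_report once per predefined report and save
# each result as <experiment dir>.<report>.txt alongside the directory.
# PAT_REPORT is a made-up override; by default the real pat_report is used.
save_reports() {
    local expdir="$1" report
    shift
    if [ ! -d "$expdir" ]; then
        echo "no such experiment directory: $expdir" >&2
        return 1
    fi
    for report in "$@"; do
        "${PAT_REPORT:-pat_report}" -O "$report" "$expdir" \
            > "${expdir%/}.${report}.txt" || return
    done
}
```

For example, save_reports jacobi+pat+12265-1573s ca+src load_balance would write jacobi+pat+12265-1573s.ca+src.txt and jacobi+pat+12265-1573s.load_balance.txt next to the experiment directory.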
"},{"location":"user-guide/profile/#tracing-analysis","title":"Tracing analysis","text":""},{"location":"user-guide/profile/#automatic-program-analysis-apa","title":"Automatic Program Analysis (APA)","text":"

We can produce a focused tracing experiment based on the results from the sampling experiment using pat_build with the .apa file produced during the sampling.

auser@ln01:/work/t01/t01/auser> pat_build -O jacobi+pat+12265-1573s/build-options.apa

This will produce a third binary with the extension +apa (here, jacobi+apa). Run this binary on the compute nodes as before, changing the name of the executable in your job script to jacobi+apa. As with the sampling analysis, a report can be produced using pat_report. For example:

auser@ln01:/work/t01/t01/auser> pat_report jacobi+apa+13955-1573t

Table 1:  Profile by Function Group and Function (limited entries shown)

  Time% |      Time |     Imb. |  Imb. |       Calls | Group
        |           |     Time | Time% |             |  Function
        |           |          |       |             |   PE=HIDE

 100.0% | 12.987762 |       -- |    -- | 1,387,544.9 | Total
|-------------------------------------------------------------------------
|  44.9% |  5.831320 |       -- |    -- |         2.0 | USER
||------------------------------------------------------------------------
||  44.9% |  5.831229 | 0.398671 |  6.4% |         1.0 | main
||========================================================================
|  29.2% |  3.789904 |       -- |    -- |   199,111.0 | MPI_SYNC
||------------------------------------------------------------------------
||  29.2% |  3.789115 | 1.792050 | 47.3% |   199,109.0 | MPI_Allreduce(sync)
||========================================================================
|  25.9% |  3.366537 |       -- |    -- | 1,188,431.9 | MPI
||------------------------------------------------------------------------
||  18.0% |  2.334765 | 0.164646 |  6.6% |   199,109.0 | MPI_Allreduce
||   3.7% |  0.486714 | 0.882654 | 65.0% |   199,108.0 | MPI_Waitall
||   3.3% |  0.428731 | 0.557342 | 57.0% |   395,104.9 | MPI_Isend
|=========================================================================
"},{"location":"user-guide/profile/#manual-program-analysis","title":"Manual Program Analysis","text":"

CrayPat allows you to manually choose your profiling preference. This is particularly useful if the APA mode does not meet your tracing analysis requirements.

The entire program can be traced as a whole using -w:

auser@ln01:/work/t01/t01/auser> pat_build -w jacobi

Using -g, a program can be instrumented to trace all function entry point references belonging to the trace function group (mpi, libsci, lapack, scalapack, heap, etc.):

auser@ln01:/work/t01/t01/auser> pat_build -w -g mpi jacobi
"},{"location":"user-guide/profile/#dynamically-linked-binaries","title":"Dynamically-linked binaries","text":"

CrayPat allows you to profile un-instrumented, dynamically linked binaries with the pat_run utility. pat_run delivers profiling information for codes that cannot easily be rebuilt. To use pat_run:

  1. Load the perftools-base module if it is not already loaded.

    module load perftools-base

  2. Run your application as normal, inserting the pat_run command immediately after your srun options.

    srun [srun-options] pat_run [pat_run-options] program [program-options]

  3. Use pat_report to examine any data collected during the execution of your application.

    auser@ln01:/work/t01/t01/auser> pat_report jacobi+pat+12265-1573s

Some useful pat_run options are as follows.

"},{"location":"user-guide/profile/#further-help_1","title":"Further help","text":""},{"location":"user-guide/profile/#cray-apprentice2","title":"Cray Apprentice2","text":"

Cray Apprentice2 is an optional GUI tool that is used to visualize and manipulate the performance analysis data captured during program execution. Cray Apprentice2 can display a wide variety of reports and graphs, depending on the type of program being analyzed, the way in which the program was instrumented for data capture, and the data that was collected during program execution.

You will need to use CrayPat to first instrument your program and capture performance analysis data, and then pat_report to generate the .ap2 files from the results. You may then use Cray Apprentice2 to visualize and explore those files.

The number and appearance of the reports that can be generated using Cray Apprentice2 is determined by the kind and quantity of data captured during program execution, which in turn is determined by the way in which the program was instrumented and the environment variables in effect at the time of program execution. For example, setting the PAT_RT_SUMMARY environment variable to 0 before executing the instrumented program nearly doubles the number of reports available when analyzing the resulting data in Cray Apprentice2.

export PAT_RT_SUMMARY=0

To use Cray Apprentice2 (app2), load the perftools-base module if it is not already loaded.

module load perftools-base

Next, use Apprentice2 to open the experiment directory that was created when you ran the instrumented executable.

auser@ln01:/work/t01/t01/auser> app2 jacobi+pat+12265-1573s
"},{"location":"user-guide/profile/#hardware-performance-counters","title":"Hardware Performance Counters","text":"

Hardware performance counters can be used to monitor CPU and power events on ARCHER2 compute nodes. The monitoring and reporting of hardware counter events is integrated with CrayPat, so you should use CrayPat as described earlier in this section both to run profiling experiments that gather data from hardware counter events and to analyse the data.

"},{"location":"user-guide/profile/#counters-and-counter-groups-available","title":"Counters and counter groups available","text":"

You can explore which event counters are available on compute nodes by running the following commands (replace t01 with a valid budget code for your account):

module load perftools
srun --ntasks=1 --partition=standard --qos=short --account=t01 papi_avail

For convenience, the CrayPat tool provides predetermined groups of hardware event counters. You can get more information on the hardware event counters available through CrayPat with the following commands (on a login or compute node):

module load perftools
pat_help counters rome groups

If you want information on which hardware event counters are included in a group you can type the group name at the prompt you get after running the command above. Once you have finished browsing the help, type . to quit back to the command line.

"},{"location":"user-guide/profile/#powerenergy-counters-available","title":"Power/energy counters available","text":"

You can also access counters on power/energy consumption. To list the counters available to monitor power/energy use you can use the command (replace t01 with a valid budget code for your account):

module load perftools
srun --ntasks=1 --partition=standard --qos=short --account=t01 papi_native_avail -i cray_pm
"},{"location":"user-guide/profile/#enabling-hardware-counter-data-collection","title":"Enabling hardware counter data collection","text":"

You enable the collection of hardware event counter data as part of a CrayPat experiment by setting the environment variable PAT_RT_PERFCTR to a comma-separated list of the groups/counters that you wish to measure.

For example, you could set (usually in your job submission script):

export PAT_RT_PERFCTR=1

to use counter group 1 (summary with branch activity).
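Because PAT_RT_PERFCTR expects a single comma-separated string, a helper that joins its arguments with commas avoids stray spaces when you list several counters. This is a minimal sketch for use in a job script before the srun line. The name set_perfctr is hypothetical, and the counter names in the usage note (PAPI_TOT_INS, PAPI_TOT_CYC) are taken from the example output in this section.

```bash
# Hypothetical helper: export PAT_RT_PERFCTR as the comma-separated list of
# counter groups/counters given as arguments.
set_perfctr() {
    local IFS=,                      # "$*" joins the arguments with the first IFS character
    export PAT_RT_PERFCTR="$*"
}
```

Here set_perfctr 1 is equivalent to export PAT_RT_PERFCTR=1, while set_perfctr PAPI_TOT_INS PAPI_TOT_CYC sets PAT_RT_PERFCTR=PAPI_TOT_INS,PAPI_TOT_CYC. Declaring IFS as local keeps the change from leaking into the rest of the script.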

"},{"location":"user-guide/profile/#analysing-hardware-counter-data","title":"Analysing hardware counter data","text":"

If you enabled collection of hardware event counters when running your profiling experiment, you will automatically get a report on the data when you use the pat_report command to analyse the profile experiment data file.

You will see information similar to the following in the output from CrayPat for different sections of your code (this example is for the case where export PAT_RT_PERFCTR=1, counter group: summary with branch activity, was set in the job submission script):

==============================================================================
  USER / main
------------------------------------------------------------------------------
  Time%                                                   88.3%
  Time                                               446.113787 secs
  Imb. Time                                           33.094417 secs
  Imb. Time%                                               6.9%
  Calls                       0.002 /sec                    1.0 calls
  PAPI_BR_TKN                 0.240G/sec    106,855,535,005.863 branch
  PAPI_TOT_INS                5.679G/sec  2,533,386,435,314.367 instr
  PAPI_BR_INS                 0.509G/sec    227,125,246,394.008 branch
  PAPI_TOT_CYC                            1,243,344,265,012.828 cycles
  Instr per cycle                                          2.04 inst/cycle
  MIPS                 1,453,770.20M/sec
  Average Time per Call                              446.113787 secs
  CrayPat Overhead : Time      0.2%
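The derived "Instr per cycle" line is PAPI_TOT_INS divided by PAPI_TOT_CYC: 2,533,386,435,314 / 1,243,344,265,013 is about 2.04. If you want to recompute such ratios from saved pat_report output, or add others such as the fraction of branches taken (PAPI_BR_TKN / PAPI_BR_INS, about 0.47 here), a short awk filter is enough. This is a hedged sketch. The name derived_metrics is hypothetical, and the filter assumes its input is a single section of the report in the layout shown above, with the run total as the second-to-last field and commas as thousands separators.

```bash
# Hypothetical helper: recompute derived metrics from the raw PAPI totals of
# ONE pat_report section read on stdin (later sections would overwrite earlier ones).
derived_metrics() {
    awk '
        $1 ~ /^PAPI_/ {
            total = $(NF-1)            # second-last field is the run total
            gsub(/,/, "", total)       # drop thousands separators
            count[$1] = total + 0
        }
        END {
            if (count["PAPI_TOT_CYC"] > 0)
                printf "IPC %.2f\n", count["PAPI_TOT_INS"] / count["PAPI_TOT_CYC"]
            if (count["PAPI_BR_INS"] > 0)
                printf "Branches taken %.2f\n", count["PAPI_BR_TKN"] / count["PAPI_BR_INS"]
        }'
}
```

Lines that are not PAPI counters, such as the existing "Instr per cycle" line, are ignored, so you can pipe a whole section into the function.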
"},{"location":"user-guide/profile/#using-the-craypat-api-to-gather-hardware-counter-data","title":"Using the CrayPAT API to gather hardware counter data","text":"

The CrayPat API features a particular function, PAT_counters, that allows you to obtain the values of specific hardware counters at specific points within your code.

For convenience, we have developed an MPI-based wrapper for this aspect of the CrayPat API, called pat_mpi_lib, which can be found via the link below.

https://github.com/cresta-eu/pat_mpi_lib

The PAT MPI Library makes it possible to monitor a user-defined set of hardware performance counters during the execution of an MPI code running across multiple compute nodes. The library is lightweight, containing just four functions, and is intended to be straightforward to use. Once you've defined the hooks in your code for recording counter values, you can control which counters are read at runtime by setting the PAT_RT_PERFCTR environment variable in the job submission script. As your code executes, the defined set of counters will be read at various points. After each reading, the counter values are summed by rank 0 (via an MPI reduction) before being output to a log file.

Further information along with test harnesses and example scripts can be found by reading the PAT MPI Library readme file.

"},{"location":"user-guide/profile/#more-information-on-hardware-counters","title":"More information on hardware counters","text":"

More information on using hardware counters can be found in the appropriate section of the HPE documentation:

Also available are two MPI-based wrapper libraries, one for Power Management (PM) counters that cover such properties as point-in-time power, cumulative energy use and temperature; and one that provides access to PAPI counters. See the links below for further details.

"},{"location":"user-guide/profile/#performance-and-profiling-data-in-slurm","title":"Performance and profiling data in Slurm","text":"

Slurm commands on the login nodes can be used to quickly and simply retrieve information about memory usage for currently running and completed jobs.

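There are three commands you can use on ARCHER2 to query job data from Slurm: sstat and sacct are standard Slurm commands, and archer2jobload is a script that provides information on running jobs.

We provide examples of the use of these three commands below.

For the sacct and sstat commands, the memory properties we print out below are: AveRSS, the average resident set size (RSS) of all tasks in the job; MaxRSS, the maximum RSS of all tasks in the job; MaxRSSTask, the task ID on which the maximum RSS occurred; MaxRSSNode, the node on which the maximum RSS occurred; and TRESUsageInTot, the total usage of trackable resources (TRES), including memory, by all tasks in the job.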

Tip

Slurm polls the memory use of a job at intervals, so short-term changes in memory use may not be captured in the Slurm data.

"},{"location":"user-guide/profile/#example-1-sstat-for-running-jobs","title":"Example 1: sstat for running jobs","text":"

To display the current memory use of a running job with the ID 123456:

sstat --format=JobID,AveCPU,AveRSS,MaxRSS,MaxRSSTask,MaxRSSNode,TRESUsageInTot%150 -j 123456
"},{"location":"user-guide/profile/#example-2-sacct-for-finished-jobs","title":"Example 2: sacct for finished jobs","text":"

To display the memory use of a completed job with the ID 123456:

sacct --format=JobID,JobName,AveRSS,MaxRSS,MaxRSSTask,MaxRSSNode,TRESUsageInTot%150 -j 123456

You can also use sacct to display when a particular user's jobs were submitted, started running and ended:

sacct --format=JobID,Submit,Start,End -u auser
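To pull a single number out of the sacct memory data, for example the peak MaxRSS over all steps of a finished job, you can ask sacct for machine-readable output with its --parsable2 and --noheader options and post-process it with awk. This is a minimal sketch. The name peak_maxrss_mib is hypothetical, and the function assumes MaxRSS values are either plain byte counts or carry a K, M, G or T suffix in powers of 1024.

```bash
# Hypothetical helper: print the largest MaxRSS (in MiB) over all job steps.
# Reads "JobID|MaxRSS" lines such as those produced by:
#   sacct --format=JobID,MaxRSS --parsable2 --noheader -j 123456
peak_maxrss_mib() {
    awk -F'|' '
        $2 != "" {
            value = $2 + 0                       # numeric prefix, e.g. 10240 from "10240K"
            unit  = substr($2, length($2), 1)    # trailing unit letter, if any
            if      (unit == "K") value /= 1024
            else if (unit == "G") value *= 1024
            else if (unit == "T") value *= 1024 * 1024
            else if (unit != "M") value /= 1024 * 1024   # no suffix: plain bytes
            if (value > peak) peak = value
        }
        END { printf "%.1f\n", peak }'
}
```

Usage: sacct --format=JobID,MaxRSS --parsable2 --noheader -j 123456 | peak_maxrss_mib. Steps that report no MaxRSS (an empty field) are skipped.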
"},{"location":"user-guide/profile/#example-3-archer2jobload-for-running-jobs","title":"Example 3: archer2jobload for running jobs","text":"

Using the archer2jobload command on its own with no options will show the current CPU and memory use across compute nodes for all running jobs.

More usefully, you can provide a job ID to archer2jobload and it will show a summary of the CPU and memory use for a specific job. For example, to get the usage data for job 123456, you would use:

auser@ln01:~> archer2jobload 123456
# JOB: 123456
CPU_LOAD            MEMORY              ALLOCMEM            FREE_MEM            TMP_DISK            NODELIST
127.35-127.86       256000              239872              169686-208172       0                   nid[001481,001638-00

This shows that the minimum CPU load on a compute node is 127.35 (close to the limit of 128 cores) and the maximum load is 127.86 (indicating that all the nodes are being used evenly). The minimum free memory is 169686 MB and the maximum free memory is 208172 MB.

If you add the -l option, you will see a breakdown per node:

auser@ln01:~> archer2jobload -l 123456
# JOB: 123456
NODELIST            CPU_LOAD            MEMORY              ALLOCMEM            FREE_MEM            TMP_DISK
nid001481           127.86              256000              239872              169686              0
nid001638           127.60              256000              239872              171060              0
nid001639           127.64              256000              239872              171253              0
nid001677           127.85              256000              239872              173820              0
nid001678           127.75              256000              239872              173170              0
nid001891           127.63              256000              239872              173316              0
nid001921           127.65              256000              239872              207562              0
nid001922           127.35              256000              239872              208172              0
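If you save the per-node output, for example to track a job over time, you can reproduce the summary line from it with awk. This is a sketch that assumes the column order shown above (NODELIST, CPU_LOAD, MEMORY, ALLOCMEM, FREE_MEM, TMP_DISK). The name summarise_jobload is hypothetical.

```bash
# Hypothetical helper: summarise the per-node table from "archer2jobload -l".
# Assumes columns: NODELIST CPU_LOAD MEMORY ALLOCMEM FREE_MEM TMP_DISK
summarise_jobload() {
    awk '
        $1 ~ /^nid/ {                       # skip the "# JOB" and header lines
            load = $2 + 0; free = $5 + 0
            if (n == 0 || load < minload) minload = load
            if (n == 0 || load > maxload) maxload = load
            if (n == 0 || free < minfree) minfree = free
            if (n == 0 || free > maxfree) maxfree = free
            n++
        }
        END {
            printf "nodes=%d cpu_load=%.2f-%.2f free_mem_mb=%d-%d\n",
                   n, minload, maxload, minfree, maxfree
        }'
}
```

Applied to the eight nodes listed above, archer2jobload -l 123456 | summarise_jobload prints nodes=8 cpu_load=127.35-127.86 free_mem_mb=169686-208172, matching the ranges in the summary output of archer2jobload 123456.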
"},{"location":"user-guide/profile/#further-help-with-slurm","title":"Further help with Slurm","text":"

The definitions of any variables discussed here and more usage information can be found in the man pages of sstat and sacct.

"},{"location":"user-guide/profile/#amd-prof","title":"AMD \u03bcProf","text":"

The AMD μProf tool provides capabilities for low-level profiling on AMD processors, see:

"},{"location":"user-guide/profile/#linaro-forge","title":"Linaro Forge","text":"

The Linaro Forge tool also provides profiling capabilities. See:

"},{"location":"user-guide/profile/#darshan-io-profiling","title":"Darshan IO profiling","text":"

The Darshan lightweight IO profiling tool provides a quick way to profile the IO part of your software:

"},{"location":"user-guide/python/","title":"Using Python","text":"

Python is supported on ARCHER2 both for running intensive parallel jobs and also as an analysis tool. This section describes how to use Python in either of these scenarios.

The Python installations on ARCHER2 contain some of the most commonly used packages. If you wish to install additional Python packages, we recommend that you use the pip command; see the section entitled Installing your own Python packages (with pip).

Important

Python 2 is not supported on ARCHER2 as it reached end of life at the start of 2020.

Note

When you log onto ARCHER2, no Python module is loaded by default. You will generally need to load the cray-python module to access the functionality described below.

"},{"location":"user-guide/python/#hpe-cray-python-distribution","title":"HPE Cray Python distribution","text":"

The recommended way to use Python on ARCHER2 is to use the HPE Cray Python distribution.

The HPE Cray distribution provides Python 3 along with some of the most common packages used for scientific computation and data analysis. These include:

The HPE Cray Python distribution can be loaded (either on the front-end or in a submission script) using:

module load cray-python

Tip

The HPE Cray Python distribution is built using GCC compilers. If you wish to compile your own Python, C/C++ or Fortran code to use with HPE Cray Python, you should compile it using PrgEnv-gnu to ensure compatibility.

"},{"location":"user-guide/python/#installing-your-own-python-packages-with-pip","title":"Installing your own Python packages (with pip)","text":"

Sometimes, you may need to set up a local custom Python environment that extends a centrally-installed cray-python module. By extend, we mean being able to install packages locally that are not provided by cray-python. This is necessary because some Python packages such as mpi4py must be built specifically for the ARCHER2 system and so are best provided centrally.

You can do this by creating a lightweight virtual environment where the local packages can be installed. This environment is created on top of an existing Python installation, known as the environment's base Python.

First, load the PrgEnv-gnu environment.

auser@ln01:~> module load PrgEnv-gnu

This first step is necessary because subsequent pip installs may involve source code compilation and it is better that this be done using the GCC compilers to maintain consistency with how some base Python packages have been built.

Second, select the base Python by loading the cray-python module that you wish to extend.

auser@ln01:~> module load cray-python

Next, create the virtual environment within a designated folder.

python -m venv --system-site-packages /work/t01/t01/auser/myvenv

In our example, the environment is created within a myvenv folder located on /work, which means the environment will be accessible from the compute nodes. The --system-site-packages option ensures this environment is based on the currently loaded cray-python module. See https://docs.python.org/3/library/venv.html for more details.

You're now ready to activate your environment.

source /work/t01/t01/auser/myvenv/bin/activate

Tip

The myvenv path uses a fictitious project code, t01, and username, auser. Please remember to replace those values with your actual project code and username. Alternatively, you could enter ${HOME/home/work} in place of /work/t01/t01/auser. That command fragment expands ${HOME} and then replaces the home part with work.
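The expansion in the tip uses bash pattern substitution, ${parameter/pattern/string}, which replaces only the first match of the pattern. Since home directories start with /home, that first match is the leading component even if, say, the username itself contains "home". The snippet below wraps the expansion in a hypothetical helper so you can check the result before creating the environment. It assumes your home and work directories differ only in that leading component, as in the /work/t01/t01/auser example.

```bash
# Hypothetical helper: print the /work counterpart of a /home path
# (defaults to $HOME), using the ${HOME/home/work} expansion from the tip.
work_dir_from_home() {
    local h="${1:-$HOME}"
    printf '%s\n' "${h/home/work}"    # replaces the first "home" only
}
```

For example, work_dir_from_home /home/t01/t01/homer prints /work/t01/t01/homer. Similarly, python -m venv --system-site-packages "$(work_dir_from_home)/myvenv" creates the environment in your own work directory.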

Installing packages to your local environment can now be done as follows.

(myvenv) auser@ln01:~> python -m pip install <package name>

Running pip directly as in pip install <package name> will also work, but we show the python -m approach as this is consistent with the way the virtual environment was created. Further, if the package installation will require code compilation, you should amend the command to ensure use of the ARCHER2 compiler wrappers.

(myvenv) auser@ln01:~> CC=cc CXX=CC FC=ftn python -m pip install <package name>

When you have finished installing packages, you can deactivate the environment by running the deactivate command.

(myvenv) auser@ln01:~> deactivate
auser@ln01:~>

The packages you have installed will only be available once the local environment has been activated. So, when running code that requires these packages, you must first activate the environment by adding the activation command to the submission script, as shown below.

#!/bin/bash --login

#SBATCH --job-name=myvenv
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=64
#SBATCH --cpus-per-task=2
#SBATCH --time=00:10:00
#SBATCH --account=[budget code]
#SBATCH --partition=standard
#SBATCH --qos=standard

source /work/t01/t01/auser/myvenv/bin/activate

export SRUN_CPUS_PER_TASK=${SLURM_CPUS_PER_TASK}

srun --distribution=block:block --hint=nomultithread python myvenv-script.py

Tip

If you find that a module you've installed to a virtual environment on /work isn't found when running a job, it may be that it was previously installed to the default location of $HOME/.local which is not mounted on the compute nodes. This can be an issue as pip will reuse any modules found at this default location rather than reinstall them into a virtual environment. Thus, even if the virtual environment is on /work, a module you've asked for may actually be located on /home.

You can check a module's install location and its dependencies with pip show, for example pip show matplotlib. You may then run pip uninstall matplotlib while no virtual environment is active to uninstall it from $HOME/.local, and then re-run pip install matplotlib while your virtual environment on /work is active to reinstall it there. You will need to do this for any modules installed on /home that your code uses, either directly or indirectly. Remember that you can check all your installed modules with pip list.
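You can script this check by reading the Location field that pip show prints. This is a minimal sketch: check_pip_location is a hypothetical helper that reads pip show output on standard input. It flags packages installed under $HOME/.local, which is not mounted on the compute nodes.

```bash
# Hypothetical helper: read "pip show <package>" output on stdin and report
# whether the package lives under $HOME/.local (not visible on compute nodes).
check_pip_location() {
    local location
    location=$(awk -F': ' '$1 == "Location" { print $2 }')
    case "$location" in
        "")
            echo "no Location field found"
            return 2 ;;
        "$HOME"/.local/*)
            echo "WARNING: installed in $location (not visible on compute nodes)"
            return 1 ;;
        *)
            echo "OK: installed in $location" ;;
    esac
}
```

For example, with your virtual environment active, pip show matplotlib | check_pip_location prints a warning and returns a non-zero status if matplotlib is still being picked up from $HOME/.local.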

"},{"location":"user-guide/python/#extending-ml-modules-with-your-own-packages-via-pip","title":"Extending ML modules with your own packages via pip","text":"

The environment being extended does not have to come from one of the centrally-installed cray-python modules. You can also create a local virtual environment based on one of the Machine Learning (ML) modules, e.g., tensorflow or pytorch. One extra command is required; it is issued immediately after the python -m venv ... command.

extend-venv-activate /work/t01/t01/auser/myvenv

The extend-venv-activate command merely adds some extra commands to the virtual environment's activate script, ensuring that the Python packages will be gathered from the local virtual environment, the ML module and the cray-python base module. This means you avoid having to install the ML packages within your local environment.

Note

The extend-venv-activate command becomes available (i.e., its location is placed on the path) only when the ML module is loaded. The ML modules are themselves based on cray-python. For example, tensorflow/2.12.0 is based on the cray-python/3.9.13.1 module.

"},{"location":"user-guide/python/#conda-on-archer2","title":"Conda on ARCHER2","text":"

Conda-based Python distributions (e.g. Anaconda, Mamba, Miniconda) are an extremely popular way of installing and accessing software on many systems, including ARCHER2. Although conda-based distributions can be used on ARCHER2, care is needed in how they are installed and configured so that the installation does not adversely affect your use of ARCHER2. In particular, you should be careful about where you install conda and about the additions conda makes to your shell configuration files.

We cover each of these points in more detail below.

"},{"location":"user-guide/python/#conda-install-location","title":"Conda install location","text":"

If you only need to use the files and executables from your conda installation on the login and data analysis nodes (via the serial QoS) then the best place to install conda is in your home directory structure - this will usually be the default install location provided by the installation script.

If you need to access the files and executables from conda on the compute nodes then you will need to install to a different location as the home file systems are not available on the compute nodes. The work file systems are not well suited to hosting Python software natively due to the way in which file access works, particularly during Python startup. There are two main options for using conda from ARCHER2 compute nodes:

  1. Use a conda container image
  2. Install conda on the solid state storage
"},{"location":"user-guide/python/#use-a-conda-container-image","title":"Use a conda container image","text":"

You can pull official conda-based container images from Dockerhub that you can use if you want just the standard set of Python modules that come with the distribution. For example, to get the latest Anaconda distribution as a Singularity container image on the ARCHER2 work file system, you would use (on an ARCHER2 login node, from the directory on the work file system where you want to store the container image):

singularity build anaconda3.sif docker://continuumio/anaconda3\n

Once you have the container image, you can run scripts in it with a command like:

singularity exec -B $PWD anaconda3.sif python my_script.py\n

As the container image is a single large file, you end up doing a single large read from the work file system rather than lots of small reads of individual Python files, this improves the performance of Python and reduces the detrimental impact on the wider file system performance for all users.

We have pre-built a Singularity container image with the Anaconda distribution on ARCHER2. Users can access it at $EPCC_SINGULARITY_DIR/anaconda3.sif. To run a Python script with the centrally installed image, you can use:

singularity exec -B $PWD $EPCC_SINGULARITY_DIR/anaconda3.sif python my_script.py\n

If you want additional packages that are not available in the standard container images, then you will need to build your own container images. If you need help with this, please contact the ARCHER2 Service Desk.

"},{"location":"user-guide/python/#conda-addtions-to-shell-configuration-files","title":"Conda additions to shell configuration files","text":"

During the install process most conda-based distributions will ask a question like:

Do you wish the installer to initialize Miniconda3 by running conda init?

If you are installing to the ARCHER2 work directories or the solid state storage, you should answer \"no\" to this question.

Adding the initialisation to shell startup scripts (typically .bashrc) means that every time you log in to ARCHER2, the conda environment will try to initialise by reading lots of files within the conda installation. This approach was designed for the case where a user has installed conda on their personal device and so is the only user of the file system. On shared file systems such as those on ARCHER2, this places a large load on the file system and will lead to slow login times and slow response from your command line. It will also degrade read/write performance of the work file systems for you and other users, so it should be avoided.

If you have previously installed a conda distribution and answered \"yes\" to the question about adding the initialisation to shell configuration files, you should edit your ~/.bashrc file to remove the conda initialisation entries. This means deleting the lines that look something like:

# >>> conda initialize >>>\n# !! Contents within this block are managed by 'conda init' !!\n__conda_setup=\"$('/work/t01/t01/auser/miniconda3/bin/conda' 'shell.bash' 'hook' 2> /dev/null)\"\nif [ $? -eq 0 ]; then\neval \"$__conda_setup\"\nelse\nif [ -f \"/work/t01/t01/auser/miniconda3/etc/profile.d/conda.sh\" ]; then\n. \"/work/t01/t01/auser/miniconda3/etc/profile.d/conda.sh\"\nelse\nexport PATH=\"/work/t01/t01/auser/miniconda3/bin:$PATH\"\nfi\nfi\nunset __conda_setup\n# <<< conda initialize <<<\n
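If you would rather not edit ~/.bashrc by hand, the Python sketch below removes the block for you. It is an illustration rather than an ARCHER2-provided tool: the names strip_conda_init and clean_bashrc are hypothetical, it assumes the standard >>> conda initialize >>> and <<< conda initialize <<< marker lines shown above, and it keeps a .bashrc.bak copy of the original file.

```python
from pathlib import Path

# Marker lines written by conda init around the block it manages
START = '# >>> conda initialize >>>'
END = '# <<< conda initialize <<<'


def strip_conda_init(text):
    '''Return text with every conda init block (marker lines included) removed.'''
    kept = []
    inside = False
    for line in text.splitlines(keepends=True):
        marker = line.strip()
        if marker == START:
            inside = True  # start dropping lines at the opening marker
        elif marker == END and inside:
            inside = False  # stop dropping after the closing marker
        elif not inside:
            kept.append(line)
    if inside:
        # Opening marker without a closing one: leave the text untouched
        return text
    return ''.join(kept)


def clean_bashrc(path=None):
    '''Back up ~/.bashrc (or path) to a .bak file, then remove the conda block.'''
    path = Path(path) if path else Path.home() / '.bashrc'
    original = path.read_text()
    path.with_name(path.name + '.bak').write_text(original)
    path.write_text(strip_conda_init(original))
```

Run clean_bashrc() once from a login node, then log out and back in. If the closing marker is missing, the function leaves the file unchanged rather than guessing where the block ends.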
"},{"location":"user-guide/python/#running-python","title":"Running Python","text":""},{"location":"user-guide/python/#example-serial-python-submission-script","title":"Example serial Python submission script","text":"
#!/bin/bash --login\n\n#SBATCH --job-name=python_test\n#SBATCH --ntasks=1\n#SBATCH --time=00:10:00\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=serial\n#SBATCH --qos=serial\n\n# Load the Python module, ...\nmodule load cray-python\n\n# ..., or, if using local virtual environment\nsource <<path to virtual environment>>/bin/activate\n\n# Run your Python program\npython python_test.py\n
"},{"location":"user-guide/python/#example-mpi4py-job-submission-script","title":"Example mpi4py job submission script","text":"

Programs that have been parallelised with mpi4py can be run on the ARCHER2 compute nodes. Unlike the serial Python submission script however, we must launch the Python interpreter using srun. Failing to do so will result in Python running a single MPI rank only.

#!/bin/bash --login\n# Slurm job options (job-name, compute nodes, job time)\n#SBATCH --job-name=mpi4py_test\n#SBATCH --nodes=2\n#SBATCH --ntasks-per-node=128\n#SBATCH --cpus-per-task=1\n#SBATCH --time=0:10:0\n\n# Replace [budget code] below with your budget code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Load the Python module, ...\nmodule load cray-python\n\n# ..., or, if using local virtual environment\nsource <<path to virtual environment>>/bin/activate\n\n# Pass cpus-per-task setting to srun\nexport SRUN_CPUS_PER_TASK=${SLURM_CPUS_PER_TASK}\n\n# Run your Python program\n#   Note that srun MUST be used to wrap the call to python,\n#   otherwise your code will run serially\nsrun --distribution=block:block --hint=nomultithread python mpi4py_test.py\n

Tip

If you have installed your own packages you will need to activate your local Python environment within your job submission script as shown at the end of Installing your own Python packages (with pip).

By default, mpi4py will use the Cray MPICH OFI library. If you wish to use UCX instead, you must first load PrgEnv-gnu within the submission script, before loading the UCX modules, as shown below.

module load PrgEnv-gnu\nmodule load craype-network-ucx\nmodule load cray-mpich-ucx\nmodule load cray-python\n
"},{"location":"user-guide/python/#running-python-at-scale","title":"Running Python at scale","text":"

The file system metadata server may become overloaded when running a parallel Python script over many fully populated nodes (i.e., 128 MPI ranks per node). Performance degrades because of the I/O operations that accompany a high volume of Python import statements. Typically, each import first requires the module or library to be located by searching a number of file paths before it is loaded into memory. Such a workload scales as N_p × N_lib × N_path, where N_p is the number of parallel processes, N_lib is the number of libraries imported and N_path is the number of file paths searched. Much time can therefore be lost during the initial phase of a large Python job, and the resulting I/O contention also affects other users of the system.
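To give a sense of scale, the sketch below evaluates this product for a hypothetical job. The library and path counts are illustrative assumptions rather than ARCHER2 measurements, and the helper name import_lookups is ours; len(sys.path) gives a rough idea of N_path for your own environment.

```python
import sys


def import_lookups(n_nodes, ranks_per_node, n_libs, n_paths):
    '''Rough number of file-system lookups at start-up: N_p * N_lib * N_path.'''
    n_procs = n_nodes * ranks_per_node  # N_p: total number of parallel processes
    return n_procs * n_libs * n_paths


# Hypothetical job: 256 fully populated nodes, 100 imported modules and
# 10 directories searched per import (illustrative numbers only)
print(import_lookups(256, 128, 100, 10))  # 32768000

# Rough N_path for the current interpreter: the number of sys.path entries
print(len(sys.path))
```

Because N_p multiplies the other two factors, doubling the node count doubles the metadata load even when every rank imports exactly the same modules.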

Spindle is a tool for improving the library-loading performance of dynamically linked HPC applications. It provides a mechanism for scalable loading of shared libraries, executables and Python files from a shared file system at scale without turning the file system into a bottleneck. This is achieved by caching libraries or their locations within node memory. Spindle takes a pure user-space approach: users do not need to configure new file systems, load particular OS kernels or build special system components. The tool operates on existing binaries; no application modification or special build flags are required.

The script below shows how to run Spindle with your Python code.

#!/bin/bash --login\n\n#SBATCH --nodes=256\n#SBATCH --ntasks-per-node=128\n...\n\nmodule load cray-python\nmodule load spindle/0.13\n\nexport SRUN_CPUS_PER_TASK=${SLURM_CPUS_PER_TASK}\n\nspindle --slurm --python-prefix=/opt/cray/pe/python/${CRAY_PYTHON_LEVEL} \\\n    srun --overlap --distribution=block:block --hint=nomultithread \\\n        python mpi4py_script.py\n

The --python-prefix argument can be set to a colon-separated list of paths if necessary. In the example above, the CRAY_PYTHON_LEVEL environment variable is set as a consequence of loading cray-python.

Note

The srun --overlap option is required for Spindle as the version of Slurm on ARCHER2 is newer than 20.11.

"},{"location":"user-guide/python/#using-jupyterlab-on-archer2","title":"Using JupyterLab on ARCHER2","text":"

It is possible to view and run Jupyter notebooks from both login nodes and compute nodes on ARCHER2.

Note

You can test such notebooks on the login nodes, but please do not attempt to run any computationally intensive work. Jobs may get killed once they hit a CPU limit on login nodes.

Please follow these steps.

  1. Install JupyterLab in your work directory.

    module load cray-python\nexport PYTHONUSERBASE=/work/t01/t01/auser/.local\nexport PATH=$PYTHONUSERBASE/bin:$PATH\n# source <<path to virtual environment>>/bin/activate  # If using a virtual environment, uncomment this line and remove the --user flag from the next command\n\npip install --user jupyterlab\n

  2. If you want to test JupyterLab on the login node please go straight to step 3. To run your Jupyter notebook on a compute node, you first need to run an interactive session.

    srun --nodes=1 --exclusive --time=00:20:00 --account=<your_budget> \\\n     --partition=standard --qos=short --reservation=shortqos \\\n     --pty /bin/bash\n
    Your prompt will change to something like below.
    auser@nid001015:/tmp>\n
    In this case, the node id is nid001015. Now execute the following on the compute node.
    cd /work/t01/t01/auser # Update the path to your work directory\nexport PYTHONUSERBASE=$(pwd)/.local\nexport PATH=$PYTHONUSERBASE/bin:$PATH\nexport HOME=$(pwd)\nmodule load cray-python\n# source <<path to virtual environment>>/bin/activate  # If using a virtual environment, uncomment this line\n

  3. Run the JupyterLab server.

    export JUPYTER_RUNTIME_DIR=$(pwd)\njupyter lab --ip=0.0.0.0 --no-browser\n
    Once it's started, you will see a URL printed in the terminal window of the form http://127.0.0.1:<port_number>/lab?token=<string>; we'll need this URL for step 6.

  4. Please skip this step if you are connecting from a machine running Windows. Open a new terminal window on your laptop and run the following command.

    ssh <username>@login.archer2.ac.uk -L<port_number>:<node_id>:<port_number>\n
    where <username> is your username, and <node_id> is the id of the node you're currently on (for a login node, this will be ln01, or similar; on a compute node, it will be a mix of numbers and letters). In our example, <node_id> is nid001015. Note: use the same port number as that shown in the URL of step 3. This number may vary; likely values are 8888 or 8889.

  5. Please skip this step if you are connecting from Linux or macOS. If you are connecting from Windows, you should use MobaXterm to configure an SSH tunnel as follows.

  6. Now, if you open a browser window locally, you should be able to navigate to the URL from step 3, and this should display the JupyterLab server. If JupyterLab is running on a compute node, the notebook will be available for the length of the interactive session you have requested.

Warning

Please do not use the other http address given by the JupyterLab output, the one formatted http://<node_id>:<port_number>/lab?token=<string>. Your local browser will not recognise the <node_id> part of the address.

"},{"location":"user-guide/python/#using-dask-job-queue-on-archer2","title":"Using Dask Job-Queue on ARCHER2","text":"

The Dask-jobqueue project makes it easy to deploy Dask on ARCHER2. You can find more information in the Dask Job-Queue documentation.

Please follow these steps:

  1. Install Dask-Jobqueue
module load cray-python\nexport PYTHONUSERBASE=/work/t01/t01/auser/.local\nexport PATH=$PYTHONUSERBASE/bin:$PATH\n\npip install --user dask-jobqueue --upgrade\n
  2. Using Dask

Dask-jobqueue creates a Dask Scheduler in the Python process where the cluster object is instantiated. A script for running dask jobs on ARCHER2 might look something like this:

from dask_jobqueue import SLURMCluster\ncluster = SLURMCluster(cores=128, \n                       processes=16,\n                       memory='256GB',\n                       queue='standard',\n                       header_skip=['--mem'],\n                       job_extra=['--qos=\"standard\"'],\n                       python='srun python',\n                       project='z19',\n                       walltime=\"01:00:00\",\n                       shebang=\"#!/bin/bash --login\",\n                       local_directory='$PWD',\n                       interface='hsn0',\n                       env_extra=['module load cray-python',\n                                  'export PYTHONUSERBASE=/work/t01/t01/auser/.local/',\n                                  'export PATH=$PYTHONUSERBASE/bin:$PATH',\n                                  'export PYTHONPATH=$PYTHONUSERBASE/lib/python3.8/site-packages:$PYTHONPATH'])\n\n\n\ncluster.scale(jobs=2)    # Deploy two single-node jobs\n\nfrom dask.distributed import Client\nclient = Client(cluster)  # Connect this local process to remote workers\n\n# wait for jobs to arrive, depending on the queue, this may take some time\nimport dask.array as da\nx = \u2026              # Dask commands now use these distributed resources\n

This script can be run on the login nodes and it submits the Dask jobs to the job queue. Users should ensure that the computationally intensive work is done with the Dask commands which run on the compute nodes.

The cluster object parameters specify the characteristics for running on a single compute node. Dask requires a memory value to be supplied, but ARCHER2 compute nodes are allocated exclusively and you should not specify a memory requirement in the job script; the header_skip=['--mem'] option removes the --mem line that Dask would otherwise add to the generated job script.

Jobs are deployed with the cluster.scale command, where the jobs option sets the number of single-node jobs requested. Job scripts are generated from the cluster object and submitted to the queue, and they begin running once the resources are available. You can check the status of the jobs by running squeue -u $USER in a separate terminal.

If you wish to see the generated job script you can use:

print(cluster.job_script())\n
"},{"location":"user-guide/scheduler/","title":"Running jobs on ARCHER2","text":"

As with most HPC services, ARCHER2 uses a scheduler to manage access to resources and ensure that the thousands of different users of the system are able to share it and all get access to the resources they require. ARCHER2 uses the Slurm software to schedule jobs.

Writing a submission script is typically the most convenient way to submit your job to the scheduler. Example submission scripts (with explanations) for the most common job types are provided below.

Interactive jobs are also available and can be particularly useful for developing and debugging applications. More details are available below.

Hint

If you have any questions on how to run jobs on ARCHER2 do not hesitate to contact the ARCHER2 Service Desk.

You typically interact with Slurm by issuing Slurm commands from the login nodes (to submit, check and cancel jobs), and by specifying Slurm directives that describe the resources required for your jobs in job submission scripts.

"},{"location":"user-guide/scheduler/#resources","title":"Resources","text":""},{"location":"user-guide/scheduler/#cus","title":"CUs","text":"

Time used on ARCHER2 is measured in CUs. 1 CU = 1 Node Hour for a standard 128 core node.

The CU calculator will help you to calculate the CU cost for your jobs.

"},{"location":"user-guide/scheduler/#checking-available-budget","title":"Checking available budget","text":"

You can check your budget in SAFE by selecting Login accounts from the menu and then selecting the login account you want to query.

Under Login account details you will see each of the budget codes you have access to listed e.g. e123 resources and then under Resource Pool to the right of this, a note of the remaining budget in CUs.

When logged in to the machine you can also use the command

sacctmgr show assoc where user=$LOGNAME format=account,user,maxtresmins\n

This will list all the budget codes that you have access to e.g.

   Account       User   MaxTRESMins\n---------- ---------- -------------\n      e123      userx         cpu=0\n e123-test      userx\n

This shows that userx is a member of budgets e123 and e123-test. However, the cpu=0 indicates that the e123 budget is empty or disabled. This user can submit jobs using the e123-test budget.
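To automate this check, you could parse the command's output. The sketch below is a minimal, hypothetical helper (parse_assoc and query_assoc are our names, not ARCHER2 tools). It assumes the three-column layout produced by the command above and only flags budgets whose limit is cpu=0; it cannot tell you how many CUs remain.

```python
import os
import subprocess


def query_assoc():
    '''Run the sacctmgr command shown above and return its text output.'''
    user = os.environ['LOGNAME']
    cmd = ['sacctmgr', 'show', 'assoc', 'where', 'user=' + user,
           'format=account,user,maxtresmins']
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def parse_assoc(output):
    '''Map each budget code to False if its MaxTRESMins is cpu=0, else True.'''
    budgets = {}
    for line in output.splitlines()[2:]:  # skip the header and the dashed rule
        fields = line.split()
        if len(fields) < 2:
            continue  # ignore blank or malformed lines
        limit = fields[2] if len(fields) > 2 else ''
        budgets[fields[0]] = limit != 'cpu=0'
    return budgets


# On a login node: print(parse_assoc(query_assoc()))
```

For the example output above, parse_assoc returns {'e123': False, 'e123-test': True}.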

To see the number of CUs remaining you must check in SAFE.

"},{"location":"user-guide/scheduler/#charging","title":"Charging","text":"

Jobs run on ARCHER2 are charged for the time they use i.e. from the time the job begins to run until the time the job ends (not the full wall time requested).

Jobs are charged for the full number of nodes which are requested, even if they are not all used.

Charging takes place at the time the job ends, and the job is charged in full to the budget which is live at the end time.
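These rules can be summarised in a small calculation. The sketch below assumes standard 128-core nodes, where 1 CU is one node hour; cu_charge is a hypothetical helper, and the CU calculator remains the reference for other node types.

```python
from datetime import timedelta


def cu_charge(nodes, elapsed):
    '''CUs charged for a job on standard nodes (1 CU = 1 node hour).

    nodes   -- nodes requested; all are charged, even if some sit idle
    elapsed -- actual run time of the job, not the wall time requested
    '''
    return nodes * elapsed.total_seconds() / 3600


# 4 nodes requested with a 3 hour limit; the job finished after 90 minutes
print(cu_charge(4, timedelta(minutes=90)))  # 6.0
```

The 3 hour limit plays no part in the result; only the 90 minutes actually used are charged, for all 4 nodes.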

"},{"location":"user-guide/scheduler/#basic-slurm-commands","title":"Basic Slurm commands","text":"

There are four key commands used to interact with Slurm on the command line: sinfo, sbatch, squeue and scancel.

We cover each of these commands in more detail below.

"},{"location":"user-guide/scheduler/#sinfo-information-on-resources","title":"sinfo: information on resources","text":"

sinfo is used to query information about available resources and partitions. Without any options, sinfo lists the status of all resources and partitions, e.g.

auser@ln01:~> sinfo\n\nPARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST\nstandard     up 1-00:00:00    105  down* nid[001006,...,002014]\nstandard     up 1-00:00:00     12  drain nid[001016,...,001969]\nstandard     up 1-00:00:00      5   resv nid[001000,001002-001004,001114]\nstandard     up 1-00:00:00    683  alloc nid[001001,...,001970-001991]\nstandard     up 1-00:00:00    214   idle nid[001022-001023,...,002015-002023]\nstandard     up 1-00:00:00      2   down nid[001021,001050]\n

Here we see the number of nodes in different states. For example, 683 nodes are allocated (running jobs), and 214 are idle (available to run jobs).

Note

Long lists of node IDs have been abbreviated with an ellipsis (...).
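For a quick summary of node states, you can total the NODES column by STATE. The sketch below is a hypothetical helper that assumes the default sinfo column layout shown above; for anything more robust, sinfo's own output-formatting options are a better basis. In Slurm, a trailing * on a state (as in down*) means the node is not responding, so it is counted separately from down.

```python
import subprocess
from collections import Counter


def nodes_by_state(sinfo_output):
    '''Sum the NODES column of default sinfo output, grouped by STATE.'''
    totals = Counter()
    for line in sinfo_output.splitlines():
        fields = line.split()
        # Data rows look like: PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
        if len(fields) >= 5 and fields[3].isdigit():
            totals[fields[4]] += int(fields[3])
    return totals


def current_summary():
    '''Run sinfo on a login node and summarise its output.'''
    result = subprocess.run(['sinfo'], capture_output=True, text=True, check=True)
    return nodes_by_state(result.stdout)
```

Applied to the example output above, it reports 683 alloc and 214 idle nodes, with 1021 nodes in total across the rows shown.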

"},{"location":"user-guide/scheduler/#sbatch-submitting-jobs","title":"sbatch: submitting jobs","text":"

sbatch is used to submit a job script to the job submission system. The script will typically contain one or more srun commands to launch parallel tasks.

When you submit the job, the scheduler provides the job ID, which is used to identify this job in other Slurm commands and when looking at resource usage in SAFE.

auser@ln01:~> sbatch test-job.slurm\nSubmitted batch job 12345\n
"},{"location":"user-guide/scheduler/#squeue-monitoring-jobs","title":"squeue: monitoring jobs","text":"

squeue without any options or arguments shows the current status of all jobs known to the scheduler. For example:

auser@ln01:~> squeue\n

will list all jobs on ARCHER2.

The output of this is often overwhelmingly large. You can restrict the output to just your jobs by adding the -u $USER option:

auser@ln01:~> squeue -u $USER\n
"},{"location":"user-guide/scheduler/#scancel-deleting-jobs","title":"scancel: deleting jobs","text":"

scancel is used to delete a job from the scheduler. If the job is waiting to run, it is simply cancelled; if it is running, it is stopped immediately.

If you only want to cancel a specific job you need to provide the job ID of the job you wish to cancel/stop. For example:

auser@ln01:~> scancel 12345\n

will cancel (if waiting) or stop (if running) the job with ID 12345.

scancel can take other options. For example, if you want to cancel all your pending (queued) jobs but leave the running jobs running, you could use:

auser@ln01:~> scancel --state=PENDING --user=$USER\n
"},{"location":"user-guide/scheduler/#resource-limits","title":"Resource Limits","text":"

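The ARCHER2 resource limits for any given job are covered by three separate attributes: the amount of primary resource you require (the number of compute nodes), the partition you want to use and the Quality of Service (QoS) you want to use.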

"},{"location":"user-guide/scheduler/#primary-resource","title":"Primary resource","text":"

The primary resource you can request for your job is the compute node.

Information

The --exclusive option is enforced on ARCHER2 which means you will always have access to all of the memory on the compute node regardless of how many processes are actually running on the node.

Note

You will not generally have access to the full amount of memory on the node, as some is retained for running the operating system and other system processes.

"},{"location":"user-guide/scheduler/#partitions","title":"Partitions","text":"

On ARCHER2, compute nodes are grouped into partitions. You will have to specify a partition using the --partition option in your Slurm submission script. The following table has a list of active partitions on ARCHER2.

Partition | Description | Max nodes available
standard | CPU nodes with AMD EPYC 7742 64-core processor × 2, 256/512 GB memory | 5860
highmem | CPU nodes with AMD EPYC 7742 64-core processor × 2, 512 GB memory | 584
serial | CPU nodes with AMD EPYC 7742 64-core processor × 2, 512 GB memory | 2
gpu | GPU nodes with AMD EPYC 32-core processor, 512 GB memory, 4 × AMD Instinct MI210 GPU | 4

Note

The standard partition includes both the standard memory and high memory nodes but standard memory nodes are preferentially chosen for jobs where possible. To guarantee access to high memory nodes you should specify the highmem partition.

"},{"location":"user-guide/scheduler/#quality-of-service-qos","title":"Quality of Service (QoS)","text":"

On ARCHER2, job limits are defined by the requested Quality of Service (QoS), as specified by the --qos Slurm directive. The following table lists the active QoS on ARCHER2.

QoS | Max nodes per job | Max walltime | Jobs queued | Jobs running | Partition(s) | Notes
standard | 1024 | 24 hrs | 64 | 16 | standard | Maximum of 1024 nodes in use by any one user at any time
highmem | 256 | 24 hrs | 16 | 16 | highmem | Maximum of 512 nodes in use by any one user at any time
taskfarm | 16 | 24 hrs | 128 | 32 | standard | Maximum of 256 nodes in use by any one user at any time
short | 32 | 20 mins | 16 | 4 | standard | -
long | 64 | 96 hrs | 16 | 16 | standard | Minimum walltime of 24 hrs, maximum of 512 nodes in use by any one user at any time, maximum of 2048 nodes in use by the QoS
largescale | 5860 | 12 hrs | 8 | 1 | standard | Minimum job size of 1025 nodes
lowpriority | 2048 | 24 hrs | 16 | 16 | standard | Jobs are not charged but require at least 1 CU in the budget to use.
serial | 32 cores and/or 128 GB memory | 24 hrs | 32 | 4 | serial | Jobs are not charged but require at least 1 CU in the budget to use. Maximum of 32 cores and/or 128 GB in use by any one user at any time.
reservation | Size of reservation | Length of reservation | No limit | No limit | standard | -
capabilityday | At least 4096 nodes | 3 hrs | 8 | 2 | standard | Minimum job size of 512 nodes. Jobs only run during Capability Days
gpu-shd | 1 | 12 hrs | 2 | 1 | gpu | GPU nodes potentially shared with other users
gpu-exc | 2 | 12 hrs | 2 | 1 | gpu | Exclusive GPU node access
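Before submitting, you may want to check a request against the per-job limits in the table. The sketch below encodes only a few rows (standard, short, long and largescale) and ignores per-user, queued-job and running-job limits; QOS_LIMITS and qos_problems are hypothetical names, and the table remains the authoritative source.

```python
# Per-job limits for a few QoS, copied from the table (walltimes in minutes)
QOS_LIMITS = {
    'standard': {'max_nodes': 1024, 'max_minutes': 24 * 60},
    'short': {'max_nodes': 32, 'max_minutes': 20},
    'long': {'max_nodes': 64, 'max_minutes': 96 * 60, 'min_minutes': 24 * 60},
    'largescale': {'max_nodes': 5860, 'max_minutes': 12 * 60, 'min_nodes': 1025},
}


def qos_problems(qos, nodes, minutes):
    '''List the per-job limits a request would break (empty list if it fits).'''
    limits = QOS_LIMITS[qos]
    max_nodes = limits['max_nodes']
    min_nodes = limits.get('min_nodes', 1)
    max_minutes = limits['max_minutes']
    min_minutes = limits.get('min_minutes', 0)
    problems = []
    if nodes > max_nodes:
        problems.append(f'{nodes} nodes is above the maximum of {max_nodes}')
    if nodes < min_nodes:
        problems.append(f'{nodes} nodes is below the minimum of {min_nodes}')
    if minutes > max_minutes:
        problems.append(f'{minutes} min is above the maximum walltime of {max_minutes} min')
    if minutes < min_minutes:
        problems.append(f'{minutes} min is below the minimum walltime of {min_minutes} min')
    return problems


print(qos_problems('short', 2, 15))  # []
print(qos_problems('long', 10, 12 * 60))  # ['720 min is below the minimum walltime of 1440 min']
```

A 12 hour job on 10 nodes therefore belongs in the standard QoS rather than long, which only accepts jobs of at least 24 hours.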

You can find out the QoS that you can use by running the following command:

auser@ln01:~> sacctmgr show assoc user=$USER cluster=archer2 format=cluster,account,user,qos%50\n

Hint

If you have needs which do not fit within the current QoS, please contact the Service Desk and we can discuss how to accommodate your requirements.

"},{"location":"user-guide/scheduler/#e-mail-notifications","title":"E-mail notifications","text":"

E-mail notifications from the scheduler are not currently available on ARCHER2.

"},{"location":"user-guide/scheduler/#priority","title":"Priority","text":"

Job priority on ARCHER2 depends on a number of different factors: the QoS, the job age, the fairshare and the job size.

Each of these factors is normalised to a value between 0 and 1 and multiplied by a weight, and the resulting values are combined to produce a priority for the job. The current job priority formula on ARCHER2 is:

Priority = [10000 * P(QoS)] + [500 * P(Age)] + [300 * P(Fairshare)] + [100 * P(size)]\n

The priority factors are:

You can view the priorities for current queued jobs on the system with the sprio command:

auser@ln04:~> sprio -l\n          JOBID PARTITION   PRIORITY       SITE        AGE  FAIRSHARE    JOBSIZE        QOS\n         828764 standard        1049          0         45          0          4       1000\n         828765 standard        1049          0         45          0          4       1000\n         828770 standard        1049          0         45          0          4       1000\n         828771 standard        1012          0          8          0          4       1000\n         828773 standard        1012          0          8          0          4       1000\n         828791 standard        1012          0          8          0          4       1000\n         828797 standard        1118          0        115          0          4       1000\n         828800 standard        1154          0        150          0          4       1000\n         828801 standard        1154          0        150          0          4       1000\n         828805 standard        1118          0        115          0          4       1000\n         828806 standard        1154          0        150          0          4       1000\n
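The formula can be reproduced in a few lines. In the sketch below, job_priority is a hypothetical helper, and the normalised factor values are not documented ARCHER2 settings: they were inferred by dividing the sprio columns by their weights (for example, QOS 1000 / 10000 = 0.1). Slurm computes the real factors and may truncate rather than round the result.

```python
# Weights from the ARCHER2 priority formula
WEIGHTS = {'qos': 10000, 'age': 500, 'fairshare': 300, 'size': 100}


def job_priority(factors):
    '''Weighted sum of normalised priority factors, each between 0 and 1.'''
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = factors.get(name, 0.0)  # a missing factor contributes nothing
        if not 0.0 <= value <= 1.0:
            raise ValueError(f'{name} factor must lie in [0, 1], got {value}')
        total += weight * value
    return round(total)


# Factors inferred from job 828764 in the sprio output above
print(job_priority({'qos': 0.1, 'age': 0.09, 'fairshare': 0.0, 'size': 0.04}))  # 1049
```

The QoS term dominates: with a weight of 10000, differences in QoS outweigh the combined maximum of 900 from the other three factors.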
"},{"location":"user-guide/scheduler/#troubleshooting","title":"Troubleshooting","text":""},{"location":"user-guide/scheduler/#slurm-error-messages","title":"Slurm error messages","text":"

An incorrect submission will cause Slurm to return an error. Some common problems are listed below, with a suggestion about the likely cause:

"},{"location":"user-guide/scheduler/#slurm-job-state-codes","title":"Slurm job state codes","text":"

The squeue command allows users to view information for jobs managed by Slurm. Jobs typically go through the following states: PENDING, RUNNING, COMPLETING, and COMPLETED. The first table provides a description of some job state codes. The second table provides a description of the reasons that cause a job to be in a state.

Status | Code | Description
PENDING | PD | Job is awaiting resource allocation.
RUNNING | R | Job currently has an allocation.
SUSPENDED | S | Job has an allocation, but execution has been suspended and CPUs have been released for other jobs.
COMPLETING | CG | Job is in the process of completing. Some processes on some nodes may still be active.
COMPLETED | CD | Job has terminated all processes on all nodes with an exit code of zero.
TIMEOUT | TO | Job terminated upon reaching its time limit.
STOPPED | ST | Job has an allocation, but execution has been stopped with a SIGSTOP signal. CPUs have been retained by this job.
OUT_OF_MEMORY | OOM | Job experienced an out-of-memory error.
FAILED | F | Job terminated with a non-zero exit code or other failure condition.
NODE_FAIL | NF | Job terminated due to failure of one or more allocated nodes.
CANCELLED | CA | Job was explicitly cancelled by the user or system administrator. The job may or may not have been initiated.

For a full list, see Job State Codes.

"},{"location":"user-guide/scheduler/#slurm-queued-reasons","title":"Slurm queued reasons","text":"
Reason | Description
Priority | One or more higher priority jobs exist for this partition or advanced reservation.
Resources | The job is waiting for resources to become available.
BadConstraints | The job's constraints can not be satisfied.
BeginTime | The job's earliest start time has not yet been reached.
Dependency | This job is waiting for a dependent job to complete.
Licenses | The job is waiting for a license.
WaitingForScheduling | No reason has been set for this job yet. Waiting for the scheduler to determine the appropriate reason.
Prolog | Its PrologSlurmctld program is still running.
JobHeldAdmin | The job is held by a system administrator.
JobHeldUser | The job is held by the user.
JobLaunchFailure | The job could not be launched. This may be due to a file system problem, invalid program name, etc.
NonZeroExitCode | The job terminated with a non-zero exit code.
InvalidAccount | The job's account is invalid.
InvalidQOS | The job's QOS is invalid.
QOSUsageThreshold | Required QOS threshold has been breached.
QOSJobLimit | The job's QOS has reached its maximum job count.
QOSResourceLimit | The job's QOS has reached some resource limit.
QOSTimeLimit | The job's QOS has reached its time limit.
NodeDown | A node required by the job is down.
TimeLimit | The job exhausted its time limit.
ReqNodeNotAvail | Some node specifically required by the job is not currently available. The node may currently be in use, reserved for another job, in an advanced reservation, DOWN, DRAINED, or not responding. Nodes which are DOWN, DRAINED, or not responding will be identified as part of the job's \"reason\" field as \"UnavailableNodes\". Such nodes will typically require the intervention of a system administrator to make available.

For a full list, see Job Reasons.

"},{"location":"user-guide/scheduler/#output-from-slurm-jobs","title":"Output from Slurm jobs","text":"

Slurm places standard output (STDOUT) and standard error (STDERR) for each job in the file slurm-<JobID>.out. This file appears in the job's working directory once your job starts running.

Hint

Output may be buffered - to enable live output, e.g. for monitoring job status, add --unbuffered to the srun command in your Slurm script.

"},{"location":"user-guide/scheduler/#specifying-resources-in-job-scripts","title":"Specifying resources in job scripts","text":"

You specify the resources you require for your job using directives at the top of your job submission script using lines that start with the directive #SBATCH.

Hint

Most options provided using #SBATCH directives can also be specified as command line options to srun.

If you do not specify any options, then the default for each option will be applied. As a minimum, all job submissions must specify the budget that they wish to charge the job to, using the option --account=[budget code].

Important

You must specify an account code for your job otherwise it will fail to submit with the error: sbatch: error: Batch job submission failed: Invalid account or account/partition combination specified. (This error can also mean that you have specified a budget that has run out of resources.)

Other common options that are used are:

To prevent the behaviour of batch scripts being dependent on the user environment at the point of submission, use the option --export=none.

Using the --export=none means that the behaviour of batch submissions should be repeatable. We strongly recommend its use.

Note

When submitting your job, the scheduler will check that the requested resources are available e.g. that your account is a member of the requested budget, that the requested QoS exists. If things change before the job starts and e.g. your account has been removed from the requested budget or the requested QoS has been deleted then the job will not be able to start. In such cases, the job will be removed from the pending queue by our systems team, as it will no longer be eligible to run.

"},{"location":"user-guide/scheduler/#additional-options-for-parallel-jobs","title":"Additional options for parallel jobs","text":"

Note

For parallel jobs, ARCHER2 operates in a node exclusive way. This means that you are assigned resources in the units of full compute nodes for your jobs (i.e. 128 cores) and that no other user can share those compute nodes with you. Hence, the minimum amount of resource you can request for a parallel job is 1 node (or 128 cores).

In addition, parallel jobs will also need to specify how many nodes, parallel processes and threads they require.

For parallel jobs that use threading (e.g. OpenMP) or when you want to use less than 128 cores per node (e.g. to access more memory or memory bandwidth per core), you will also need to change the --cpus-per-task option.

For jobs using threading:

- --cpus-per-task=<threads per task>: the number of threads per parallel process (e.g. the number of OpenMP threads per MPI task for hybrid MPI/OpenMP jobs). Important: you must also set the OMP_NUM_THREADS environment variable if using OpenMP in your job.

For jobs using fewer than 128 cores per node:

- --cpus-per-task=<stride between placement of processes>: the stride between the parallel processes. For example, to double the memory and memory bandwidth available to each process on an ARCHER2 compute node, you would place 64 processes per node and leave an empty core between each process by setting --cpus-per-task=2 and --ntasks-per-node=64.

Important

You must also add export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK to your job submission script to pass the --cpus-per-task setting from the job script to the srun command. (Alternatively, you could use the --cpus-per-task option in the srun command itself.) If you do not do this then the placement of processes/threads will be incorrect and you will likely see poor performance of your application.
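The placement these options produce can be modelled with simple arithmetic. The sketch below is an idealised model, not Slurm's algorithm: it assumes a standard 128-core node, --distribution=block:block, --hint=nomultithread and SRUN_CPUS_PER_TASK exported as described above. first_cores is a hypothetical helper; tools such as xthi report the placement you actually get.

```python
CORES_PER_NODE = 128  # cores on an ARCHER2 standard compute node


def first_cores(ntasks_per_node, cpus_per_task):
    '''First core of each process on one node in this idealised block model.

    Each process is given cpus_per_task consecutive cores, so rank r starts
    at core r * cpus_per_task; the remaining cores in its slot are used by
    its threads or left idle.
    '''
    if ntasks_per_node * cpus_per_task > CORES_PER_NODE:
        raise ValueError('requested more cores than one node provides')
    return [rank * cpus_per_task for rank in range(ntasks_per_node)]


# 64 processes per node with an empty core after each: --cpus-per-task=2
cores = first_cores(64, 2)
print(cores[:4], cores[-1])  # [0, 2, 4, 6] 126

# Hybrid job: 16 MPI processes per node, 8 OpenMP threads each
# (--ntasks-per-node=16, --cpus-per-task=8, OMP_NUM_THREADS=8)
print(first_cores(16, 8)[:3])  # [0, 8, 16]
```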

"},{"location":"user-guide/scheduler/#options-for-jobs-on-the-data-analysis-nodes","title":"Options for jobs on the data analysis nodes","text":"

The data analysis nodes are shared between all users and can be used to run jobs that require small numbers of cores and/or access to an external network to transfer data. These jobs are often serial jobs that only require a single core.

To run jobs on the data analysis nodes you require the following options:

More information on using the data analysis nodes (including example job submission scripts) can be found in the Data Analysis section of the User and Best Practice Guide.

"},{"location":"user-guide/scheduler/#srun-launching-parallel-jobs","title":"srun: Launching parallel jobs","text":"

If you are running parallel jobs, your job submission script should contain one or more srun commands to launch the parallel executable across the compute nodes. In most cases you will want to add the options --distribution=block:block and --hint=nomultithread to your srun command to ensure you get the correct pinning of processes to cores on a compute node.

Warning

If you do not add the --distribution=block:block and --hint=nomultithread options to your srun command the default process placement may lead to a drop in performance for your jobs on ARCHER2.

A brief explanation of these options:

  - --hint=nomultithread: do not use hyperthreads/SMP.
  - --distribution=block:block: the first block means use a block distribution of processes across nodes (i.e. fill nodes before moving on to the next one) and the second block means use a block distribution of processes across \"sockets\" within a node (i.e. fill a \"socket\" before moving on to the next one).

Important

The Slurm definition of a \"socket\" does not correspond to a physical CPU socket. On ARCHER2 it corresponds to a 4-core CCX (Core CompleX).

"},{"location":"user-guide/scheduler/#slurm-definition-of-a-socket","title":"Slurm definition of a \"socket\"","text":"

On ARCHER2, Slurm is configured with the following setting:

SlurmdParameters=l3cache_as_socket\n

The effect of this setting is to define a Slurm socket as a unit that has a shared L3 cache. On ARCHER2, this means that each Slurm \"socket\" corresponds to a 4-core CCX (Core CompleX). For a more detailed discussion on the hardware and the memory/cache layout see the Hardware section.

The effect of this setting can be illustrated by using the xthi program to report placement when we select a cyclic distribution of processes across sockets from srun (--distribution=block:cyclic). As you can see from the output from xthi included below, the cyclic per-socket distribution results in sequential MPI processes being placed on every 4th core (i.e. cyclic placement across CCX).

Node summary for    1 nodes:\nNode    0, hostname nid000006, mpi 128, omp   1, executable xthi_mpi\nMPI summary: 128 ranks\nNode    0, rank    0, thread   0, (affinity =    0)\nNode    0, rank    1, thread   0, (affinity =    4)\nNode    0, rank    2, thread   0, (affinity =    8)\nNode    0, rank    3, thread   0, (affinity =   12)\nNode    0, rank    4, thread   0, (affinity =   16)\nNode    0, rank    5, thread   0, (affinity =   20)\nNode    0, rank    6, thread   0, (affinity =   24)\nNode    0, rank    7, thread   0, (affinity =   28)\nNode    0, rank    8, thread   0, (affinity =   32)\nNode    0, rank    9, thread   0, (affinity =   36)\nNode    0, rank   10, thread   0, (affinity =   40)\nNode    0, rank   11, thread   0, (affinity =   44)\nNode    0, rank   12, thread   0, (affinity =   48)\nNode    0, rank   13, thread   0, (affinity =   52)\nNode    0, rank   14, thread   0, (affinity =   56)\nNode    0, rank   15, thread   0, (affinity =   60)\nNode    0, rank   16, thread   0, (affinity =   64)\nNode    0, rank   17, thread   0, (affinity =   68)\nNode    0, rank   18, thread   0, (affinity =   72)\nNode    0, rank   19, thread   0, (affinity =   76)\nNode    0, rank   20, thread   0, (affinity =   80)\nNode    0, rank   21, thread   0, (affinity =   84)\nNode    0, rank   22, thread   0, (affinity =   88)\nNode    0, rank   23, thread   0, (affinity =   92)\nNode    0, rank   24, thread   0, (affinity =   96)\nNode    0, rank   25, thread   0, (affinity =  100)\nNode    0, rank   26, thread   0, (affinity =  104)\nNode    0, rank   27, thread   0, (affinity =  108)\nNode    0, rank   28, thread   0, (affinity =  112)\nNode    0, rank   29, thread   0, (affinity =  116)\nNode    0, rank   30, thread   0, (affinity =  120)\nNode    0, rank   31, thread   0, (affinity =  124)\nNode    0, rank   32, thread   0, (affinity =    1)\nNode    0, rank   33, thread   0, (affinity =    5)\nNode    0, rank   34, thread   0, (affinity =    9)\nNode    0, rank   35, thread   0, (affinity =   13)\nNode    0, rank   36, thread   0, (affinity =   17)\nNode    0, rank   37, thread   0, (affinity =   21)\nNode    0, rank   38, thread   0, (affinity =   25)\n\n...output trimmed...\n
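The placement shown in this output can be reproduced with a simplified model. The sketch below assumes one core per rank, a single node of 128 cores, and 32 Slurm sockets (4-core CCXs). The function placement_core is hypothetical. It only illustrates the arithmetic and does not describe how Slurm computes placement internally.

```shell
#!/usr/bin/env bash
# Simplified model (an assumption, not Slurm's algorithm): one core per rank,
# 128 ranks on one node, 32 Slurm sockets (CCXs) of 4 cores each.
# Prints the core a rank is pinned to for the given socket-level distribution.
placement_core() {
    local rank=$1 dist=$2
    local cores_per_socket=4 sockets=32
    case "$dist" in
        # block: fill each 4-core socket before moving to the next one
        block)  echo "$rank" ;;
        # cyclic: deal ranks round-robin across the 32 sockets
        cyclic) echo $(( (rank % sockets) * cores_per_socket + rank / sockets )) ;;
        *)      echo "unknown distribution: $dist" >&2; return 1 ;;
    esac
}

for r in 0 1 31 32 38; do
    echo "rank $r: block -> core $(placement_core "$r" block), cyclic -> core $(placement_core "$r" cyclic)"
done
```

With the cyclic distribution, rank 32 wraps back to the first socket and lands on core 1, and rank 38 lands on core 25. Both values agree with the xthi output.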
"},{"location":"user-guide/scheduler/#bolt-job-submission-script-creation-tool","title":"bolt: Job submission script creation tool","text":"

The bolt job submission script creation tool has been written by EPCC to simplify the process of writing job submission scripts for modern multicore architectures. Based on the options you supply, bolt will generate a job submission script that uses ARCHER2 in a reasonable way.

MPI, OpenMP and hybrid MPI/OpenMP jobs are supported.

Warning

The tool will allow you to generate scripts for jobs that use the long QoS, but you will need to manually modify the resulting script to change the QoS to long.

If there are problems or errors in your job parameter specifications then bolt will print warnings or errors. However, bolt cannot detect all problems.

"},{"location":"user-guide/scheduler/#basic-usage","title":"Basic Usage","text":"

The basic syntax for using bolt is:

bolt -n [parallel tasks] -N [parallel tasks per node] -d [number of threads per task] \\\n     -t [wallclock time (h:m:s)] -o [script name] -j [job name] -A [project code]  [arguments...]\n

Example 1: to generate a job script to run an executable called my_prog.x for 24 hours using 8192 parallel (MPI) processes and 128 (MPI) processes per compute node you would use something like:

bolt -n 8192 -N 128 -t 24:0:0 -o my_job.bolt -j my_job -A z01-budget my_prog.x arg1 arg2\n

(Remember to replace z01-budget with your actual budget code.)

Example 2: to generate a job script to run an executable called my_prog.x for 3 hours using 2048 parallel (MPI) processes and 64 (MPI) processes per compute node (i.e. using half of the cores on a compute node), you would use:

bolt -n 2048 -N 64 -t 3:0:0 -o my_job.bolt -j my_job -A z01-budget my_prog.x arg1 arg2\n

These examples generate the job script my_job.bolt with the correct options to run my_prog.x with command line arguments arg1 and arg2. The project code against which the job will be charged is specified with the -A option. As usual, the job script is submitted as follows:

sbatch my_job.bolt\n
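The node counts implied by the two bolt examples follow from dividing -n by -N. The sketch below uses nodes_for, a hypothetical helper that is not part of bolt. It rounds up, on the assumption that a partially filled node is still allocated in full on ARCHER2.

```shell
#!/usr/bin/env bash
# Illustrative arithmetic only (not bolt itself): nodes implied by the
# -n (total parallel tasks) and -N (tasks per node) values, rounded up.
nodes_for() {
    local ntasks=$1 tasks_per_node=$2
    echo $(( (ntasks + tasks_per_node - 1) / tasks_per_node ))
}

nodes_for 8192 128   # Example 1
nodes_for 2048 64    # Example 2
```

Both examples divide exactly, giving 64 nodes for Example 1 and 32 nodes for Example 2.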

Hint

If you do not specify the script name with the '-o' option then your script will be a file called a.bolt.

Hint

If you do not specify the number of parallel tasks then bolt will try to generate a serial job submission script (and throw an error on the ARCHER2 4-cabinet system, as serial jobs are not supported there).

Hint

If you do not specify a project code, bolt will use your default project code (set by your login account).

Hint

If you do not specify a job name, bolt will use either bolt_ser_job (for serial jobs) or bolt_par_job (for parallel jobs).

"},{"location":"user-guide/scheduler/#further-help","title":"Further help","text":"

You can access further help on using bolt on ARCHER2 with the -h option:

bolt -h\n

A selection of other useful options are:

"},{"location":"user-guide/scheduler/#checkscript-job-submission-script-validation-tool","title":"checkScript job submission script validation tool","text":"

The checkScript tool has been written to allow users to validate their job submission scripts before submitting their jobs. The tool will read your job submission script and try to identify errors, problems or inconsistencies.

An example of the sort of output the tool can give would be:

auser@ln01:/work/t01/t01/auser> checkScript submit.slurm\n\n===========================================================================\ncheckScript\n---------------------------------------------------------------------------\nCopyright 2011-2020  EPCC, The University of Edinburgh\nThis program comes with ABSOLUTELY NO WARRANTY.\nThis is free software, and you are welcome to redistribute it\nunder certain conditions.\n===========================================================================\n\nScript details\n---------------\n       User: auser\nScript file: submit.slurm\n  Directory: /work/t01/t01/auser (ok)\n   Job name: test (ok)\n  Partition: standard (ok)\n        QoS: standard (ok)\nCombination:          (ok)\n\nRequested resources\n-------------------\n         nodes =              3                     (ok)\ntasks per node =             16\n cpus per task =              8\ncores per node =            128                     (ok)\nOpenMP defined =           True                     (ok)\n      walltime =          1:0:0                     (ok)\n\nCU Usage Estimate (if full job time used)\n------------------------------------------\n                      CU =          3.000\n\n\n\ncheckScript finished: 0 warning(s) and 0 error(s).\n
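The CU estimate in this output is consistent with 1 CU per node-hour: 3 nodes for 1 hour gives 3.000 CU. Assuming that conversion, cu_estimate (a hypothetical helper) reproduces the figure from a node count and an h:m:s walltime.

```shell
#!/usr/bin/env bash
# Estimate CU usage if a job runs for its full walltime, assuming
# 1 CU = 1 node-hour (consistent with 3 nodes x 1:0:0 = 3.000 CU above).
cu_estimate() {
    local nodes=$1 walltime=$2
    local h m s
    IFS=: read -r h m s <<< "$walltime"
    # 10# forces base 10 so that values such as "08" are not read as octal
    local seconds=$(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
    awk -v n="$nodes" -v s="$seconds" 'BEGIN { printf "%.3f\n", n * s / 3600 }'
}

cu_estimate 3 1:0:0
```

This sketch does not handle walltimes that include a day field (e.g. 1-00:00:00).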
"},{"location":"user-guide/scheduler/#checking-scripts-and-estimating-start-time-with-test-only","title":"Checking scripts and estimating start time with --test-only","text":"

sbatch --test-only validates the batch script and returns an estimate of when the job would be scheduled to run, given the current scheduler state. The job is not actually submitted. Please note that this is only an estimate. The actual start time may differ because the scheduler state can change between the test and the real submission, and can continue to change after the job is queued.

auser@ln01:~> sbatch --test-only submit.slurm\nsbatch: Job 1039497 to start at 2022-02-01T23:20:51 using 256 processors on nodes nid002836\nin partition standard\n
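If you only need the timestamp from this message (for example, to log it), a small sed filter is enough. The function name below is hypothetical, and the message format is taken from the example above. The message appears to be written to standard error, so the usage comment redirects it before piping.

```shell
#!/usr/bin/env bash
# Extract the estimated start time from sbatch --test-only output.
# The "to start at <time> " wording is assumed from the example output.
start_time_from_test_only() {
    sed -n 's/.* to start at \([^ ]*\) .*/\1/p'
}

# Schematic use on a login node:
#   sbatch --test-only submit.slurm 2>&1 | start_time_from_test_only
echo 'sbatch: Job 1039497 to start at 2022-02-01T23:20:51 using 256 processors on nodes nid002836' \
    | start_time_from_test_only
```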
"},{"location":"user-guide/scheduler/#estimated-start-time-for-queued-jobs","title":"Estimated start time for queued jobs","text":"

You can use the squeue command to show the current estimated start time for a job. Please note that this is only an estimate. The actual start time may differ because the scheduler state can change after the estimate is made. To return the estimated start time for a job, specify the job ID with the --jobs=<jobid> option and request the start time with the --Format=StartTime option.

For example, to show the estimated start time for job 123456, you would use:

squeue --jobs=123456 --Format=StartTime\n

The output from this command would look like:

START_TIME          \n2024-09-25T13:07:00\n
"},{"location":"user-guide/scheduler/#example-job-submission-scripts","title":"Example job submission scripts","text":"

A subset of example job submission scripts are included in full below. Examples are provided for both the full system and the 4-cabinet system.

"},{"location":"user-guide/scheduler/#example-job-submission-script-for-mpi-parallel-job","title":"Example: job submission script for MPI parallel job","text":"

A simple MPI job submission script to submit a job using 4 compute nodes and 128 MPI ranks per node for 20 minutes would look like:

#!/bin/bash\n\n# Slurm job options (job-name, compute nodes, job time)\n#SBATCH --job-name=Example_MPI_Job\n#SBATCH --time=0:20:0\n#SBATCH --nodes=4\n#SBATCH --ntasks-per-node=128\n#SBATCH --cpus-per-task=1\n\n# Replace [budget code] below with your budget code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Set the number of threads to 1\n#   This prevents any threaded system libraries from automatically\n#   using threading.\nexport OMP_NUM_THREADS=1\n\n# Propagate the cpus-per-task setting from script to srun commands\n#    By default, Slurm does not propagate this setting from the sbatch\n#    options to srun commands in the job script. If this is not done,\n#    process/thread pinning may be incorrect leading to poor performance\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\n# Launch the parallel job\n#   Using 512 MPI processes and 128 MPI processes per node\n#   srun picks up the node and task counts from the sbatch options\n\nsrun --distribution=block:block --hint=nomultithread ./my_mpi_executable.x\n

This will run your executable \"my_mpi_executable.x\" in parallel on 512 MPI processes using 4 nodes (128 cores per node, i.e. not using hyper-threading). Slurm will allocate 4 nodes to your job and srun will place 128 MPI processes on each node (one per physical core).

See above for a more detailed discussion of the different sbatch options.

"},{"location":"user-guide/scheduler/#example-job-submission-script-for-mpiopenmp-mixed-mode-parallel-job","title":"Example: job submission script for MPI+OpenMP (mixed mode) parallel job","text":"

Mixed mode codes that use both MPI (or another distributed memory parallel model) and OpenMP should take care to ensure that the shared memory portion of the process/thread placement does not span more than one NUMA region. Nodes on ARCHER2 are made up of two sockets, each containing 4 NUMA regions of 16 cores, i.e. there are 8 NUMA regions in total. Therefore, the number of OpenMP threads per MPI process should ideally not be greater than 16, and it also needs to be a factor of 16. Sensible choices for the number of threads are therefore 1 (single-threaded), 2, 4, 8 and 16. More information about using OpenMP and MPI+OpenMP can be found in the Tuning chapter.

To ensure correct placement of MPI processes the number of cpus-per-task needs to match the number of OpenMP threads, and the number of tasks-per-node should be set to ensure the entire node is filled with MPI tasks.

In the example below, we are using 4 nodes for 20 minutes. There are 32 MPI processes in total (8 MPI processes per node) and 16 OpenMP threads per MPI process. This results in all 128 physical cores per node being used.

Hint

Note the use of the export OMP_PLACES=cores environment variable setting to generate the correct thread pinning.

#!/bin/bash\n\n# Slurm job options (job-name, compute nodes, job time)\n#SBATCH --job-name=Example_MPI_Job\n#SBATCH --time=0:20:0\n#SBATCH --nodes=4\n#SBATCH --ntasks-per-node=8\n#SBATCH --cpus-per-task=16\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Propagate the cpus-per-task setting from script to srun commands\n#    By default, Slurm does not propagate this setting from the sbatch\n#    options to srun commands in the job script. If this is not done,\n#    process/thread pinning may be incorrect leading to poor performance\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\n# Set the number of threads to 16 and specify placement\n#   There are 16 OpenMP threads per MPI process\n#   We want one thread per physical core\nexport OMP_NUM_THREADS=16\nexport OMP_PLACES=cores\n\n# Launch the parallel job\n#   Using 32 MPI processes\n#   8 MPI processes per node\n#   16 OpenMP threads per MPI process\n#   Additional srun options to pin one thread per physical core\nsrun --hint=nomultithread --distribution=block:block ./my_mixed_executable.x arg1 arg2\n
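The thread-count guideline above requires that the thread count is a factor of 16, so that each process stays within one NUMA region, and that the node is filled. Both requirements can be checked with a hypothetical helper such as hybrid_layout, sketched here assuming 128 cores per node and 16-core NUMA regions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: given OpenMP threads per MPI process, print per-node
# settings that fill a 128-core node, rejecting thread counts that are not
# a factor of 16 (the NUMA region size assumed here).
hybrid_layout() {
    local threads=$1 cores_per_node=128 numa_cores=16
    if (( threads < 1 || numa_cores % threads != 0 )); then
        echo "error: ${threads} threads is not a factor of ${numa_cores}" >&2
        return 1
    fi
    printf '%s\n' "--ntasks-per-node=$(( cores_per_node / threads )) --cpus-per-task=${threads}"
}

for t in 1 2 4 8 16; do
    hybrid_layout "$t"
done
```

For 16 threads it prints the settings used in the example script: 8 tasks per node and 16 CPUs per task.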
"},{"location":"user-guide/scheduler/#job-arrays","title":"Job arrays","text":"

The Slurm job scheduling system offers the job array concept for running collections of almost-identical jobs: for example, running the same program several times with different arguments or input data.

Each job in a job array is called a subjob. The subjobs of a job array can be submitted and queried as a unit, making it easier and cleaner to handle the full set, compared to individual jobs.

All subjobs in a job array are started by running the same job script. The job script also contains information on the number of jobs to be started, and Slurm provides a subjob index which can be passed to the individual subjobs or used to select the input data per subjob.

"},{"location":"user-guide/scheduler/#job-script-for-a-job-array","title":"Job script for a job array","text":"

As an example, the following script runs 56 subjobs, with the subjob index as the only argument to the executable. Each subjob requests a single node, uses all 128 cores on the node by placing 1 MPI process per core, and has a maximum runtime of 4 hours:

#!/bin/bash\n# Slurm job options (job-name, compute nodes, job time)\n#SBATCH --job-name=Example_Array_Job\n#SBATCH --time=04:00:00\n#SBATCH --nodes=1\n#SBATCH --ntasks-per-node=128\n#SBATCH --cpus-per-task=1\n#SBATCH --array=0-55\n\n# Replace [budget code] below with your budget code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Propagate the cpus-per-task setting from script to srun commands\n#    By default, Slurm does not propagate this setting from the sbatch\n#    options to srun commands in the job script. If this is not done,\n#    process/thread pinning may be incorrect leading to poor performance\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\n# Set the number of threads to 1\n#   This prevents any threaded system libraries from automatically\n#   using threading.\nexport OMP_NUM_THREADS=1\n\nsrun --distribution=block:block --hint=nomultithread /path/to/exe $SLURM_ARRAY_TASK_ID\n
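Subjobs can also select their own input data rather than receiving only the index. One common pattern keeps one line of arguments per subjob in a text file and picks the line that matches SLURM_ARRAY_TASK_ID. This sketch assumes 0-based indices, as in --array=0-55. The names params_for_task and params.txt are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print this subjob's line from a parameters file that
# has one line per subjob. Array indices are 0-based here (--array=0-55),
# so subjob 0 reads line 1.
params_for_task() {
    local params_file=$1 task_id=${2:-${SLURM_ARRAY_TASK_ID:-}}
    if [[ -z "$task_id" ]]; then
        echo "error: no task ID given and SLURM_ARRAY_TASK_ID is unset" >&2
        return 1
    fi
    sed -n "$(( task_id + 1 ))p" "$params_file"
}

# Schematic use in the job script (params.txt is a hypothetical file):
#   srun --distribution=block:block --hint=nomultithread /path/to/exe $(params_for_task params.txt)
```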
"},{"location":"user-guide/scheduler/#submitting-a-job-array","title":"Submitting a job array","text":"

Job arrays are submitted using sbatch in the same way as for standard jobs:

sbatch job_script.slurm\n
"},{"location":"user-guide/scheduler/#expressing-dependencies-between-jobs","title":"Expressing dependencies between jobs","text":"

SLURM allows one to express dependencies between jobs using the --dependency (or -d) option. This allows the start of execution of the dependent job to be delayed until some condition involving a current or previous job, or set of jobs, has been satisfied. A simple example might be:

$ sbatch --dependency=4394150 myscript.sh\nSubmitted batch job 4394325\n
This states that the execution of the new batch job should not start until job 4394150 has completed/terminated. Here, completion/termination is the only condition. The new job 4394325 should appear in the pending state with reason (Dependency) assuming 4394150 is still running.

A dependency may be of a different type, of which there are a number of relevant possibilities. If we explicitly include the default type afterany in the example above, we would have:

$ sbatch --dependency=afterany:4394150 myscript.sh\nSubmitted batch job 4394325\n
This emphasises that the first job may complete with any exit code, and still satisfy the dependency. If we wanted a dependent job which would only become eligible for execution following successful completion of the dependency, we would use afterok:
$ sbatch --dependency=afterok:4394150 myscript.sh\nSubmitted batch job 4394325\n
This means that should the dependency fail with non-zero exit code, the dependent job will be in a state where it will never run. This may appear in squeue as (DependencyNeverSatisfied) as the reason. Such jobs will need to be cancelled.

The general form of the dependency list is <type:job_id[:job_id] [,type:job_id ...]> where a dependency may include one or more jobs, with one or more types. If a list is comma-separated, all the dependencies must be satisfied before the dependent job becomes eligible. The use of ? as the list separator implies that any of the dependencies is sufficient.

Useful type options include afterany, afterok and afternotok. For the last case, the dependency is only satisfied if the job terminates with a non-zero exit code (the opposite of afterok). See the current SLURM documentation for a full list of possibilities.
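The dependency list syntax described above can also be assembled programmatically. The helpers below have hypothetical names (dependency_for, all_of and any_of). They build type:job_id[:job_id] clauses and join them with , (all clauses must be satisfied) or ? (any one clause suffices).

```shell
#!/usr/bin/env bash
# Hypothetical helper: build "type:id[:id...]" from a type and job IDs.
dependency_for() {
    local type=$1; shift
    local IFS=:
    printf '%s:%s\n' "$type" "$*"
}

# Join clauses with "," (all must be satisfied) or "?" (any one suffices).
all_of() { local IFS=,; printf '%s\n' "$*"; }
any_of() { local IFS='?'; printf '%s\n' "$*"; }

# Prints: afterok:101:102,afternotok:103
all_of "$(dependency_for afterok 101 102)" "$(dependency_for afternotok 103)"
```

The result can then be passed to sbatch, for example as sbatch --dependency="$(all_of ...)" last_job.sh.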

"},{"location":"user-guide/scheduler/#chains-of-jobs","title":"Chains of jobs","text":""},{"location":"user-guide/scheduler/#fixed-number-of-jobs","title":"Fixed number of jobs","text":"

Job dependencies can be used to construct complex pipelines or chain together long simulations requiring multiple steps.

For example, if we have just two jobs, the following shell script extract will submit the second dependent on the first, irrespective of actual job ID:

jobid=$(sbatch --parsable first_job.sh)\nsbatch --dependency=afterok:${jobid} second_job.sh\n
where we have used the --parsable option to sbatch to return just the new job ID (without the \"Submitted batch job\" text).

This can be extended to a longer chain as required. E.g.:

jobid1=$(sbatch --parsable first_job.sh)\njobid2=$(sbatch --parsable --dependency=afterok:${jobid1} second_job.sh)\njobid3=$(sbatch --parsable --dependency=afterok:${jobid1} third_job.sh)\nsbatch --dependency=afterok:${jobid2},afterok:${jobid3} last_job.sh\n
Note jobs 2 and 3 are dependent on job 1 (only), but the final job is dependent on both jobs 2 and 3. This allows quite general workflows to be constructed.
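The same pattern generalises to a linear chain of any length. The sketch below assumes that each job should start only after the previous one succeeds (afterok). The function submit_chain is hypothetical. It strips the optional ;cluster suffix that --parsable can append on multi-cluster systems.

```shell
#!/usr/bin/env bash
# Sketch: submit job scripts as a linear chain in which each job starts only
# after the previous one completes successfully (afterok). Prints the job IDs.
submit_chain() {
    local prev="" jobid script
    for script in "$@"; do
        if [[ -z "$prev" ]]; then
            jobid=$(sbatch --parsable "$script") || return 1
        else
            jobid=$(sbatch --parsable --dependency=afterok:"$prev" "$script") || return 1
        fi
        # --parsable prints "jobid" or "jobid;cluster"; keep only the job ID
        jobid=${jobid%%;*}
        echo "$jobid"
        prev=$jobid
    done
}

# Usage on a login node: submit_chain first_job.sh second_job.sh third_job.sh
```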

"},{"location":"user-guide/scheduler/#number-of-jobs-not-known-in-advance","title":"Number of jobs not known in advance","text":"

This automation may be taken a step further to a case where a submission script propagates itself. For example, a script might include, schematically:

#SBATCH ...\n\n# submit new job here ...\nsbatch --dependency=afterok:${SLURM_JOB_ID} thisscript.sh\n\n# perform work here...\nsrun ...\n
where the original submission of the script will submit a new instance of itself dependent on its own successful completion. This is done via the SLURM environment variable SLURM_JOB_ID which holds the id of the current job. One could defer the sbatch until the end of the script to avoid the dependency never being satisfied if the work associated with the srun fails. This approach can be useful in situations where, e.g., simulations with checkpoint/restart need to continue until some criterion is met. Some care may be required to ensure the script logic is correct in determining the criterion for stopping: it is best to start with a small/short test example. Incorrect logic and/or errors may lead to a rapid proliferation of submitted jobs.

Termination of such chains needs to be arranged either via appropriate logic in the script, or manual intervention to cancel pending jobs when no longer required.
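One way to arrange the termination logic is to test an explicit stopping rule before resubmitting. The sketch assumes an iteration counter passed between jobs with --export and an optional stop file written by the application. The names should_resubmit, ITER and converged.flag are all hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of a stopping rule for a self-submitting job (names hypothetical):
# resubmit only while the iteration count is below a limit and no stop file
# (e.g. one written by the application once it has converged) exists.
should_resubmit() {
    local iteration=$1 max_iterations=$2 stop_file=${3:-converged.flag}
    [[ -e "$stop_file" ]] && return 1
    (( iteration < max_iterations ))
}

# Schematic use at the end of the job script, after the work has succeeded:
#   ITER=${ITER:-1}
#   srun ... || exit 1
#   if should_resubmit "$ITER" 20; then
#       sbatch --export=ALL,ITER=$(( ITER + 1 )) thisscript.sh
#   fi
```

Deferring sbatch until after srun, as in the usage comment, means that a failed step simply ends the chain. It also avoids leaving a dependent job that can never run.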

"},{"location":"user-guide/scheduler/#using-multiple-srun-commands-in-a-single-job-script","title":"Using multiple srun commands in a single job script","text":"

You can use multiple srun commands within a Slurm job submission script to use the requested resources more flexibly. For example, you could run a collection of smaller jobs within the requested resources, or you could even subdivide nodes if your individual calculations do not scale up to use all 128 cores on a node.

In this guide we will cover two scenarios:

  1. Subdividing the job into multiple full-node or multi-node subjobs, e.g. requesting 100 nodes and running 100, 1-node subjobs or 50, 2-node subjobs.
  2. Subdividing the job into multiple subjobs that each use a fraction of a node, e.g. requesting 2 nodes and running 256, 1-core subjobs or 16, 16-core subjobs.
"},{"location":"user-guide/scheduler/#running-multiple-full-node-subjobs-within-a-larger-job","title":"Running multiple, full-node subjobs within a larger job","text":"

When subdividing a larger job into smaller subjobs, you typically need to override the --nodes option to srun and add the --ntasks option to ensure that each subjob runs on the correct number of nodes and that subjobs are placed correctly onto separate nodes.

For example, we will show how to request 100 nodes and then run 100 separate 1-node jobs, each of which uses 128 MPI processes and runs on a different compute node. We start by showing the job script that would achieve this and then explain how it works and the options used. In our case, we will run 100 copies of the xthi program, which prints the process placement on the node it is running on.

#!/bin/bash\n\n# Slurm job options (job-name, compute nodes, job time)\n#SBATCH --job-name=multi_xthi\n#SBATCH --time=0:20:0\n#SBATCH --nodes=100\n#SBATCH --ntasks-per-node=128\n#SBATCH --cpus-per-task=1\n\n# Replace [budget code] below with your budget code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Load the xthi module\nmodule load xthi\n\n# Propagate the cpus-per-task setting from script to srun commands\n#    By default, Slurm does not propagate this setting from the sbatch\n#    options to srun commands in the job script. If this is not done,\n#    process/thread pinning may be incorrect leading to poor performance\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\n# Set the number of threads to 1\n#   This prevents any threaded system libraries from automatically\n#   using threading.\nexport OMP_NUM_THREADS=1\n\n# Loop over 100 subjobs starting each of them on a separate node\nfor i in $(seq 1 100)\ndo\n# Launch this subjob on 1 node, note nodes and ntasks options and & to place subjob in the background\n    srun --nodes=1 --ntasks=128 --distribution=block:block --hint=nomultithread xthi > placement${i}.txt &\ndone\n# Wait for all background subjobs to finish\nwait\n

Key points from the example job script:

"},{"location":"user-guide/scheduler/#running-multiple-subjobs-that-each-use-a-fraction-of-a-node","title":"Running multiple subjobs that each use a fraction of a node","text":"

As the ARCHER2 nodes contain a large number of cores (128 per node), it may sometimes be useful to be able to run multiple executables on a single node. For example, you may want to run 128 copies of a serial executable or Python script; or, you may want to run multiple copies of parallel executables that use fewer than 128 cores each. This use model is possible using multiple srun commands in a job script on ARCHER2.

Note

You can never share a compute node with another user. Although you can use srun to place multiple copies of an executable or script on a compute node, you still have exclusive use of that node. The minimum amount of resources you can reserve for your use on ARCHER2 is a single node.

When using srun to place multiple executables or scripts on a compute node you must be aware of a few things:

Below, we provide four examples of running multiple subjobs in a node: one that runs 128 serial processes across a single node; one that runs 8 subjobs, each of which uses 8 MPI processes with 2 OpenMP threads per MPI process; one that runs four inhomogeneous jobs, each of which requires a different number of MPI processes and OpenMP threads per process; and one that runs 256 serial processes across two nodes.

"},{"location":"user-guide/scheduler/#example-1-128-serial-tasks-running-on-a-single-node","title":"Example 1: 128 serial tasks running on a single node","text":"

For our first example, we will run 128 single-core copies of the xthi program (which prints process/thread placement) on a single ARCHER2 compute node with each copy of xthi pinned to a different core. The job submission script for this example would look like:

#!/bin/bash\n# Slurm job options (job-name, compute nodes, job time)\n#SBATCH --job-name=MultiSerialOnCompute\n#SBATCH --time=0:10:0\n#SBATCH --nodes=1\n#SBATCH --ntasks-per-node=128\n#SBATCH --cpus-per-task=1\n#SBATCH --hint=nomultithread\n#SBATCH --distribution=block:block\n\n# Replace [budget code] below with your budget code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Make xthi available\nmodule load xthi\n\n# Set the number of threads to 1\n#   This prevents any threaded system libraries from automatically\n#   using threading.\nexport OMP_NUM_THREADS=1\n\n# Propagate the cpus-per-task setting from script to srun commands\n#    By default, Slurm does not propagate this setting from the sbatch\n#    options to srun commands in the job script. If this is not done,\n#    process/thread pinning may be incorrect leading to poor performance\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\n# Loop over 128 subjobs pinning each to a different core\nfor i in $(seq 1 128)\ndo\n# Launch subjob overriding job settings as required and in the background\n# Make sure to change the amount specified by the `--mem=` flag to the amount\n# of memory required. The amount of memory is given in MiB by default but other\n# units can be specified. If you do not know how much memory to specify, we\n# recommend that you specify `--mem=1500M` (1,500 MiB).\nsrun --nodes=1 --ntasks=1 --ntasks-per-node=1 \\\n      --exact --mem=1500M xthi > placement${i}.txt &\ndone\n\n# Wait for all subjobs to finish\nwait\n
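The script above ends with a bare wait, which returns zero even if some subjobs failed. If you need to know whether every subjob succeeded, one option is to record each background PID and wait on them individually, so that each srun's exit status is seen. The sketch below does this with a hypothetical wait_all function.

```shell
#!/usr/bin/env bash
# Sketch: wait for background subjobs one by one and count failures.
# A bare `wait` returns 0 even if some subjobs failed; `wait PID` returns
# that subjob's own exit status instead.
wait_all() {
    local pid failed=0
    for pid in "$@"; do
        wait "$pid" || failed=$(( failed + 1 ))
    done
    echo "${failed} subjob(s) failed"
    (( failed == 0 ))
}

# In the job script, record each PID as the subjob is put in the background:
#   pids=()
#   for i in $(seq 1 128); do
#       srun --nodes=1 --ntasks=1 --ntasks-per-node=1 --exact --mem=1500M xthi > placement${i}.txt &
#       pids+=($!)
#   done
#   wait_all "${pids[@]}"
```

wait_all must run in the same shell that started the subjobs, because a subshell cannot wait for its parent's children.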
"},{"location":"user-guide/scheduler/#example-2-8-subjobs-on-1-node-each-with-8-mpi-processes-and-2-openmp-threads-per-process","title":"Example 2: 8 subjobs on 1 node each with 8 MPI processes and 2 OpenMP threads per process","text":"

For our second example, we will run 8 subjobs, each running the xthi program (which prints process/thread placement) across 1 node. Each subjob will use 8 MPI processes and 2 OpenMP threads per process. The job submission script for this example would look like:

#!/bin/bash\n# Slurm job options (job-name, compute nodes, job time)\n#SBATCH --job-name=MultiParallelOnCompute\n#SBATCH --time=0:10:0\n#SBATCH --nodes=1\n#SBATCH --ntasks-per-node=64\n#SBATCH --cpus-per-task=2\n#SBATCH --hint=nomultithread\n#SBATCH --distribution=block:block\n\n# Replace [budget code] below with your budget code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Make xthi available\nmodule load xthi\n\n# Set the number of threads to 2 as required by all subjobs\nexport OMP_NUM_THREADS=2\n\n# Loop over 8 subjobs\nfor i in $(seq 1 8)\ndo\n    # Launch subjob overriding job settings as required and in the background\n    # Make sure to change the amount specified by the `--mem=` flag to the amount\n    # of memory required. The amount of memory is given in MiB by default but other\n    # units can be specified. If you do not know how much memory to specify, we\n    # recommend that you specify `--mem=12500M` (12,500 MiB).\n    srun --nodes=1 --ntasks=8 --ntasks-per-node=8 --cpus-per-task=2 \\\n    --exact --mem=12500M xthi > placement${i}.txt &\ndone\n\n# Wait for all subjobs to finish\nwait\n
"},{"location":"user-guide/scheduler/#example-3-running-inhomogeneous-subjobs-on-one-node","title":"Example 3: Running inhomogeneous subjobs on one node","text":"

For our third example, we will run 4 subjobs, each running the xthi program (which prints process/thread placement) across 1 node. Our subjobs will each run with a different number of MPI processes and OpenMP threads. We will run: one job with 64 MPI processes and 1 OpenMP thread per process; one job with 16 MPI processes and 2 OpenMP threads per process; one job with 4 MPI processes and 4 OpenMP threads per process; and one job with 1 MPI process and 16 OpenMP threads.

To be able to change the number of MPI processes and OpenMP threads per process, we need to forgo the #SBATCH --ntasks-per-node and #SBATCH --cpus-per-task options. If you set these, Slurm will not let you alter the OMP_NUM_THREADS variable, and you will not be able to change the number of OpenMP threads per process between subjobs.

Before each srun command, you will need to define the number of OpenMP threads per process you want by changing the OMP_NUM_THREADS variable. Furthermore, for each srun command, you will need to set the --ntasks flag to equal the number of MPI processes you want to use. You will also need to set the --cpus-per-task flag to equal the number of OpenMP threads per process you want to use.

#!/bin/bash\n# Slurm job options (job-name, compute nodes, job time)\n#SBATCH --job-name=MultiParallelOnCompute\n#SBATCH --time=0:10:0\n#SBATCH --nodes=1\n#SBATCH --hint=nomultithread\n#SBATCH --distribution=block:block\n\n# Replace [budget code] below with your budget code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Make xthi available\nmodule load xthi\n\n# Set the number of threads to the value required by the first job\nexport OMP_NUM_THREADS=1\nsrun --ntasks=64 --cpus-per-task=${OMP_NUM_THREADS} \\\n      --exact --mem=12500M xthi > placement${OMP_NUM_THREADS}.txt &\n\n# Set the number of threads to the value required by the second job\nexport OMP_NUM_THREADS=2\nsrun --ntasks=16 --cpus-per-task=${OMP_NUM_THREADS} \\\n      --exact --mem=12500M xthi > placement${OMP_NUM_THREADS}.txt &\n\n# Set the number of threads to the value required by the third job\nexport OMP_NUM_THREADS=4\nsrun --ntasks=4 --cpus-per-task=${OMP_NUM_THREADS} \\\n      --exact --mem=12500M xthi > placement${OMP_NUM_THREADS}.txt &\n\n# Set the number of threads to the value required by the fourth job\nexport OMP_NUM_THREADS=16\nsrun --ntasks=1 --cpus-per-task=${OMP_NUM_THREADS} \\\n      --exact --mem=12500M xthi > placement${OMP_NUM_THREADS}.txt &\n\n# Wait for all subjobs to finish\nwait\n
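The four subjobs in this example share one node, so their combined core request (ntasks multiplied by cpus-per-task) should not exceed 128. A hypothetical total_cores helper makes that check explicit.

```shell
#!/usr/bin/env bash
# Hypothetical check: sum the cores requested by several subjobs given as
# ntasks:cpus_per_task pairs, then compare with the 128 cores of a node.
total_cores() {
    local spec ntasks cpus total=0
    for spec in "$@"; do
        IFS=: read -r ntasks cpus <<< "$spec"
        total=$(( total + ntasks * cpus ))
    done
    echo "$total"
}

# The four subjobs in Example 3: 64x1, 16x2, 4x4 and 1x16
used=$(total_cores 64:1 16:2 4:4 1:16)
if (( used > 128 )); then
    echo "error: subjobs need ${used} cores but a node has 128" >&2
else
    echo "subjobs use ${used} of 128 cores"
fi
```

Here the total is 64 + 32 + 16 + 16 = 128, so the node is exactly filled.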
"},{"location":"user-guide/scheduler/#example-4-256-serial-tasks-running-across-two-nodes","title":"Example 4: 256 serial tasks running across two nodes","text":"

For our fourth example, we will run 256 single-core copies of the xthi program (which prints process/thread placement) across two ARCHER2 compute nodes, with each copy of xthi pinned to a different core. We will also illustrate a mechanism for getting the node IDs to pass to srun, which is required to ensure that the individual subjobs are assigned to the correct node. This mechanism uses the scontrol command to turn the nodelist from sbatch into a format we can use as input to srun. The job submission script for this example would look like:

#!/bin/bash\n# Slurm job options (job-name, compute nodes, job time)\n#SBATCH --job-name=MultiSerialOnComputes\n#SBATCH --time=0:10:0\n#SBATCH --nodes=2\n#SBATCH --ntasks-per-node=128\n#SBATCH --cpus-per-task=1\n\n# Replace [budget code] below with your budget code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Make xthi available\nmodule load xthi\n\n# Set the number of threads to 1\n#   This prevents any threaded system libraries from automatically\n#   using threading.\nexport OMP_NUM_THREADS=1\n\n# Propagate the cpus-per-task setting from script to srun commands\n#    By default, Slurm does not propagate this setting from the sbatch\n#    options to srun commands in the job script. If this is not done,\n#    process/thread pinning may be incorrect leading to poor performance\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\n# Get a list of the nodes assigned to this job in a format we can use.\n#   scontrol converts the condensed node IDs in the sbatch environment\n#   variable into a list of full node IDs that we can use with srun to\n#   ensure the subjobs are placed on the correct node. e.g. this converts\n#   \"nid[001234,002345]\" to \"nid001234 nid002345\"\nnodelist=$(scontrol show hostnames $SLURM_JOB_NODELIST)\n\n# Loop over the nodes assigned to the job\nfor nodeid in $nodelist\ndo\n    # Loop over 128 subjobs on each node pinning each to a different core\n    for i in $(seq 1 128)\n    do\n        # Launch subjob overriding job settings as required and in the background\n        # Make sure to change the amount specified by the `--mem=` flag to the amount\n        # of memory required. The amount of memory is given in MiB by default but other\n        # units can be specified. 
If you do not know how much memory to specify, we\n        # recommend that you specify `--mem=1500M` (1,500 MiB).\n        srun --nodelist=${nodeid} --nodes=1 --ntasks=1 --ntasks-per-node=1 \\\n        --exact --mem=1500M xthi > placement_${nodeid}_${i}.txt &\n    done\ndone\n\n# Wait for all subjobs to finish\nwait\n
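Inside a job, the scontrol show hostnames command performs the nodelist expansion for you. To illustrate what that conversion does, here is a minimal bash sketch that expands a nodelist containing a single bracketed group of comma-separated IDs and ranges. The function expand_nodelist is hypothetical. It does not handle every nodelist form that scontrol accepts (for example, several bracketed groups), so keep using scontrol in real job scripts.

```shell
#!/bin/bash
# Hypothetical illustration of what "scontrol show hostnames" does for a
# simple nodelist such as "nid[001234,002345]" or "nid[002526-002527]".
# Only one bracketed group is supported; use scontrol in real jobs.
expand_nodelist() {
    local list=$1
    # No brackets: already a single node name
    if [[ ${list} != *"["* ]]; then
        echo "${list}"
        return
    fi
    local prefix=${list%%\[*}    # e.g. "nid"
    local inner=${list#*\[}      # e.g. "002526-002527]"
    inner=${inner%\]}            # strip the closing bracket
    local -a items=()
    local -a nodes=()
    local item start end width n
    IFS=',' read -ra items <<< "${inner}"
    for item in "${items[@]}"; do
        if [[ ${item} == *-* ]]; then
            # A range: keep the zero-padded width of the node IDs
            start=${item%-*}
            end=${item#*-}
            width=${#start}
            # 10# forces base 10 so leading zeros are not read as octal
            for (( n = 10#${start}; n <= 10#${end}; n++ )); do
                nodes+=( "$(printf "%s%0${width}d" "${prefix}" "${n}")" )
            done
        else
            nodes+=( "${prefix}${item}" )
        fi
    done
    echo "${nodes[*]}"
}

expand_nodelist "nid[001234,002345]"   # nid001234 nid002345
expand_nodelist "nid[002526-002527]"   # nid002526 nid002527
```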
"},{"location":"user-guide/scheduler/#process-placement","title":"Process placement","text":"

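There are many occasions where you may want to control (usually MPI) process placement and change it from the default, for example when you underpopulate nodes or when you want a non-sequential (e.g. round-robin) placement of processes.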

There are a number of different methods for defining process placement. Below we cover two of them: using Slurm options and using the MPICH_RANK_REORDER_METHOD environment variable. Most users will likely use the Slurm options approach.

"},{"location":"user-guide/scheduler/#standard-process-placement","title":"Standard process placement","text":"

The standard approach recommended on ARCHER2 is to place processes sequentially on nodes until the maximum number of tasks is reached. You can use the xthi program to verify this for MPI process placement:

auser@ln04:/work/t01/t01/auser> salloc --nodes=2 --ntasks-per-node=128 \\\n     --cpus-per-task=1 --time=0:10:0 --partition=standard --qos=short \\\n     --account=[your account]\n\nsalloc: Pending job allocation 1170365\nsalloc: job 1170365 queued and waiting for resources\nsalloc: job 1170365 has been allocated resources\nsalloc: Granted job allocation 1170365\nsalloc: Waiting for resource configuration\nsalloc: Nodes nid[002526-002527] are ready for job\n\nauser@ln04:/work/t01/t01/auser> module load xthi\nauser@ln04:/work/t01/t01/auser> export OMP_NUM_THREADS=1\nauser@ln04:/work/t01/t01/auser> export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\nauser@ln04:/work/t01/t01/auser> srun --distribution=block:block --hint=nomultithread xthi\n\nNode summary for    2 nodes:\nNode    0, hostname nid002526, mpi 128, omp   1, executable xthi\nNode    1, hostname nid002527, mpi 128, omp   1, executable xthi\nMPI summary: 256 ranks\nNode    0, rank    0, thread   0, (affinity =    0)\nNode    0, rank    1, thread   0, (affinity =    1)\nNode    0, rank    2, thread   0, (affinity =    2)\nNode    0, rank    3, thread   0, (affinity =    3)\n\n...output trimmed...\n\nNode    0, rank  124, thread   0, (affinity =  124)\nNode    0, rank  125, thread   0, (affinity =  125)\nNode    0, rank  126, thread   0, (affinity =  126)\nNode    0, rank  127, thread   0, (affinity =  127)\nNode    1, rank  128, thread   0, (affinity =    0)\nNode    1, rank  129, thread   0, (affinity =    1)\nNode    1, rank  130, thread   0, (affinity =    2)\nNode    1, rank  131, thread   0, (affinity =    3)\n\n...output trimmed...\n

Note

For MPI programs on ARCHER2, each rank corresponds to a process.

Important

To get good performance out of MPI collective operations, MPI processes should be placed sequentially on cores as in the standard placement described above.

"},{"location":"user-guide/scheduler/#setting-process-placement-using-slurm-options","title":"Setting process placement using Slurm options","text":""},{"location":"user-guide/scheduler/#for-underpopulation-of-nodes-with-processes","title":"For underpopulation of nodes with processes","text":"

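When you are using fewer processes than cores on compute nodes (i.e. fewer than 128 processes per node), the basic Slurm options for process placement (usually supplied in your script as options to sbatch) are --nodes=[number of nodes], --ntasks-per-node=[number of processes per node] and --cpus-per-task=[stride between processes, in cores].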

In addition, the options --distribution=block:block and --hint=nomultithread are added to the srun commands in your job submission script.

For example, to place 32 processes per node with 1 process per 4-core block (corresponding to a CCX, or Core CompleX, whose cores share an L3 cache), you would set --ntasks-per-node=32 and --cpus-per-task=4, as in the salloc command below.

Here is the output from xthi:

auser@ln04:/work/t01/t01/auser> salloc --nodes=2 --ntasks-per-node=32 \\\n     --cpus-per-task=4 --time=0:10:0 --partition=standard --qos=short \\\n     --account=[your account]\n\nsalloc: Pending job allocation 1170383\nsalloc: job 1170383 queued and waiting for resources\nsalloc: job 1170383 has been allocated resources\nsalloc: Granted job allocation 1170383\nsalloc: Waiting for resource configuration\nsalloc: Nodes nid[002526-002527] are ready for job\n\nauser@ln04:/work/t01/t01/auser> module load xthi\nauser@ln04:/work/t01/t01/auser> export OMP_NUM_THREADS=1\nauser@ln04:/work/t01/t01/auser> export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\nauser@ln04:/work/t01/t01/auser> srun --distribution=block:block --hint=nomultithread xthi\n\nNode summary for    2 nodes:\nNode    0, hostname nid002526, mpi  32, omp   1, executable xthi\nNode    1, hostname nid002527, mpi  32, omp   1, executable xthi\nMPI summary: 64 ranks\nNode    0, rank    0, thread   0, (affinity =  0-3)\nNode    0, rank    1, thread   0, (affinity =  4-7)\nNode    0, rank    2, thread   0, (affinity = 8-11)\nNode    0, rank    3, thread   0, (affinity = 12-15)\nNode    0, rank    4, thread   0, (affinity = 16-19)\nNode    0, rank    5, thread   0, (affinity = 20-23)\nNode    0, rank    6, thread   0, (affinity = 24-27)\nNode    0, rank    7, thread   0, (affinity = 28-31)\nNode    0, rank    8, thread   0, (affinity = 32-35)\nNode    0, rank    9, thread   0, (affinity = 36-39)\nNode    0, rank   10, thread   0, (affinity = 40-43)\nNode    0, rank   11, thread   0, (affinity = 44-47)\nNode    0, rank   12, thread   0, (affinity = 48-51)\nNode    0, rank   13, thread   0, (affinity = 52-55)\nNode    0, rank   14, thread   0, (affinity = 56-59)\nNode    0, rank   15, thread   0, (affinity = 60-63)\nNode    0, rank   16, thread   0, (affinity = 64-67)\nNode    0, rank   17, thread   0, (affinity = 68-71)\nNode    0, rank   18, thread   0, (affinity = 72-75)\nNode    0, rank   19, thread   0, (affinity = 
76-79)\nNode    0, rank   20, thread   0, (affinity = 80-83)\nNode    0, rank   21, thread   0, (affinity = 84-87)\nNode    0, rank   22, thread   0, (affinity = 88-91)\nNode    0, rank   23, thread   0, (affinity = 92-95)\nNode    0, rank   24, thread   0, (affinity = 96-99)\nNode    0, rank   25, thread   0, (affinity = 100-103)\nNode    0, rank   26, thread   0, (affinity = 104-107)\nNode    0, rank   27, thread   0, (affinity = 108-111)\nNode    0, rank   28, thread   0, (affinity = 112-115)\nNode    0, rank   29, thread   0, (affinity = 116-119)\nNode    0, rank   30, thread   0, (affinity = 120-123)\nNode    0, rank   31, thread   0, (affinity = 124-127)\nNode    1, rank   32, thread   0, (affinity =  0-3)\nNode    1, rank   33, thread   0, (affinity =  4-7)\nNode    1, rank   34, thread   0, (affinity = 8-11)\nNode    1, rank   35, thread   0, (affinity = 12-15)\nNode    1, rank   36, thread   0, (affinity = 16-19)\nNode    1, rank   37, thread   0, (affinity = 20-23)\nNode    1, rank   38, thread   0, (affinity = 24-27)\nNode    1, rank   39, thread   0, (affinity = 28-31)\nNode    1, rank   40, thread   0, (affinity = 32-35)\nNode    1, rank   41, thread   0, (affinity = 36-39)\nNode    1, rank   42, thread   0, (affinity = 40-43)\nNode    1, rank   43, thread   0, (affinity = 44-47)\nNode    1, rank   44, thread   0, (affinity = 48-51)\nNode    1, rank   45, thread   0, (affinity = 52-55)\nNode    1, rank   46, thread   0, (affinity = 56-59)\nNode    1, rank   47, thread   0, (affinity = 60-63)\nNode    1, rank   48, thread   0, (affinity = 64-67)\nNode    1, rank   49, thread   0, (affinity = 68-71)\nNode    1, rank   50, thread   0, (affinity = 72-75)\nNode    1, rank   51, thread   0, (affinity = 76-79)\nNode    1, rank   52, thread   0, (affinity = 80-83)\nNode    1, rank   53, thread   0, (affinity = 84-87)\nNode    1, rank   54, thread   0, (affinity = 88-91)\nNode    1, rank   55, thread   0, (affinity = 92-95)\nNode    1, rank   56, thread   0, 
(affinity = 96-99)\nNode    1, rank   57, thread   0, (affinity = 100-103)\nNode    1, rank   58, thread   0, (affinity = 104-107)\nNode    1, rank   59, thread   0, (affinity = 108-111)\nNode    1, rank   60, thread   0, (affinity = 112-115)\nNode    1, rank   61, thread   0, (affinity = 116-119)\nNode    1, rank   62, thread   0, (affinity = 120-123)\nNode    1, rank   63, thread   0, (affinity = 124-127)\n
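With --distribution=block:block, the placement in the output above follows a simple rule. Rank r goes to node r / ntasks-per-node (integer division), and it gets a contiguous block of cpus-per-task cores starting at (r mod ntasks-per-node) × cpus-per-task. Below is a minimal bash sketch of that arithmetic, which you could use to predict what xthi should report. The function name is hypothetical, and the sketch assumes every node holds the same number of ranks.

```shell
#!/bin/bash
# Hypothetical helper: predict node and cores for one MPI rank under
# --distribution=block:block, assuming every node holds the same
# number of ranks. Arguments: rank, ntasks-per-node, cpus-per-task.
block_block_placement() {
    local rank=$1 ntpn=$2 cpt=$3
    local node=$(( rank / ntpn ))
    local first=$(( (rank % ntpn) * cpt ))
    local last=$(( first + cpt - 1 ))
    if (( cpt == 1 )); then
        echo "node ${node}, core ${first}"
    else
        echo "node ${node}, cores ${first}-${last}"
    fi
}

# The 32-processes-per-node example above
block_block_placement 3 32 4     # node 0, cores 12-15
block_block_placement 33 32 4    # node 1, cores 4-7
```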

Tip

You usually only want to use physical cores on ARCHER2, so (ntasks-per-node) \u00d7 (cpus-per-task) should generally be equal to 128.
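Following this tip, when you underpopulate nodes you can derive cpus-per-task from the number of processes per node so that the product stays at 128. Below is a minimal bash sketch of that calculation. The function name is hypothetical, and the function refuses process counts that do not divide 128 evenly.

```shell
#!/bin/bash
# Hypothetical helper: choose --cpus-per-task so that
# ntasks-per-node x cpus-per-task = 128 physical cores per node.
CORES_PER_NODE=128

cpus_per_task_for() {
    local ntpn=$1
    # || short-circuits, so the modulo is never evaluated for ntpn < 1
    if (( ntpn < 1 || ntpn > CORES_PER_NODE || CORES_PER_NODE % ntpn != 0 )); then
        echo "error: ${ntpn} does not divide ${CORES_PER_NODE} evenly" >&2
        return 1
    fi
    echo $(( CORES_PER_NODE / ntpn ))
}

ntpn=32
cpt=$(cpus_per_task_for "${ntpn}") || exit 1
printf '%s\n' "--ntasks-per-node=${ntpn} --cpus-per-task=${cpt}"
```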

"},{"location":"user-guide/scheduler/#full-node-population-with-non-sequential-process-placement","title":"Full node population with non-sequential process placement","text":"

If you want to use Slurm options to change the order in which processes are placed on nodes and cores, use the --distribution option to srun.

For example, to place processes sequentially on nodes but round-robin on the 16-core NUMA regions in a single node, you would use the --distribution=block:cyclic option to srun. This type of process placement can be beneficial when a code is memory bound.

auser@ln04:/work/t01/t01/auser> salloc --nodes=2 --ntasks-per-node=128 \\\n     --cpus-per-task=1 --time=0:10:0 --partition=standard --qos=short \\\n     --account=[your account]\n\nsalloc: Pending job allocation 1170594\nsalloc: job 1170594 queued and waiting for resources\nsalloc: job 1170594 has been allocated resources\nsalloc: Granted job allocation 1170594\nsalloc: Waiting for resource configuration\nsalloc: Nodes nid[002616,002621] are ready for job\n\nauser@ln04:/work/t01/t01/auser> module load xthi\nauser@ln04:/work/t01/t01/auser> export OMP_NUM_THREADS=1\nauser@ln04:/work/t01/t01/auser> srun --distribution=block:cyclic --hint=nomultithread xthi\n\nNode summary for    2 nodes:\nNode    0, hostname nid002616, mpi 128, omp   1, executable xthi\nNode    1, hostname nid002621, mpi 128, omp   1, executable xthi\nMPI summary: 256 ranks\nNode    0, rank    0, thread   0, (affinity =    0)\nNode    0, rank    1, thread   0, (affinity =   16)\nNode    0, rank    2, thread   0, (affinity =   32)\nNode    0, rank    3, thread   0, (affinity =   48)\nNode    0, rank    4, thread   0, (affinity =   64)\nNode    0, rank    5, thread   0, (affinity =   80)\nNode    0, rank    6, thread   0, (affinity =   96)\nNode    0, rank    7, thread   0, (affinity =  112)\nNode    0, rank    8, thread   0, (affinity =    1)\nNode    0, rank    9, thread   0, (affinity =   17)\nNode    0, rank   10, thread   0, (affinity =   33)\nNode    0, rank   11, thread   0, (affinity =   49)\nNode    0, rank   12, thread   0, (affinity =   65)\nNode    0, rank   13, thread   0, (affinity =   81)\nNode    0, rank   14, thread   0, (affinity =   97)\nNode    0, rank   15, thread   0, (affinity =  113)\n\n...output trimmed...\n\nNode    0, rank  120, thread   0, (affinity =   15)\nNode    0, rank  121, thread   0, (affinity =   31)\nNode    0, rank  122, thread   0, (affinity =   47)\nNode    0, rank  123, thread   0, (affinity =   63)\nNode    0, rank  124, thread   0, (affinity =   79)\nNode    0, rank  125, thread   0, (affinity =   95)\nNode    0, rank  126, thread   0, (affinity =  111)\nNode    0, rank  127, thread   0, (affinity =  127)\nNode    1, rank  128, thread   0, (affinity =    0)\nNode    1, rank  129, thread   0, (affinity =   16)\nNode    1, rank  130, thread   0, (affinity =   32)\nNode    1, rank  131, thread   0, (affinity =   48)\nNode    1, rank  132, thread   0, (affinity =   64)\nNode    1, rank  133, thread   0, (affinity =   80)\nNode    1, rank  134, thread   0, (affinity =   96)\nNode    1, rank  135, thread   0, (affinity =  112)\n\n...output trimmed...\n
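In the block:cyclic output above, consecutive ranks on a node step through the eight 16-core NUMA regions in turn before moving on to the next core in each region. Below is a minimal bash sketch of that arithmetic for a fully populated node with one core per rank. It assumes 8 NUMA regions of 16 cores, as described above, and the function name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: predict the core used by a rank under
# --distribution=block:cyclic on a fully populated node, assuming
# 128 cores split into 8 NUMA regions of 16 cores, 1 core per rank.
REGIONS=8
CORES_PER_REGION=16

block_cyclic_core() {
    local rank=$1 ntpn=$2
    local node=$(( rank / ntpn ))            # nodes are still filled in blocks
    local local_rank=$(( rank % ntpn ))      # position of the rank on its node
    local region=$(( local_rank % REGIONS )) # round robin over NUMA regions
    local offset=$(( local_rank / REGIONS )) # next free core within the region
    echo "node ${node}, core $(( region * CORES_PER_REGION + offset ))"
}

block_cyclic_core 1 128      # node 0, core 16
block_cyclic_core 125 128    # node 0, core 95
block_cyclic_core 129 128    # node 1, core 16
```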

If you wish to place processes round robin on both nodes and 16-core regions (cores that share access to a single DRAM memory controller) within a node, you would use --distribution=cyclic:cyclic:

auser@ln04:/work/t01/t01/auser> salloc --nodes=2 --ntasks-per-node=128 \\\n     --cpus-per-task=1 --time=0:10:0 --partition=standard --qos=short \\\n     --account=[your account]\n\nsalloc: Pending job allocation 1170594\nsalloc: job 1170594 queued and waiting for resources\nsalloc: job 1170594 has been allocated resources\nsalloc: Granted job allocation 1170594\nsalloc: Waiting for resource configuration\nsalloc: Nodes nid[002616,002621] are ready for job\n\nauser@ln04:/work/t01/t01/auser> module load xthi\nauser@ln04:/work/t01/t01/auser> export OMP_NUM_THREADS=1\nauser@ln04:/work/t01/t01/auser> export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\nauser@ln04:/work/t01/t01/auser> srun --distribution=cyclic:cyclic --hint=nomultithread xthi\n\nNode summary for    2 nodes:\nNode    0, hostname nid002616, mpi 128, omp   1, executable xthi\nNode    1, hostname nid002621, mpi 128, omp   1, executable xthi\nMPI summary: 256 ranks\nNode    0, rank    0, thread   0, (affinity =    0)\nNode    0, rank    2, thread   0, (affinity =   16)\nNode    0, rank    4, thread   0, (affinity =   32)\nNode    0, rank    6, thread   0, (affinity =   48)\nNode    0, rank    8, thread   0, (affinity =   64)\nNode    0, rank   10, thread   0, (affinity =   80)\nNode    0, rank   12, thread   0, (affinity =   96)\nNode    0, rank   14, thread   0, (affinity =  112)\nNode    0, rank   16, thread   0, (affinity =    1)\nNode    0, rank   18, thread   0, (affinity =   17)\nNode    0, rank   20, thread   0, (affinity =   33)\nNode    0, rank   22, thread   0, (affinity =   49)\nNode    0, rank   24, thread   0, (affinity =   65)\nNode    0, rank   26, thread   0, (affinity =   81)\nNode    0, rank   28, thread   0, (affinity =   97)\nNode    0, rank   30, thread   0, (affinity =  113)\n\n...output trimmed...\n\nNode    1, rank    1, thread   0, (affinity =    0)\nNode    1, rank    3, thread   0, (affinity =   16)\nNode    1, rank    5, thread   0, (affinity =   32)\nNode    1, rank    7, thread   0, 
(affinity =   48)\nNode    1, rank    9, thread   0, (affinity =   64)\nNode    1, rank   11, thread   0, (affinity =   80)\nNode    1, rank   13, thread   0, (affinity =   96)\nNode    1, rank   15, thread   0, (affinity =  112)\nNode    1, rank   17, thread   0, (affinity =    1)\nNode    1, rank   19, thread   0, (affinity =   17)\nNode    1, rank   21, thread   0, (affinity =   33)\nNode    1, rank   23, thread   0, (affinity =   49)\nNode    1, rank   25, thread   0, (affinity =   65)\nNode    1, rank   27, thread   0, (affinity =   81)\nNode    1, rank   29, thread   0, (affinity =   97)\nNode    1, rank   31, thread   0, (affinity =  113)\n\n...output trimmed...\n
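With cyclic:cyclic, ranks are first dealt round robin to the nodes, and then each node's share is dealt round robin over its NUMA regions, which reproduces the output above. Below is a minimal bash sketch of that arithmetic, assuming one core per rank and 8 NUMA regions of 16 cores. The function name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: predict node and core for a rank under
# --distribution=cyclic:cyclic, assuming 1 core per rank and
# 128 cores split into 8 NUMA regions of 16 cores.
REGIONS=8
CORES_PER_REGION=16

cyclic_cyclic_core() {
    local rank=$1 nnodes=$2
    local node=$(( rank % nnodes ))          # ranks dealt round robin to nodes
    local local_rank=$(( rank / nnodes ))    # position of the rank on its node
    local region=$(( local_rank % REGIONS )) # then round robin over NUMA regions
    local offset=$(( local_rank / REGIONS ))
    echo "node ${node}, core $(( region * CORES_PER_REGION + offset ))"
}

cyclic_cyclic_core 2 2     # node 0, core 16
cyclic_cyclic_core 31 2    # node 1, core 113
```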

Remember, MPI collective performance is generally much worse if processes are not placed sequentially on a node (so adjacent MPI ranks are as close to each other as possible). This is the reason that the default recommended placement on ARCHER2 is sequential rather than round-robin.

"},{"location":"user-guide/scheduler/#mpich_rank_reorder_method-for-mpi-process-placement","title":"MPICH_RANK_REORDER_METHOD for MPI process placement","text":"

The MPICH_RANK_REORDER_METHOD environment variable can also be used to specify other types of MPI task placement. For example, setting it to \"0\" results in a round-robin placement on both nodes and NUMA regions in a node (equivalent to the --distribution=cyclic:cyclic option to srun). Note that we do not specify the --distribution option to srun in this case, as the environment variable controls placement:

salloc --nodes=2 --ntasks-per-node=128 --cpus-per-task=1 --time=0:10:0 --partition=standard --qos=short --account=[your account]\n\nsalloc: Granted job allocation 24236\nsalloc: Waiting for resource configuration\nsalloc: Nodes nid[002616,002621] are ready for job\n\nmodule load xthi\nexport OMP_NUM_THREADS=1\nexport MPICH_RANK_REORDER_METHOD=0\nsrun --hint=nomultithread xthi\n\nNode summary for    2 nodes:\nNode    0, hostname nid002616, mpi 128, omp   1, executable xthi\nNode    1, hostname nid002621, mpi 128, omp   1, executable xthi\nMPI summary: 256 ranks\nNode    0, rank    0, thread   0, (affinity =    0)\nNode    0, rank    2, thread   0, (affinity =   16)\nNode    0, rank    4, thread   0, (affinity =   32)\nNode    0, rank    6, thread   0, (affinity =   48)\nNode    0, rank    8, thread   0, (affinity =   64)\nNode    0, rank   10, thread   0, (affinity =   80)\nNode    0, rank   12, thread   0, (affinity =   96)\nNode    0, rank   14, thread   0, (affinity =  112)\nNode    0, rank   16, thread   0, (affinity =    1)\nNode    0, rank   18, thread   0, (affinity =   17)\nNode    0, rank   20, thread   0, (affinity =   33)\nNode    0, rank   22, thread   0, (affinity =   49)\nNode    0, rank   24, thread   0, (affinity =   65)\nNode    0, rank   26, thread   0, (affinity =   81)\nNode    0, rank   28, thread   0, (affinity =   97)\nNode    0, rank   30, thread   0, (affinity =  113)\n\n...output trimmed...\n

There are other modes available with the MPICH_RANK_REORDER_METHOD environment variable, including one which lets the user provide a file called MPICH_RANK_ORDER which contains a list of each task's placement on each node. These options are described in detail in the intro_mpi man page.

"},{"location":"user-guide/scheduler/#grid_order","title":"grid_order","text":"

For MPI applications which perform a large amount of nearest-neighbor communication (e.g. stencil-based applications on structured grids), HPE provide a tool called grid_order in the perftools-base module (loaded by default for all users). It can generate an MPICH_RANK_ORDER file automatically, taking as parameters the dimensions of the grid, the core count and so on. For example, to place 256 MPI processes in row-major order on a Cartesian grid of size $(8, 8, 4)$, using 128 cores per node:

grid_order -R -c 128 -g 8,8,4\n\n# grid_order -R -Z -c 128 -g 8,8,4\n# Region 3: 0,0,1 (0..255)\n0,1,2,3,32,33,34,35,64,65,66,67,96,97,98,99,128,129,130,131,160,161,162,163,192,193,194,195,224,225,226,227,4,5,6,7,36,37,38,39,68,69,70,71,100,101,102,103,132,133,134,135,164,165,166,167,196,197,198,199,228,229,230,231,8,9,10,11,40,41,42,43,72,73,74,75,104,105,106,107,136,137,138,139,168,169,170,171,200,201,202,203,232,233,234,235,12,13,14,15,44,45,46,47,76,77,78,79,108,109,110,111,140,141,142,143,172,173,174,175,204,205,206,207,236,237,238,239\n16,17,18,19,48,49,50,51,80,81,82,83,112,113,114,115,144,145,146,147,176,177,178,179,208,209,210,211,240,241,242,243,20,21,22,23,52,53,54,55,84,85,86,87,116,117,118,119,148,149,150,151,180,181,182,183,212,213,214,215,244,245,246,247,24,25,26,27,56,57,58,59,88,89,90,91,120,121,122,123,152,153,154,155,184,185,186,187,216,217,218,219,248,249,250,251,28,29,30,31,60,61,62,63,92,93,94,95,124,125,126,127,156,157,158,159,188,189,190,191,220,221,222,223,252,253,254,255\n

One can then save this output to a file called MPICH_RANK_ORDER and set MPICH_RANK_REORDER_METHOD=3 before running the job, which tells Cray MPI to read the MPICH_RANK_ORDER file to set the MPI task placement. For more information, please see the grid_order man page (man grid_order).
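You can also write an MPICH_RANK_ORDER file yourself. The sketch below assumes, as the grid_order output above suggests, that the file lists MPI ranks, comma-separated, in the order they are to be placed on cores, with one line per node. The hypothetical function round_robin_rank_order writes a file that deals ranks round robin across nodes. Please check the intro_mpi man page for the exact file syntax before relying on it.

```shell
#!/bin/bash
# Hypothetical helper: print a rank order that deals ranks round robin
# across nodes, one comma-separated line per node.
# Arguments: number of nodes, ranks per node.
round_robin_rank_order() {
    local nnodes=$1 ppn=$2 node slot line
    for (( node = 0; node < nnodes; node++ )); do
        line=""
        for (( slot = 0; slot < ppn; slot++ )); do
            # ${line:+,} adds a comma separator except before the first rank
            line+="${line:+,}$(( slot * nnodes + node ))"
        done
        echo "${line}"
    done
}

# Write the file for 2 nodes x 128 ranks, then ask Cray MPI to read it
round_robin_rank_order 2 128 > MPICH_RANK_ORDER
export MPICH_RANK_REORDER_METHOD=3
# srun --hint=nomultithread ./my_mpi_executable.x   (run inside a job)
```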

"},{"location":"user-guide/scheduler/#interactive-jobs","title":"Interactive Jobs","text":""},{"location":"user-guide/scheduler/#using-salloc-to-reserve-resources","title":"Using salloc to reserve resources","text":"

When you are developing or debugging code, you often want to run many short jobs with a small amount of code editing between runs. You can do this by running MPI on the login nodes, but you may want to test on the compute nodes (e.g. to test running on multiple nodes across the high performance interconnect). One of the best ways to achieve this on ARCHER2 is to use interactive jobs.

An interactive job allows you to issue srun commands directly from the command line without using a job submission script, and to see the output from your program directly in the terminal.

You use the salloc command to reserve compute nodes for interactive jobs.

To submit a request for an interactive job reserving 8 nodes (1024 physical cores) for 20 minutes on the short QoS you would issue the following command from the command line:

auser@ln01:> salloc --nodes=8 --ntasks-per-node=128 --cpus-per-task=1 \\\n                --time=00:20:00 --partition=standard --qos=short \\\n                --account=[budget code]\n

When you submit this job your terminal will display something like:

salloc: Granted job allocation 24236\nsalloc: Waiting for resource configuration\nsalloc: Nodes nid000002 are ready for job\nauser@ln01:>\n

It may take some time for your interactive job to start. Once it runs, you will enter a standard interactive terminal session (a new shell). Note that this shell is still on the front end (the prompt has not changed). While the interactive session lasts, you will be able to run parallel jobs on the compute nodes by issuing the srun --distribution=block:block --hint=nomultithread command directly at your command prompt, using the same syntax as you would inside a job script. The maximum number of nodes you can use is limited by the resources requested in the salloc command.

Important

If you wish the cpus-per-task option to salloc to propagate to srun commands in the allocation, you will need to use the command export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK before you issue any srun commands.

If you know you will be doing a lot of intensive debugging you may find it useful to request an interactive session lasting the expected length of your working session, say a full day.

Your session will end when you hit the requested walltime. If you wish to finish before this, use the exit command, which returns you to the shell from which you issued the salloc command.

"},{"location":"user-guide/scheduler/#using-srun-directly","title":"Using srun directly","text":"

A second way to run an interactive job is to use srun directly in the following way (here using the short QoS):

auser@ln01:/work/t01/t01/auser> srun --nodes=1 --exclusive --time=00:20:00 \\\n                --partition=standard --qos=short --account=[budget code] \\\n    --pty /bin/bash\nauser@nid001261:/work/t01/t01/auser> hostname\nnid001261\n

The --pty /bin/bash option will cause a new shell to be started on the first node of a new allocation. This is perhaps closer to what many people consider an 'interactive' job than the salloc method.

One can now issue shell commands in the usual way. A further invocation of srun is required to launch a parallel job in the allocation.

Note

When using srun within an interactive srun session, you will need to include both the --overlap and --oversubscribe flags, and specify the number of tasks (--ntasks) you want to use:

auser@nid001261:/work/t01/t01/auser> srun --overlap --oversubscribe --distribution=block:block \\\n                --hint=nomultithread --ntasks=128 ./my_mpi_executable.x\n

Without --overlap, the second srun will block until the first one has completed. Since your interactive session was itself launched with srun, the second srun will never actually start: instead you will get repeated warnings that \"Requested nodes are busy\".

When finished, type exit to relinquish the allocation and control will be returned to the front end.

By default, the interactive shell will retain the environment of the parent. If you want a clean shell, remember to specify --export=none.

"},{"location":"user-guide/scheduler/#heterogeneous-jobs","title":"Heterogeneous jobs","text":"

Most of the Slurm submissions discussed above involve running a single executable. However, there are situations where two or more distinct executables are coupled and need to be run at the same time, potentially using the same MPI communicator. This is most easily handled via the Slurm heterogeneous job mechanism.

Two common cases are discussed below: first, a client/server model in which client and server each have a different MPI_COMM_WORLD, and second, the case where two or more executables share MPI_COMM_WORLD.

"},{"location":"user-guide/scheduler/#heterogeneous-jobs-for-a-clientserver-model-distinct-mpi_comm_worlds","title":"Heterogeneous jobs for a client/server model: distinct MPI_COMM_WORLDs","text":"

The essential feature of a heterogeneous job here is to create a single batch submission which specifies the resource requirements for the individual components. Schematically, we would use

#!/bin/bash\n\n# Slurm specifications for the first component\n\n#SBATCH --partition=standard\n\n...\n\n#SBATCH hetjob\n\n# Slurm specifications for the second component\n\n#SBATCH --partition=standard\n\n...\n
where each new component beyond the first is introduced by the special token #SBATCH hetjob (note that this is not a normal option and is not --hetjob). Each component must specify a partition.

Such a job will appear in the scheduler as, e.g.,

           50098+0  standard qscript-    user  PD       0:00      1 (None)\n           50098+1  standard qscript-    user  PD       0:00      2 (None)\n
and counts as (in this case) two separate jobs from the point of view of QoS limits.

Consider a case where we have two executables which may both be parallel (in that they use MPI), which run at the same time and communicate with each other using MPI or by some other means. In the following example, we run two different executables, xthi-a and xthi-b, both of which must finish before the job completes.

#!/bin/bash\n\n#SBATCH --time=00:20:00\n#SBATCH --exclusive\n#SBATCH --export=none\n\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n#SBATCH --nodes=1\n#SBATCH --ntasks-per-node=8\n\n#SBATCH hetjob\n\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n#SBATCH --nodes=2\n#SBATCH --ntasks-per-node=4\n\n# Run two executables with separate MPI_COMM_WORLD\n\nsrun --distribution=block:block --hint=nomultithread --het-group=0 ./xthi-a &\nsrun --distribution=block:block --hint=nomultithread --het-group=1 ./xthi-b &\nwait\n
In this case, each executable is launched with a separate call to srun but specifies a different heterogeneous group via the --het-group option. The first group is --het-group=0. Both are run in the background with & and the wait is required to ensure both executables have completed before the job submission exits.

The above is a rather artificial example using two executables which are in fact just symbolic links in the job directory to xthi, used without loading the module. You can test this script yourself by creating symbolic links to the original executable before submitting the job:

auser@ln04:/work/t01/t01/auser/job-dir> module load xthi\nauser@ln04:/work/t01/t01/auser/job-dir> which xthi\n/work/y07/shared/utils/core/xthi/1.2/CRAYCLANG/11.0/bin/xthi\nauser@ln04:/work/t01/t01/auser/job-dir> ln -s /work/y07/shared/utils/core/xthi/1.2/CRAYCLANG/11.0/bin/xthi xthi-a\nauser@ln04:/work/t01/t01/auser/job-dir> ln -s /work/y07/shared/utils/core/xthi/1.2/CRAYCLANG/11.0/bin/xthi xthi-b\n

The example job will produce two reports showing the placement of the MPI tasks from the two instances of xthi running in each of the heterogeneous groups. For example, the output might be:

Node summary for    1 nodes:\nNode    0, hostname nid002400, mpi   8, omp   1, executable xthi-a\nMPI summary: 8 ranks\nNode    0, rank    0, thread   0, (affinity =    0)\nNode    0, rank    1, thread   0, (affinity =    1)\nNode    0, rank    2, thread   0, (affinity =    2)\nNode    0, rank    3, thread   0, (affinity =    3)\nNode    0, rank    4, thread   0, (affinity =    4)\nNode    0, rank    5, thread   0, (affinity =    5)\nNode    0, rank    6, thread   0, (affinity =    6)\nNode    0, rank    7, thread   0, (affinity =    7)\nNode summary for    2 nodes:\nNode    0, hostname nid002146, mpi   4, omp   1, executable xthi-b\nNode    1, hostname nid002149, mpi   4, omp   1, executable xthi-b\nMPI summary: 8 ranks\nNode    0, rank    0, thread   0, (affinity =    0)\nNode    0, rank    1, thread   0, (affinity =    1)\nNode    0, rank    2, thread   0, (affinity =    2)\nNode    0, rank    3, thread   0, (affinity =    3)\nNode    1, rank    4, thread   0, (affinity =    0)\nNode    1, rank    5, thread   0, (affinity =    1)\nNode    1, rank    6, thread   0, (affinity =    2)\nNode    1, rank    7, thread   0, (affinity =    3)\n
Here we have the first executable running on one node with a communicator size of 8 (ranks 0-7). The second executable runs on two nodes, also with a communicator size of 8 (ranks 0-7, 4 ranks per node). Further examples of placement for heterogeneous jobs are given below.

Finally, if your workflow requires the different heterogeneous jobs to communicate via MPI, but without sharing their MPI_COMM_WORLD, you will need to export two additional variables before your srun commands, as shown below:

export PMI_UNIVERSE_SIZE=3\nexport MPICH_SINGLE_HOST_ENABLED=0\n
"},{"location":"user-guide/scheduler/#heterogeneous-jobs-for-a-shared-mpi_com_world","title":"Heterogeneous jobs for a shared MPI_COMM_WORLD","text":"

Note

The directive #SBATCH hetjob can no longer be used for jobs requiring a shared MPI_COMM_WORLD.

Note

In this approach, each hetjob component must be on its own set of nodes. You cannot use this approach to place different hetjob components on the same node.

If two or more heterogeneous components need to share a single MPI_COMM_WORLD, use a single srun invocation with the different components separated by a colon (:). Arguments to the individual components of the srun command control the placement of the tasks and threads for each component. For example, to run the same xthi-a and xthi-b executables as above but now in a shared communicator, we might use:

#!/bin/bash\n\n#SBATCH --time=00:20:00\n#SBATCH --export=none\n#SBATCH --account=[...]\n\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# We must specify correctly the total number of nodes required.\n#SBATCH --nodes=3\n\nSHARED_ARGS=\"--distribution=block:block --hint=nomultithread\"\n\nsrun --het-group=0 --nodes=1 --ntasks-per-node=8 ${SHARED_ARGS} ./xthi-a : \\\n --het-group=1 --nodes=2 --ntasks-per-node=4 ${SHARED_ARGS} ./xthi-b\n

The output should confirm we have a single MPI_COMM_WORLD with a total of three nodes, xthi-a running on one and xthi-b on two, with ranks 0-15 extending across both executables.

Node summary for    3 nodes:\nNode    0, hostname nid002668, mpi   8, omp   1, executable xthi-a\nNode    1, hostname nid002669, mpi   4, omp   1, executable xthi-b\nNode    2, hostname nid002670, mpi   4, omp   1, executable xthi-b\nMPI summary: 16 ranks\nNode    0, rank    0, thread   0, (affinity =    0)\nNode    0, rank    1, thread   0, (affinity =    1)\nNode    0, rank    2, thread   0, (affinity =    2)\nNode    0, rank    3, thread   0, (affinity =    3)\nNode    0, rank    4, thread   0, (affinity =    4)\nNode    0, rank    5, thread   0, (affinity =    5)\nNode    0, rank    6, thread   0, (affinity =    6)\nNode    0, rank    7, thread   0, (affinity =    7)\nNode    1, rank    8, thread   0, (affinity =    0)\nNode    1, rank    9, thread   0, (affinity =    1)\nNode    1, rank   10, thread   0, (affinity =    2)\nNode    1, rank   11, thread   0, (affinity =    3)\nNode    2, rank   12, thread   0, (affinity =    0)\nNode    2, rank   13, thread   0, (affinity =    1)\nNode    2, rank   14, thread   0, (affinity =    2)\nNode    2, rank   15, thread   0, (affinity =    3)\n
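The rank numbering in this output follows a simple rule: each component of the colon-separated srun line takes the next block of consecutive ranks, in order. The bash sketch below reproduces that arithmetic for a quick sanity check before submitting. It assumes ranks are assigned strictly in component order (as the output above shows); the rank_ranges function and its name:nodes:tasks-per-node argument format are hypothetical helpers, not part of Slurm.

```shell
#!/bin/bash
# Sketch: compute the MPI_COMM_WORLD rank range of each component of a
# colon-separated srun line, assuming ranks are numbered consecutively
# in component order.
# Each argument is a hypothetical 'name:nodes:ntasks_per_node' triple.
rank_ranges() {
  local first=0 spec name nodes tpn ntasks
  for spec in "$@"; do
    IFS=: read -r name nodes tpn <<< "${spec}"
    ntasks=$(( nodes * tpn ))
    echo "${name}: ranks ${first}-$(( first + ntasks - 1 ))"
    first=$(( first + ntasks ))
  done
  echo "total: ${first} ranks"
}

# The shared-communicator example above: 1 node x 8 tasks, then 2 nodes x 4 tasks
rank_ranges xthi-a:1:8 xthi-b:2:4
```

For the example above this prints ranks 0-7 for xthi-a, ranks 8-15 for xthi-b and 16 ranks in total, matching the xthi output.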
"},{"location":"user-guide/scheduler/#heterogeneous-placement-for-mixed-mpiopenmp-work","title":"Heterogeneous placement for mixed MPI/OpenMP work","text":"

Some care may be required for placement of tasks/threads in heterogeneous jobs in which the number of threads needs to be specified differently for different components.

In the following we have two components, again using xthi-a and xthi-b as our two separate executables. The first component runs 8 MPI tasks each with 16 OpenMP threads on one node. The second component runs 8 MPI tasks with one task per NUMA region on a second node; each task has one thread. An appropriate Slurm submission might be:

#!/bin/bash\n\n#SBATCH --time=00:20:00\n#SBATCH --export=none\n#SBATCH --account=[...]\n\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n#SBATCH --nodes=2\n\nSHARED_ARGS=\"--distribution=block:block --hint=nomultithread \\\n              --nodes=1 --ntasks-per-node=8 --cpus-per-task=16\"\n\n# Do not set OMP_NUM_THREADS in the calling environment\n\nunset OMP_NUM_THREADS\nexport OMP_PROC_BIND=spread\n\nsrun --het-group=0 ${SHARED_ARGS} --export=all,OMP_NUM_THREADS=16 ./xthi-a : \\\n      --het-group=1 ${SHARED_ARGS} --export=all,OMP_NUM_THREADS=1  ./xthi-b\n

The important point here is that OMP_NUM_THREADS must not be set in the environment that calls srun, so that the different values specified for the separate groups via --export on the srun command line take effect. If OMP_NUM_THREADS is set in the calling environment, that value takes precedence and every component will see the same value of OMP_NUM_THREADS.

The output might then be:

Node    0, hostname nid001111, mpi   8, omp  16, executable xthi-a\nNode    1, hostname nid001126, mpi   8, omp   1, executable xthi-b\nNode    0, rank    0, thread   0, (affinity =    0)\nNode    0, rank    0, thread   1, (affinity =    1)\nNode    0, rank    0, thread   2, (affinity =    2)\nNode    0, rank    0, thread   3, (affinity =    3)\nNode    0, rank    0, thread   4, (affinity =    4)\nNode    0, rank    0, thread   5, (affinity =    5)\nNode    0, rank    0, thread   6, (affinity =    6)\nNode    0, rank    0, thread   7, (affinity =    7)\nNode    0, rank    0, thread   8, (affinity =    8)\nNode    0, rank    0, thread   9, (affinity =    9)\nNode    0, rank    0, thread  10, (affinity =   10)\nNode    0, rank    0, thread  11, (affinity =   11)\nNode    0, rank    0, thread  12, (affinity =   12)\nNode    0, rank    0, thread  13, (affinity =   13)\nNode    0, rank    0, thread  14, (affinity =   14)\nNode    0, rank    0, thread  15, (affinity =   15)\nNode    0, rank    1, thread   0, (affinity =   16)\nNode    0, rank    1, thread   1, (affinity =   17)\n...\nNode    0, rank    7, thread  14, (affinity =  126)\nNode    0, rank    7, thread  15, (affinity =  127)\nNode    1, rank    8, thread   0, (affinity =    0)\nNode    1, rank    9, thread   0, (affinity =   16)\nNode    1, rank   10, thread   0, (affinity =   32)\nNode    1, rank   11, thread   0, (affinity =   48)\nNode    1, rank   12, thread   0, (affinity =   64)\nNode    1, rank   13, thread   0, (affinity =   80)\nNode    1, rank   14, thread   0, (affinity =   96)\nNode    1, rank   15, thread   0, (affinity =  112)\n

Here we can see the eight MPI tasks from xthi-a each running with sixteen OpenMP threads. Then the 8 MPI tasks with no threading from xthi-b are spaced across the cores on the second node, one per NUMA region.
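The affinity values for xthi-b follow directly from --cpus-per-task=16 and block placement: rank i on a node starts i x 16 cores into the node. The bash sketch below reproduces those first-core numbers. It assumes cores are numbered consecutively from 0 and, as the output suggests, that each 16-core block corresponds to one NUMA region; the first_cores function is a hypothetical helper.

```shell
#!/bin/bash
# Sketch: reproduce the first-core affinity reported for each rank when
# ranks are packed in order with --distribution=block:block and each rank
# reserves cpus_per_task consecutive cores (assumed numbering from 0).
first_cores() {
  local first_rank=$1 ntasks=$2 cpus_per_task=$3 i
  for (( i = 0; i < ntasks; i++ )); do
    echo "rank $(( first_rank + i )): first core $(( i * cpus_per_task ))"
  done
}

# xthi-b on the second node: global ranks 8-15, 16 cores reserved per rank
first_cores 8 8 16
```

This prints first cores 0, 16, 32, ..., 112 for ranks 8 to 15, which is the one-task-per-NUMA-region spacing seen in the output above.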

"},{"location":"user-guide/scheduler/#low-priority-access","title":"Low priority access","text":"

Low priority jobs are not charged against your allocation but will only run when other, higher-priority, jobs cannot be run. Although low priority jobs are not charged, you do need a valid, positive budget to be able to submit and run low priority jobs, i.e. you need at least 1 CU in your budget.

Low priority access is always available and has the following limits:

You submit a low priority job on ARCHER2 by using the lowpriority QoS. For example, you would usually have the following line among the sbatch options in your job submission script:

#SBATCH --qos=lowpriority\n
"},{"location":"user-guide/scheduler/#reservations","title":"Reservations","text":"

Reservations are available on ARCHER2. These allow users to reserve a number of nodes for a specified length of time starting at a particular time on the system.

Reservations require justification. They will only be approved if the request could not be fulfilled with the normal QoSs, for instance, if you require one or more jobs to run at a particular time, e.g. for a demonstration or course.

Note

Reservation requests must be submitted at least 60 hours in advance of the reservation start time. If you are requesting a reservation for a Monday at 18:00, please ensure the request is received by 12:00 on the preceding Friday at the latest. The same applies over Service Holidays.

Note

Reservations are only valid for standard compute nodes; high memory compute nodes and PP nodes cannot be included in reservations.

Reservations are charged at 1.5 times the usual CU rate, and our policy is to charge the full rate for the entire reservation at the time of booking, whether or not you use the nodes for the full time. In addition, CUs will not be refunded if you fail to use them because of a job issue, unless the issue is due to a system failure.
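Because the whole reservation is charged up front at 1.5 times the usual rate, it can help to estimate the cost before filling in the request. The bash sketch below does that arithmetic. It assumes that 1 CU corresponds to 1 node-hour on a standard compute node; check the current ARCHER2 charging policy before relying on it. The reservation_cus function is a hypothetical helper.

```shell
#!/bin/bash
# Sketch: estimate the CUs charged at booking time for a reservation.
# Assumption: 1 CU = 1 node-hour on a standard node; 1.5 is the
# reservation multiplier stated in the text. The full reservation is
# charged whether or not the nodes are used.
reservation_cus() {
  local nodes=$1 hours=$2
  # awk handles fractional hours (e.g. 2.5) that bash arithmetic cannot
  awk -v n="${nodes}" -v h="${hours}" 'BEGIN { printf "%.1f\n", n * h * 1.5 }'
}

reservation_cus 64 3   # 64 nodes for 3 hours
```

Under that assumption, 64 nodes for 3 hours would cost 288.0 CUs.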

To request a reservation you complete a form on SAFE:

  1. Log into SAFE
  2. Under the \"Login accounts\" menu, choose the \"Request reservation\" option

On the first page, you need to provide the following:

On the second page, you will need to specify which username you wish the reservation to be charged against and, once the username has been selected, the budget you want to charge the reservation to. (The selected username will be charged for the reservation but the reservation can be used by all members of the selected budget.)

Your request will be checked by the ARCHER2 User Administration team and, if approved, you will be provided a reservation ID which can be used on the system. To submit jobs to a reservation, you need to add --reservation=<reservation ID> and --qos=reservation options to your job submission script or command.

Important

You must have at least 1 CU in the budget to submit a job on ARCHER2, even to a pre-paid reservation.

Tip

You can submit jobs to a reservation as soon as the reservation has been set up; jobs will remain queued until the reservation starts.

"},{"location":"user-guide/scheduler/#capability-days","title":"Capability Days","text":"

Important

The next Capability Days session will be from Tue 24 Sep 2024 to Thu 26 Sep 2024.

ARCHER2 Capability Days are a mechanism to allow users to run large-scale (512 nodes or more) tests on the system free of charge. The motivations behind Capability Days are:

To enable this, a period will be made available regularly where users can run jobs at large scale free of charge.

Capability Days are made up of different parts:

Tip

Any jobs left in the queues when Capability Days finish will be deleted.

"},{"location":"user-guide/scheduler/#pre-capability-day-session","title":"pre-Capability Day session","text":"

The pre-Capability Day session is typically available directly before the full Capability Day session and allows short test jobs to prepare for Capability Day.

Submit to the pre-capabilityday QoS. Jobs can be submitted ahead of time and will start when the pre-Capability Day session starts.

pre-capabilityday QoS limits:

"},{"location":"user-guide/scheduler/#example-pre-capability-day-session-job-submission-script","title":"Example pre-Capability Day session job submission script","text":"
#!/bin/bash\n#SBATCH --job-name=test_capability_job\n#SBATCH --nodes=256\n#SBATCH --ntasks-per-node=8\n#SBATCH --cpus-per-task=16\n#SBATCH --time=1:0:0\n#SBATCH --partition=standard\n#SBATCH --qos=pre-capabilityday\n#SBATCH --account=t01\n\nexport OMP_NUM_THREADS=16\nexport OMP_PLACES=cores\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\n# Check process/thread placement\nmodule load xthi\nsrun --hint=multithread --distribution=block:block xthi > placement-${SLURM_JOBID}.out\n\nsrun --hint=multithread --distribution=block:block my_app.x\n
"},{"location":"user-guide/scheduler/#nerc-capability-reservation","title":"NERC Capability reservation","text":"

The NERC Capability reservation is typically available directly before the full Capability Day session and allows short test jobs to prepare for Capability Day.

Submit to the NERCcapability reservation. Jobs can be submitted ahead of time and will start when the NERC Capability reservation starts.

NERCcapability reservation limits:

"},{"location":"user-guide/scheduler/#example-nerc-capability-reservation-job-submission-script","title":"Example NERC Capability reservation job submission script","text":"
#!/bin/bash\n#SBATCH --job-name=NERC_capability_job\n#SBATCH --nodes=256\n#SBATCH --ntasks-per-node=8\n#SBATCH --cpus-per-task=16\n#SBATCH --time=1:0:0\n#SBATCH --partition=standard\n#SBATCH --reservation=NERCcapability\n#SBATCH --qos=reservation\n#SBATCH --account=t01\n\nexport OMP_NUM_THREADS=16\nexport OMP_PLACES=cores\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\n# Check process/thread placement\nmodule load xthi\nsrun --hint=multithread --distribution=block:block xthi > placement-${SLURM_JOBID}.out\n\nsrun --hint=multithread --distribution=block:block my_app.x\n
"},{"location":"user-guide/scheduler/#capability-day-session","title":"Capability Day session","text":"

The Capability Day session is typically available directly after the pre-Capability Day session.

Submit to the capabilityday QoS. Jobs can be submitted ahead of time and will start when the Capability Day session starts.

capabilityday QoS limits:

"},{"location":"user-guide/scheduler/#example-capability-day-job-submission-script","title":"Example Capability Day job submission script","text":"
#!/bin/bash\n#SBATCH --job-name=capability_job\n#SBATCH --nodes=1024\n#SBATCH --ntasks-per-node=8\n#SBATCH --cpus-per-task=16\n#SBATCH --time=1:0:0\n#SBATCH --partition=standard\n#SBATCH --qos=capabilityday\n#SBATCH --account=t01\n\nexport OMP_NUM_THREADS=16\nexport OMP_PLACES=cores\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\n\n# Check process/thread placement\nmodule load xthi\nsrun --hint=multithread --distribution=block:block xthi > placement-${SLURM_JOBID}.out\n\nsrun --hint=multithread --distribution=block:block my_app.x\n
"},{"location":"user-guide/scheduler/#capability-day-tips","title":"Capability Day tips","text":""},{"location":"user-guide/scheduler/#serial-jobs","title":"Serial jobs","text":"

You can run serial jobs on the shared data analysis nodes. More information on using the data analysis nodes (including example job submission scripts) can be found in the Data Analysis section of the User and Best Practice Guide.

"},{"location":"user-guide/scheduler/#gpu-jobs","title":"GPU jobs","text":"

You can run jobs on the ARCHER2 GPU nodes; full guidance can be found on the GPU development platform page.

"},{"location":"user-guide/scheduler/#best-practices-for-job-submission","title":"Best practices for job submission","text":"

This guidance is adapted from the advice provided by NERSC.

"},{"location":"user-guide/scheduler/#time-limits","title":"Time Limits","text":"

Due to backfill scheduling, short and variable-length jobs generally start quickly, resulting in much better job throughput. You can specify a minimum time for your job with the --time-min option to sbatch:

#SBATCH --time-min=<lower_bound>\n#SBATCH --time=<upper_bound>\n

Within your job script, you can get the time remaining in the job with squeue -h -j ${SLURM_JOBID} -o %L, which lets you deal with potentially varying runtimes when using this option.
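The %L field is printed as a string such as 45:08, 2:00:09 or 1-02:03:04, which is awkward to compare in a script. The bash sketch below converts it to seconds. It assumes the MM:SS, H:MM:SS and D-HH:MM:SS forms and rejects anything else (for example UNLIMITED); the time_left_to_seconds function is a hypothetical helper.

```shell
#!/bin/bash
# Sketch: convert a time-left string from 'squeue -h -j ${SLURM_JOBID} -o %L'
# into seconds. Handles MM:SS, H:MM:SS and D-HH:MM:SS; other values fail.
time_left_to_seconds() {
  local t=$1 days=0 h=0 m=0 s=0
  local -a parts
  if [[ ${t} == *-* ]]; then
    days=${t%%-*}    # part before the dash
    t=${t#*-}        # remaining HH:MM:SS
  fi
  IFS=: read -r -a parts <<< "${t}"
  case ${#parts[@]} in
    2) m=${parts[0]}; s=${parts[1]} ;;
    3) h=${parts[0]}; m=${parts[1]}; s=${parts[2]} ;;
    *) echo "unrecognised time: $1" >&2; return 1 ;;
  esac
  # 10# forces base 10 so fields such as 08 are not read as octal
  echo $(( 10#${days} * 86400 + 10#${h} * 3600 + 10#${m} * 60 + 10#${s} ))
}

# Inside a job, something like:
# left=$(time_left_to_seconds "$(squeue -h -j "${SLURM_JOBID}" -o %L)")
time_left_to_seconds 1-02:03:04
```

A job script could then, for example, skip starting another simulation step when the returned value is below the expected step duration.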

"},{"location":"user-guide/scheduler/#long-running-jobs","title":"Long Running Jobs","text":"

Simulations which must run for a long period of time achieve the best throughput when composed of many small jobs that use a checkpoint and restart method and are chained together (see above for how to chain jobs together). However, this method does incur a startup and shutdown overhead for each job as the state is saved and loaded, so you should experiment to find the best balance between runtime (long runtimes minimise the checkpoint/restart overheads) and throughput (short runtimes maximise throughput).
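One common way to build such a chain is to submit each job with an afterok dependency on the previous one, so each segment only starts once the last has finished successfully. The bash sketch below does this with the standard sbatch --parsable and --dependency options. The submit_chain function and the restart_job.slurm script name are hypothetical; the script is assumed to restart from the latest checkpoint every time it runs.

```shell
#!/bin/bash
# Sketch: submit njobs copies of a checkpoint/restart script, each
# depending on successful completion (afterok) of the previous one.
submit_chain() {
  local njobs=$1 script=$2 jobid i
  # --parsable prints 'jobid' or 'jobid;cluster'; keep only the job ID
  jobid=$(sbatch --parsable "${script}") || return 1
  jobid=${jobid%%;*}
  echo "${jobid}"
  for (( i = 2; i <= njobs; i++ )); do
    jobid=$(sbatch --parsable --dependency=afterok:"${jobid}" "${script}") || return 1
    jobid=${jobid%%;*}
    echo "${jobid}"
  done
}

# Usage on a login node (hypothetical script name):
# submit_chain 5 restart_job.slurm
```

Using afterok rather than afterany means the chain stops if a segment fails, instead of repeatedly restarting from a bad checkpoint.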

"},{"location":"user-guide/scheduler/#interconnect-locality","title":"Interconnect locality","text":"

For jobs which are sensitive to interconnect (MPI) performance and use 128 nodes or fewer, it is possible to request that all nodes are in a single Slingshot dragonfly group. The maximum number of nodes in a group on ARCHER2 is 128.

Slurm has a concept of \"switches\", which on ARCHER2 are configured to map to Slingshot electrical groups, within which all compute nodes have all-to-all electrical connections that minimise latency. Since this places an additional constraint on the scheduler, you can specify a maximum time to wait for the requested topology; after this time, the job will be placed without the constraint.

For example, to specify that all requested nodes should come from one electrical group and to wait for up to 6 hours (360 minutes) for that placement, you would use the following option in your job:

#SBATCH --switches=1@360\n

If you are using more nodes than there are in a single group, you can request multiple groups with this option to maximise the number of nodes in the job that share electrical connections. For example, to request 4 groups (maximum of 512 nodes) and have this as an absolute constraint with no timeout, you would use:

#SBATCH --switches=4\n

Danger

When specifying the number of groups take care to request enough groups to satisfy the requested number of nodes. If the number is too low then an unnecessary delay will be added due to the unsatisfiable request.

A useful heuristic to ensure this is the case is to ensure that the total nodes requested is less than or equal to the number of groups multiplied by 128.
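The heuristic amounts to rounding the node count up to a whole number of 128-node groups. The bash sketch below computes that minimum and prints the corresponding directive. Because 128 is the maximum group size, the result is a lower bound: if groups have fewer nodes available, more may be needed. The switches_option function is a hypothetical helper.

```shell
#!/bin/bash
# Sketch: smallest group count that can hold the job, ceil(nodes / 128),
# using the 128-node maximum group size quoted for ARCHER2.
# The second argument (wait time in minutes) is optional.
switches_option() {
  local nodes=$1 wait_min=${2:-}
  local groups=$(( (nodes + 127) / 128 ))
  if [ -n "${wait_min}" ]; then
    echo "#SBATCH --switches=${groups}@${wait_min}"
  else
    echo "#SBATCH --switches=${groups}"
  fi
}

switches_option 100 360   # one group, wait up to 6 hours
switches_option 512       # four groups, no timeout
```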

"},{"location":"user-guide/scheduler/#large-jobs","title":"Large Jobs","text":"

Large jobs may take longer to start up. The sbcast command is recommended for large jobs requesting over 1500 MPI tasks. By default, Slurm reads the executable on the allocated compute nodes from the location where it is installed; this may take a long time when the file system where the executable resides is slow or busy. With the sbcast command, the executable can be copied to the /tmp directory on each of the compute nodes. Since /tmp is part of the memory on the compute nodes, this can speed up job startup.

sbcast --compress=none /path/to/exe /tmp/exe\nsrun /tmp/exe\n
"},{"location":"user-guide/scheduler/#huge-pages","title":"Huge pages","text":"

Huge pages are virtual memory pages which are bigger than the default page size of 4 KB. Huge pages can improve memory performance for common access patterns on large data sets, since they reduce the number of virtual-to-physical address translations compared with the default 4 KB pages.
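To see why fewer translations are needed, compare how many pages are required to cover the same data. The bash sketch below counts pages for a 1 GiB array with the default 4 KiB pages and with 2 MiB huge pages. The page_counts function is a hypothetical helper, and it only does the arithmetic; real translation costs also depend on the TLB size and access pattern.

```shell
#!/bin/bash
# Sketch: number of pages needed to cover a given number of bytes with
# 4 KiB default pages versus 2 MiB (2048 KiB) huge pages.
page_counts() {
  local bytes=$1 page_kib
  for page_kib in 4 2048; do
    echo "page size ${page_kib} KiB: $(( bytes / (page_kib * 1024) )) pages"
  done
}

page_counts $(( 1024 * 1024 * 1024 ))   # 1 GiB
```

For 1 GiB this gives 262144 default pages against 512 huge pages, i.e. 512 times fewer pages to translate.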

To use huge pages for an application (with the 2 MB huge pages as an example):

module load craype-hugepages2M\ncc -o mycode.exe mycode.c\n

You should also load the same huge pages module at runtime.

Warning

Due to huge page memory fragmentation, applications may get Cannot allocate memory warnings or errors when there are not enough huge pages available on the compute node, for example:

libhugetlbfs [nid0000xx:xxxxx]: WARNING: New heap segment map at 0x10000000 failed: Cannot allocate memory

By default, the verbosity level of libhugetlbfs, HUGETLB_VERBOSE, is set to 0 on ARCHER2 to suppress debugging messages. You can increase this value to obtain more information on huge page use.

"},{"location":"user-guide/scheduler/#when-to-use-huge-pages","title":"When to Use Huge Pages","text":""},{"location":"user-guide/scheduler/#when-to-avoid-huge-pages","title":"When to Avoid Huge Pages","text":""},{"location":"user-guide/sw-environment-4cab/","title":"Software environment: 4-cabinet system","text":"

Important

This section covers the software environment on the initial, 4-cabinet ARCHER2 system. For documentation on the software environment on the full ARCHER2 system, please see Software environment: full system.

The software environment on ARCHER2 is primarily controlled through the module command. By loading and switching software modules you control which software and versions are available to you.

Information

A module is a self-contained description of a software package -- it contains the settings required to run a software package and, usually, encodes required dependencies on other software packages.

By default, all users on ARCHER2 start with the default software environment loaded.

Software modules on ARCHER2 are provided by both HPE Cray (usually known as the Cray Development Environment, CDE) and by EPCC, who provide the Service Provision, and Computational Science and Engineering services.

In this section, we provide:

"},{"location":"user-guide/sw-environment-4cab/#using-the-module-command","title":"Using the module command","text":"

We only cover basic usage of the module command here. For full documentation, please see the Linux manual page on modules.

The module command takes a subcommand to indicate what operation you wish to perform. Common subcommands are:

These are described in more detail below.

"},{"location":"user-guide/sw-environment-4cab/#information-on-the-available-modules","title":"Information on the available modules","text":"

The module list command will give the names of the modules and their versions you have presently loaded in your environment:

auser@uan01:~> module list\nCurrently Loaded Modulefiles:\n1) cpe-aocc                          7) cray-dsmml/0.1.2(default)\n2) aocc/2.1.0.3(default)             8) perftools-base/20.09.0(default)\n3) craype/2.7.0(default)             9) xpmem/2.2.35-7.0.1.0_1.3__gd50fabf.shasta(default)\n4) craype-x86-rome                  10) cray-mpich/8.0.15(default)\n5) libfabric/1.11.0.0.233(default)  11) cray-libsci/20.08.1.2(default)\n6) craype-network-ofi\n

Finding out which software modules are available on the system is performed using the module avail command. To list all software modules available, use:

auser@uan01:~> module avail\n------------------------------- /opt/cray/pe/perftools/20.09.0/modulefiles --------------------------------\nperftools       perftools-lite-events  perftools-lite-hbm    perftools-nwpc     \nperftools-lite  perftools-lite-gpu     perftools-lite-loops  perftools-preload  \n\n---------------------------------- /opt/cray/pe/craype/2.7.0/modulefiles ----------------------------------\ncraype-hugepages1G  craype-hugepages8M   craype-hugepages128M  craype-network-ofi          \ncraype-hugepages2G  craype-hugepages16M  craype-hugepages256M  craype-network-slingshot10  \ncraype-hugepages2M  craype-hugepages32M  craype-hugepages512M  craype-x86-rome             \ncraype-hugepages4M  craype-hugepages64M  craype-network-none   \n\n------------------------------------- /usr/local/Modules/modulefiles --------------------------------------\ndot  module-git  module-info  modules  null  use.own  \n\n-------------------------------------- /opt/cray/pe/cpe-prgenv/7.0.0 --------------------------------------\ncpe-aocc  cpe-cray  cpe-gnu  \n\n-------------------------------------------- /opt/modulefiles ---------------------------------------------\naocc/2.1.0.3(default)  cray-R/4.0.2.0(default)  gcc/8.1.0  gcc/9.3.0  gcc/10.1.0(default)  \n\n\n---------------------------------------- /opt/cray/pe/modulefiles -----------------------------------------\natp/3.7.4(default)              cray-mpich-abi/8.0.15             craype-dl-plugin-py3/20.06.1(default)  \ncce/10.0.3(default)             cray-mpich-ucx/8.0.15             craype/2.7.0(default)                  \ncray-ccdb/4.7.1(default)        cray-mpich/8.0.15(default)        craypkg-gen/1.3.10(default)            \ncray-cti/2.7.3(default)         cray-netcdf-hdf5parallel/4.7.4.0  gdb4hpc/4.7.3(default)                 \ncray-dsmml/0.1.2(default)       cray-netcdf/4.7.4.0               iobuf/2.0.10(default)                  \ncray-fftw/3.3.8.7(default)      cray-openshmemx/11.1.1(default)   papi/6.0.0.2(default)                  \ncray-ga/5.7.0.3                 cray-parallel-netcdf/1.12.1.0     perftools-base/20.09.0(default)        \ncray-hdf5-parallel/1.12.0.0     cray-pmi-lib/6.0.6(default)       valgrind4hpc/2.7.2(default)            \ncray-hdf5/1.12.0.0              cray-pmi/6.0.6(default)           \ncray-libsci/20.08.1.2(default)  cray-python/3.8.5.0(default)    \n

This will list all the names and versions of the modules available on the service. Not all of them may work in your account though due to, for example, licencing restrictions. You will notice that for many modules we have more than one version, each of which is identified by a version number. One of these versions is the default. As the service develops the default version will change and old versions of software may be deleted.

You can list all the modules of a particular type by providing an argument to the module avail command. For example, to list all available versions of the HPE Cray FFTW library, use:

auser@uan01:~> module avail cray-fftw\n\n---------------------------------------- /opt/cray/pe/modulefiles -----------------------------------------\ncray-fftw/3.3.8.7(default) \n

If you want more info on any of the modules, you can use the module help command:

auser@uan01:~> module help cray-fftw\n\n-------------------------------------------------------------------\nModule Specific Help for /opt/cray/pe/modulefiles/cray-fftw/3.3.8.7:\n\n\n===================================================================\nFFTW 3.3.8.7\n============\n  Release Date:\n  -------------\n    June 2020\n\n\n  Purpose:\n  --------\n    This Cray FFTW 3.3.8.7 release is supported on Cray Shasta Systems. \n    FFTW is supported on the host CPU but not on the accelerator of Cray systems.\n\n    The Cray FFTW 3.3.8.7 release provides the following:\n      - Optimizations for AMD Rome CPUs.\n    See the Product and OS Dependencies section for details\n\n[...]\n

The module show command reveals what operations the module actually performs to change your environment when it is loaded. A brief overview of the significance of these different settings is given below. For example, for the default FFTW module:

auser@uan01:~> module show cray-fftw\n-------------------------------------------------------------------\n/opt/cray/pe/modulefiles/cray-fftw/3.3.8.7:\n\nconflict        cray-fftw\nconflict        fftw\nsetenv          FFTW_VERSION 3.3.8.7\nsetenv          CRAY_FFTW_VERSION 3.3.8.7\nsetenv          CRAY_FFTW_PREFIX /opt/cray/pe/fftw/3.3.8.7/x86_rome\nsetenv          FFTW_ROOT /opt/cray/pe/fftw/3.3.8.7/x86_rome\nsetenv          FFTW_DIR /opt/cray/pe/fftw/3.3.8.7/x86_rome/lib\nsetenv          FFTW_INC /opt/cray/pe/fftw/3.3.8.7/x86_rome/include\nprepend-path    PATH /opt/cray/pe/fftw/3.3.8.7/x86_rome/bin\nprepend-path    MANPATH /opt/cray/pe/fftw/3.3.8.7/share/man\nprepend-path    CRAY_LD_LIBRARY_PATH /opt/cray/pe/fftw/3.3.8.7/x86_rome/lib\nprepend-path    PE_PKGCONFIG_PRODUCTS PE_FFTW\nsetenv          PE_FFTW_TARGET_x86_skylake x86_skylake\nsetenv          PE_FFTW_TARGET_x86_rome x86_rome\nsetenv          PE_FFTW_TARGET_x86_cascadelake x86_cascadelake\nsetenv          PE_FFTW_TARGET_x86_64 x86_64\nsetenv          PE_FFTW_TARGET_share share\nsetenv          PE_FFTW_TARGET_sandybridge sandybridge\nsetenv          PE_FFTW_TARGET_mic_knl mic_knl\nsetenv          PE_FFTW_TARGET_ivybridge ivybridge\nsetenv          PE_FFTW_TARGET_haswell haswell\nsetenv          PE_FFTW_TARGET_broadwell broadwell\nsetenv          PE_FFTW_VOLATILE_PKGCONFIG_PATH /opt/cray/pe/fftw/3.3.8.7/@PE_FFTW_TARGET@/lib/pkgconfig\nsetenv          PE_FFTW_PKGCONFIG_VARIABLES PE_FFTW_OMP_REQUIRES_@openmp@\nsetenv          PE_FFTW_OMP_REQUIRES { }\nsetenv          PE_FFTW_OMP_REQUIRES_openmp _mp\nsetenv          PE_FFTW_PKGCONFIG_LIBS fftw3_mpi:libfftw3_threads:fftw3:fftw3f_mpi:libfftw3f_threads:fftw3f\nmodule-whatis   {FFTW 3.3.8.7 - Fastest Fourier Transform in the West}\n  [...]\n
"},{"location":"user-guide/sw-environment-4cab/#loading-removing-and-swapping-modules","title":"Loading, removing and swapping modules","text":"

To load a module, use the module load command. For example, to load the default version of HPE Cray FFTW into your environment, use:

auser@uan01:~> module load cray-fftw\n

Once you have done this, your environment will be set up to use the HPE Cray FFTW library. The above command loads the default version of HPE Cray FFTW. If you need a specific version of the software, you can add more information:

auser@uan01:~> module load cray-fftw/3.3.8.7\n

will load HPE Cray FFTW version 3.3.8.7 into your environment, regardless of the default.

If you want to remove software from your environment, module remove will remove a loaded module:

auser@uan01:~> module remove cray-fftw\n

will unload whatever version of cray-fftw you have loaded (even if it is not the default).

There are many situations in which you might want to change the presently loaded version to a different one, such as trying the latest version which is not yet the default or using a legacy version to keep compatibility with old data. This can be achieved most easily by using module swap oldmodule newmodule.

Suppose you have loaded version 3.3.8.7 of cray-fftw; the following command will change to version 3.3.8.5:

auser@uan01:~> module swap cray-fftw cray-fftw/3.3.8.5\n

You did not need to specify the version of the loaded module in your current environment, as it can be inferred: it is the only version you have loaded.

"},{"location":"user-guide/sw-environment-4cab/#changing-programming-environment","title":"Changing Programming Environment","text":"

The three programming environments PrgEnv-aocc, PrgEnv-cray, PrgEnv-gnu are implemented as module collections. The correct way to change programming environment, that is, change the collection of modules, is therefore via module restore. For example:

auser@uan01:~> module restore PrgEnv-gnu\n

Note

There is only one argument, which is the collection to be restored.

The command module restore will output a list of modules in the outgoing collection as they are unloaded, and the modules in the incoming collection as they are loaded. If you prefer not to see these messages, use:

auser@uan1:~> module -s restore PrgEnv-gnu\n

will suppress the messages. An attempt to restore a collection which is already loaded will result in no operation.

Module collections are stored in a user's home directory ${HOME}/.module. However, as the home directory is not available to the back end, module restore may fail for batch jobs. In this case, it is possible to restore one of the three standard programming environments via, e.g.,

module restore /etc/cray-pe.d/PrgEnv-gnu\n
"},{"location":"user-guide/sw-environment-4cab/#capturing-your-environment-for-reuse","title":"Capturing your environment for reuse","text":"

Sometimes it is useful to save the module environment that you are using to compile a piece of code or execute a piece of software. This is saved as a module collection. You can save a collection from your current environment by executing:

auser@uan01:~> module save [collection_name]\n

Note

If you do not specify the environment name, it is called default.

You can find the list of saved module environments by executing:

auser@uan01:~> module savelist\nNamed collection list:\n 1) default   2) PrgEnv-aocc   3) PrgEnv-cray   4) PrgEnv-gnu \n

To list the modules in a collection, you can execute, e.g.:

auser@uan01:~> module saveshow PrgEnv-gnu\n-------------------------------------------------------------------\n/home/t01/t01/auser/.module/default:\nmodule use --append /opt/cray/pe/perftools/20.09.0/modulefiles\nmodule use --append /opt/cray/pe/craype/2.7.0/modulefiles\nmodule use --append /usr/local/Modules/modulefiles\nmodule use --append /opt/cray/pe/cpe-prgenv/7.0.0\nmodule use --append /opt/modulefiles\nmodule use --append /opt/cray/modulefiles\nmodule use --append /opt/cray/pe/modulefiles\nmodule use --append /opt/cray/pe/craype-targets/default/modulefiles\nmodule load cpe-gnu\nmodule load gcc\nmodule load craype\nmodule load craype-x86-rome\nmodule load --notuasked libfabric\nmodule load craype-network-ofi\nmodule load cray-dsmml\nmodule load perftools-base\nmodule load xpmem\nmodule load cray-mpich\nmodule load cray-libsci\nmodule load /work/y07/shared/archer2-modules/modulefiles-cse/epcc-setup-env\n

Note again that the details of the collection have been saved to the home directory (the first line of output above). It is possible to save a module collection with a fully qualified path, e.g.,

auser@uan1:~> module save /work/t01/z01/auser/.module/PrgEnv-gnu\n

which would make it available from the batch system.

To delete a module environment, you can execute:

auser@uan01:~> module saverm <environment_name>\n
"},{"location":"user-guide/sw-environment-4cab/#shell-environment-overview","title":"Shell environment overview","text":"

When you log in to ARCHER2, you are using the bash shell by default. Like any other software, the bash shell has a set of environment variables loaded, which can be listed by executing printenv or export.

Environment variables are used to control the behaviour of the software you run. For instance, OMP_NUM_THREADS defines the number of OpenMP threads.

To define an environment variable, you need to execute:

export OMP_NUM_THREADS=4\n

Please note there must be no spaces between the variable name, the assignment symbol (=) and the value. If the value contains spaces, enclose it in double quotation marks.
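The short bash sketch below shows both rules and why export matters: exported variables are inherited by programs started from the shell, which is how OMP_NUM_THREADS reaches an OpenMP executable. MY_LABEL is a hypothetical example variable.

```shell
#!/bin/bash
# No spaces around '=': with spaces, bash would try to run a command
# called OMP_NUM_THREADS instead of assigning a value.
export OMP_NUM_THREADS=4
# A value containing spaces must be quoted (MY_LABEL is hypothetical).
export MY_LABEL="test run 1"

# A child process sees the exported values.
bash -c 'echo "OMP_NUM_THREADS=${OMP_NUM_THREADS} MY_LABEL=${MY_LABEL}"'
```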

You can show the value of a specific environment variable if you print it:

echo $OMP_NUM_THREADS\n

Do not forget the dollar symbol. To remove an environment variable, just execute:

unset OMP_NUM_THREADS\n
"},{"location":"user-guide/sw-environment/","title":"Software environment","text":"

The software environment on ARCHER2 is managed using the Lmod software. Selecting which software is available in your environment is primarily controlled through the module command. By loading and switching software modules you control which software and versions are available to you.

Information

A module is a self-contained description of a software package -- it contains the settings required to run a software package and, usually, encodes required dependencies on other software packages.

By default, all users on ARCHER2 start with the default software environment loaded.

Software modules on ARCHER2 are provided both by HPE (usually known as the HPE Cray Programming Environment, CPE) and by EPCC, who provide the Service Provision and Computational Science and Engineering services.

In this section, we provide:

"},{"location":"user-guide/sw-environment/#using-the-module-command","title":"Using the module command","text":"

We only cover basic usage of the Lmod module command here. For full documentation, please see the Lmod documentation.

The module command takes a subcommand to indicate what operation you wish to perform. Common subcommands are:

These are described in more detail below.

Tip

Lmod allows you to use the ml shortcut command. Without any arguments, ml behaves like module list; when a module name is specified to ml, ml behaves like module load.

Note

You will often have to include module commands in your job submission scripts to set up the software to use in your jobs. Generally, if you load modules in interactive sessions, these loaded modules do not carry over into job submission scripts.

Important

You should not use the module purge command on ARCHER2 as this will cause issues for the HPE Cray programming environment. If you wish to reset your modules, you should use the module restore command instead.

"},{"location":"user-guide/sw-environment/#information-on-the-available-modules","title":"Information on the available modules","text":"

The key commands for getting information on modules are covered in more detail below. They are:

"},{"location":"user-guide/sw-environment/#module-list","title":"module list","text":"

The module list command gives the names and versions of the modules you currently have loaded in your environment:

auser@ln03:~> module list\n\nCurrently Loaded Modules:\n  1) craype-x86-rome                         6) cce/15.0.0             11) PrgEnv-cray/8.3.3\n  2) libfabric/1.12.1.2.2.0.0                7) craype/2.7.19          12) bolt/0.8\n  3) craype-network-ofi                      8) cray-dsmml/0.2.2       13) epcc-setup-env\n  4) perftools-base/22.12.0                  9) cray-mpich/8.1.23      14) load-epcc-module\n  5) xpmem/2.5.2-2.4_3.30__gd0f7936.shasta  10) cray-libsci/22.12.1.1\n

All users start with a default set of modules loaded corresponding to:

"},{"location":"user-guide/sw-environment/#module-avail","title":"module avail","text":"

Finding out which software modules are currently available to load on the system is performed using the module avail command. To list all software modules currently available to load, use:

auser@uan01:~> module avail\n\n--------------------------- /work/y07/shared/archer2-lmod/utils/compiler/crayclang/10.0 ---------------------------\n   darshan/3.3.1\n\n------------------------------------ /work/y07/shared/archer2-lmod/python/core ------------------------------------\n   matplotlib/3.4.3    netcdf4/1.5.7    pytorch/1.10.0    scons/4.3.0    seaborn/0.11.2    tensorflow/2.7.0\n\n------------------------------------- /work/y07/shared/archer2-lmod/libs/core -------------------------------------\n   aocl/3.1     (D)    gmp/6.2.1            matio/1.5.23        parmetis/4.0.3        slepc/3.14.1\n   aocl/4.0            gsl/2.7              metis/5.1.0         petsc/3.14.2          slepc/3.18.3       (D)\n   boost/1.72.0        hypre/2.18.0         mkl/2023.0.0        petsc/3.18.5   (D)    superlu-dist/6.4.0\n   boost/1.81.0 (D)    hypre/2.25.0  (D)    mumps/5.3.5         scotch/6.1.0          superlu-dist/8.1.2 (D)\n   eigen/3.4.0         libxml2/2.9.7        mumps/5.5.1  (D)    scotch/7.0.3   (D)    superlu/5.2.2\n\n------------------------------------- /work/y07/shared/archer2-lmod/apps/core -------------------------------------\n   castep/22.11                    namd/2.14                 (D)    py-chemshell/21.0.3\n   code_saturne/7.0.1-cce15        nektar/5.2.0                     quantum_espresso/6.8  (D)\n   code_saturne/7.0.1-gcc11 (D)    nwchem/7.0.2                     quantum_espresso/7.1\n   cp2k/cp2k-2023.1                onetep/6.1.9.0-CCE-LibSci (D)    tcl-chemshell/3.7.1\n   elk/elk-7.2.42                  onetep/6.1.9.0-GCC-LibSci        vasp/5/5.4.4.pl2-vtst\n   fhiaims/210716.3                onetep/6.1.9.0-GCC-MKL           vasp/5/5.4.4.pl2\n   gromacs/2022.4+plumed           openfoam/com/v2106               vasp/6/6.3.2-vtst\n   gromacs/2022.4           (D)    openfoam/com/v2212        (D)    vasp/6/6.3.2          (D)\n   lammps/17Feb2023                openfoam/org/v9.20210903\n   namd/2.14-nosmp                 
openfoam/org/v10.20230119 (D)\n\n------------------------------------ /work/y07/shared/archer2-lmod/utils/core -------------------------------------\n   amd-uprof/3.6.449          darshan-util/3.3.1        imagemagick/7.1.0         reframe/4.1.0\n   forge/24.0                 epcc-reframe/0.2          ncl/6.6.2                 tcl/8.6.13\n   bolt/0.7                   epcc-setup-env     (L)    nco/5.0.3          (D)    tk/8.6.13\n   bolt/0.8          (L,D)    gct/v6.2.20201212         nco/5.0.5                 usage-analysis/1.2\n   cdo/1.9.9rc1               genmaskcpu/1.0            ncview/2.1.7              visidata/2.1\n   cdo/2.1.1         (D)      gnuplot/5.4.2-simg        other-software/1.0        vmd/1.9.3-gcc10\n   cmake/3.18.4               gnuplot/5.4.2      (D)    paraview/5.9.1     (D)    xthi/1.3\n   cmake/3.21.3      (D)      gnuplot/5.4.3             paraview/5.10.1\n\n--------------------- /opt/cray/pe/lmod/modulefiles/mpi/crayclang/14.0/ofi/1.0/cray-mpich/8.0 ---------------------\n   cray-hdf5-parallel/1.12.2.1    cray-mpixlate/1.0.0.6    cray-parallel-netcdf/1.12.3.1\n\n--------------------------- /opt/cray/pe/lmod/modulefiles/comnet/crayclang/14.0/ofi/1.0 ---------------------------\n   cray-mpich-abi/8.1.23    cray-mpich/8.1.23 (L)\n\n...output trimmed...\n

This will list all the names and versions of the modules that you can currently load. Note that other modules may be defined but not available to you, as they depend on modules you do not have loaded. Lmod only shows modules that you can currently load, not all those that are defined. You can search for modules that are not currently visible to you using the module spider command; we cover this in more detail below.

Note also that not all modules may work in your account due to, for example, licencing restrictions. You will notice that for many modules we have more than one version, each of which is identified by a version number. One of these versions is the default. As the service develops, the default version will change and old versions of software may be deleted.

You can list all the modules of a particular type by providing an argument to the module avail command. For example, to list all available versions of the HPE Cray FFTW library, use:

auser@ln03:~>  module avail cray-fftw\n\n--------------------------------- /opt/cray/pe/lmod/modulefiles/cpu/x86-rome/1.0 ----------------------------------\n   cray-fftw/3.3.10.3\n\nModule defaults are chosen based on Find First Rules due to Name/Version/Version modules found in the module tree.\nSee https://lmod.readthedocs.io/en/latest/060_locating.html for details.\n\nUse \"module spider\" to find all possible modules and extensions.\nUse \"module keyword key1 key2 ...\" to search for all possible modules matching any of the \"keys\".\n
"},{"location":"user-guide/sw-environment/#module-spider","title":"module spider","text":"

The module spider command is used to find out which modules are defined on the system. Unlike module avail, this includes modules that cannot currently be loaded because you have not yet loaded the dependencies that make them available.

module spider takes 3 forms:

If you cannot find a module that you expect to be on the system using module avail then you can use module spider to find out which dependencies you need to load to make the module available.

For example, the module cray-netcdf-hdf5parallel is installed on ARCHER2 but it will not be found by module avail:

auser@ln03:~> module avail cray-netcdf-hdf5parallel\nNo module(s) or extension(s) found!\nUse \"module spider\" to find all possible modules and extensions.\nUse \"module keyword key1 key2 ...\" to search for all possible modules matching any of the \"keys\".\n

We can use module spider without any arguments to verify it exists and list the versions available:

auser@ln03:~> module spider\n\n-----------------------------------------------------------------------------------------------\nThe following is a list of the modules and extensions currently available:\n-----------------------------------------------------------------------------------------------\n\n...output trimmed...\n\n  cray-mpich-abi: cray-mpich-abi/8.1.23\n\n  cray-mpixlate: cray-mpixlate/1.0.0.6\n\n  cray-mrnet: cray-mrnet/5.0.4\n\n  cray-netcdf: cray-netcdf/4.9.0.1\n\n  cray-netcdf-hdf5parallel: cray-netcdf-hdf5parallel/4.9.0.1\n\n  cray-openshmemx: cray-openshmemx/11.5.7\n\n...output trimmed...\n

Now we know which versions are available, we can use module spider cray-netcdf-hdf5parallel/4.9.0.1 to find out how we can make it available:

auser@ln03:~> module spider cray-netcdf-hdf5parallel/4.9.0.1\n\n---------------------------------------------------------------------------------------------------------------\n  cray-netcdf-hdf5parallel: cray-netcdf-hdf5parallel/4.9.0.1\n---------------------------------------------------------------------------------------------------------------\n\n    You will need to load all module(s) on any one of the lines below before the \"cray-netcdf-hdf5parallel/4.9.0.1\" module is available to load.\n\n      aocc/3.2.0  cray-mpich/8.1.23  cray-hdf5-parallel/1.12.2.1\n      cce/15.0.0  cray-mpich/8.1.23  cray-hdf5-parallel/1.12.2.1\n      craype-network-none  cray-mpich/8.1.23  cray-hdf5-parallel/1.12.2.1\n      craype-network-ofi  cray-mpich/8.1.23  cray-hdf5-parallel/1.12.2.1\n      craype-network-ucx  cray-mpich/8.1.23  cray-hdf5-parallel/1.12.2.1\n      gcc/10.3.0  cray-mpich/8.1.23  cray-hdf5-parallel/1.12.2.1\n      gcc/11.2.0  cray-mpich/8.1.23  cray-hdf5-parallel/1.12.2.1\n\n    Help:\n      Release info:  /opt/cray/pe/netcdf-hdf5parallel/4.9.0.1/release_info\n

There is a lot of information here, but what the output is essentially telling us is that in order to have cray-netcdf-hdf5parallel/4.9.0.1 available to load, we need to have loaded a compiler (any version of CCE, GCC or AOCC), an MPI library (any version of cray-mpich) and cray-hdf5-parallel. As we always have a compiler and MPI library loaded, we can satisfy all of the dependencies by loading cray-hdf5-parallel, and then we can use module avail cray-netcdf-hdf5parallel again to show that the module is now available to load:

auser@ln03:~> module load cray-hdf5-parallel\nauser@ln03:~> module avail cray-netcdf-hdf5parallel\n\n--- /opt/cray/pe/lmod/modulefiles/hdf5-parallel/crayclang/14.0/ofi/1.0/cray-mpich/8.0/cray-hdf5-parallel/1.12.2 ---\n   cray-netcdf-hdf5parallel/4.9.0.1\n\nModule defaults are chosen based on Find First Rules due to Name/Version/Version modules found in the module tree.\nSee https://lmod.readthedocs.io/en/latest/060_locating.html for details.\n\nUse \"module spider\" to find all possible modules and extensions.\nUse \"module keyword key1 key2 ...\" to search for all possible modules matching any of the \"keys\".\n
"},{"location":"user-guide/sw-environment/#module-help","title":"module help","text":"

If you want more information on any of the modules, you can use the module help command:

auser@ln03:~> module help gromacs\n
"},{"location":"user-guide/sw-environment/#module-show","title":"module show","text":"

The module show command reveals what operations the module actually performs to change your environment when it is loaded. For example, for the default GROMACS module:

auser@ln03:~> module show gromacs\n\n  [...]\n
"},{"location":"user-guide/sw-environment/#loading-removing-and-swapping-modules","title":"Loading, removing and swapping modules","text":"

To change your environment and make different software available, you use the following commands, which we cover in more detail below.

"},{"location":"user-guide/sw-environment/#module-load","title":"module load","text":"

To load a module, use the module load command. For example, to load the default version of GROMACS into your environment, use:

auser@ln03:~> module load gromacs\n

Once you have done this, your environment will be set up to use GROMACS. The above command will load the default version of GROMACS. If you need a specific version of the software, you can add more information:

auser@uan01:~> module load gromacs/2022.4 \n

will load GROMACS version 2022.4 into your environment, regardless of the default.

"},{"location":"user-guide/sw-environment/#module-remove","title":"module remove","text":"

If you want to remove software from your environment, module remove will remove a loaded module:

auser@uan01:~> module remove gromacs\n

will unload whatever version of GROMACS you have loaded (even if it is not the default).

"},{"location":"user-guide/sw-environment/#module-swap","title":"module swap","text":"

There are many situations in which you might want to change the presently loaded version to a different one, such as trying the latest version which is not yet the default or using a legacy version to keep compatibility with old data. This can be achieved most easily by using module swap oldmodule newmodule.

For example, to swap from the default CCE (cray) compiler environment to the GCC (gnu) compiler environment, you would use:

auser@ln03:~> module swap PrgEnv-cray PrgEnv-gnu\n

You do not need to specify the version of the module loaded in your current environment, as it can be inferred: it will be the only version you have loaded.

"},{"location":"user-guide/sw-environment/#shell-environment-overview","title":"Shell environment overview","text":"

When you log in to ARCHER2, you are using the bash shell by default. As with any software, the bash shell has a set of environment variables defined, which you can list by executing printenv or export.

These environment variables define the behaviour of the software you run. For instance, OMP_NUM_THREADS defines the number of OpenMP threads.

To define an environment variable, you need to execute:

export OMP_NUM_THREADS=4\n

Note that there must be no spaces between the variable name, the assignment symbol and the value. If the value contains spaces, enclose it in double quotation marks.

You can show the value of a specific environment variable by printing it:

echo $OMP_NUM_THREADS\n

Do not forget the dollar symbol. To remove an environment variable, just execute:

unset OMP_NUM_THREADS\n

Note that the dollar symbol is not included when you use the unset command.

"},{"location":"user-guide/sw-environment/#cgroup-control-of-login-resources","title":"cgroup control of login resources","text":"

Note that it is not possible for a single user to monopolise the resources on a login node, as this is controlled by cgroups. This means that a user cannot slow down the response time for other users.

"},{"location":"user-guide/tds/","title":"ARCHER2 Test and Development System (TDS) user notes","text":"

The ARCHER2 Test and Development System (TDS) is a small system used for testing changes before they are rolled out onto the full ARCHER2 system. This page contains useful information for TDS users on the system configuration and what they can expect from it.

Important

The TDS is used for testing on a day to day basis. This means that nodes and the entire system may be made unavailable or rebooted with little or no warning.

"},{"location":"user-guide/tds/#tds-system-details","title":"TDS system details","text":""},{"location":"user-guide/tds/#connecting-to-the-tds","title":"Connecting to the TDS","text":"

You can only log into the TDS from an ARCHER2 login node. You should create an SSH key pair on an ARCHER2 login node and add the public part to your ARCHER2 account in SAFE in the usual way.

Once your new key pair is set up, you can log in to the TDS (from an ARCHER2 login node) with:

ssh login-tds.archer2.ac.uk\n

You will require your SSH key passphrase (for the new key pair you generated) and your usual ARCHER2 account password to log in to the TDS.

"},{"location":"user-guide/tds/#slurm-scheduler-configuration","title":"Slurm scheduler configuration","text":""},{"location":"user-guide/tds/#known-issuesnotes","title":"Known issues/notes","text":""},{"location":"user-guide/tuning/","title":"Performance tuning","text":""},{"location":"user-guide/tuning/#mpi","title":"MPI","text":"

The vast majority of parallel scientific applications use the MPI library as the main way to implement parallelism; it is used so universally that the Cray compiler wrappers on ARCHER2 link to the Cray MPI library by default. Unlike other clusters you may have used, there is no choice of MPI library on ARCHER2: regardless of what compiler you are using, your program will use Cray MPI. This is because the Slingshot network on ARCHER2 is Cray-specific and significant effort has been put in by Cray software engineers to optimise the MPI performance on their Shasta systems.

Here we list a number of suggestions for improving the performance of your MPI programs on ARCHER2. Although MPI programs are capable of scaling very well due to the bespoke communications hardware and software, the details of how a program calls MPI can have significant effects on achieved performance.

Note

Many of these tips are actually quite generic and should be beneficial to any MPI program; however, they all become much more important when running on very large numbers of processes on a machine the size of ARCHER2.

"},{"location":"user-guide/tuning/#mpi-environment-variables","title":"MPI environment variables","text":"

There are a number of environment variables available to control aspects of MPI behaviour on ARCHER2. The full set of options can be displayed by running

man intro_mpi\n
on the ARCHER2 login nodes.

A couple of specific variables to highlight are MPICH_OFI_STARTUP_CONNECT and MPICH_OFI_RMA_STARTUP_CONNECT.

When using the default OFI transport layer, the connections between ranks are set up as they are required. This allows for good performance while reducing memory requirements. However, for jobs using all-to-all communication, it might be better to generate these connections in a coordinated way at the start of the application. To enable this, set the following environment variable:

export MPICH_OFI_STARTUP_CONNECT=1\n

Additionally, for RMA jobs requiring an all-to-all communication pattern within a node, it may be beneficial to set up the connections between processes on a node in a coordinated fashion:

export MPICH_OFI_RMA_STARTUP_CONNECT=1\n

This option automatically enables MPICH_OFI_STARTUP_CONNECT.

"},{"location":"user-guide/tuning/#synchronous-vs-asynchronous-communications","title":"Synchronous vs asynchronous communications","text":""},{"location":"user-guide/tuning/#mpi_send","title":"MPI_Send","text":"

A standard way to send data in MPI is using MPI_Send (aptly called standard send). Somewhat confusingly, MPI is allowed to choose how to implement this in two different ways:

The rationale is that MPI, rather than the user, should decide how best to send a message.

In practice, what typically happens is that MPI tries to use an asynchronous approach via the eager protocol: the message is sent directly to a preallocated buffer on the receiver and the routine returns immediately afterwards. Clearly there is a limit on how much space can be reserved for this, so:

The threshold is often termed the eager limit which is fixed for the entire run of your program. It will have some default setting which varies from system to system, but might be around 8K bytes.
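The choice between the two protocols can be sketched as a simple size test. This is a simplified model for illustration only, not how any particular MPI library is implemented; the enum and function names are hypothetical, and treating a message exactly at the limit as eager is an assumption.

```c
#include <stddef.h>

/* Simplified model of how a standard-mode send might pick a protocol.
   Illustrative only: real MPI libraries have more protocols and tunables. */
typedef enum {
    PROTOCOL_EAGER,      /* data goes straight to a preallocated buffer on the
                            receiver and the send returns immediately */
    PROTOCOL_RENDEZVOUS  /* the send cannot complete until the receiver has
                            posted a matching receive, i.e. it is synchronous */
} send_protocol;

/* Assumption: a message exactly at the eager limit is still sent eagerly. */
send_protocol choose_protocol(size_t message_bytes, size_t eager_limit)
{
    if (message_bytes <= eager_limit)
        return PROTOCOL_EAGER;
    return PROTOCOL_RENDEZVOUS;
}
```

Under this model, a program that only works when its sends are eager, for example two processes that both call MPI_Send before calling MPI_Recv, will deadlock once its messages grow past the limit.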

"},{"location":"user-guide/tuning/#implications","title":"Implications","text":""},{"location":"user-guide/tuning/#tuning-performance","title":"Tuning performance","text":"

With most MPI libraries you should be able to alter the default value of the eager limit at runtime, perhaps via an environment variable or a command-line argument to mpirun.

The advice for tuning the performance of MPI_Send is

Note

It cannot be stressed strongly enough that although the performance may be affected by the value of the eager limit, the functionality of your program should be unaffected. If changing the eager limit affects the correctness of your program (e.g. whether or not it deadlocks) then you have an incorrect MPI program.

"},{"location":"user-guide/tuning/#setting-the-eager-limit-on-archer2","title":"Setting the eager limit on ARCHER2","text":"

On ARCHER2, things are a little more complicated. Although the eager limit defaults to 16KiB, messages up to 256KiB are sent asynchronously because they are actually sent as a number of smaller messages.

To send even larger messages asynchronously, alter the value of FI_OFI_RXM_SAR_LIMIT in your job submission script, e.g. to set to 512KiB:

export FI_OFI_RXM_SAR_LIMIT=524288\n

You can also control the size of the smaller messages by altering the value of FI_OFI_RXM_BUFFER_SIZE in your job submission script, e.g. to set to 128KiB:

export FI_OFI_RXM_BUFFER_SIZE=131072\n
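As a rough illustration of the two settings above, the sketch below models a message being split into segments of at most FI_OFI_RXM_BUFFER_SIZE bytes, and being sent asynchronously only if it does not exceed FI_OFI_RXM_SAR_LIMIT. The function names are hypothetical, treating the limit as inclusive is an assumption, and the model ignores protocol headers and any other libfabric internals.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative model of the ARCHER2 behaviour described above. */

/* Nonzero if a message of this size would be sent asynchronously, i.e. it
   does not exceed FI_OFI_RXM_SAR_LIMIT (assumed inclusive). */
int sent_asynchronously(size_t message_bytes, size_t sar_limit)
{
    return message_bytes <= sar_limit;
}

/* Number of smaller messages (segments), each carrying at most buffer_bytes
   (the FI_OFI_RXM_BUFFER_SIZE value), needed for message_bytes, rounded up. */
size_t segment_count(size_t message_bytes, size_t buffer_bytes)
{
    assert(buffer_bytes > 0);
    return (message_bytes + buffer_bytes - 1) / buffer_bytes;
}
```

With the values used above, a 512KiB message (524288 bytes) would be sent asynchronously and carried in four 128KiB segments.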

A different protocol is used for messages between two processes on the same node. The default eager limit for these is 8KiB. Although the performance of on-node messages is unlikely to be a limiting factor for your program, you can change this value, e.g. to set it to 16KiB:

export MPICH_SMP_SINGLE_COPY_SIZE=16384\n
"},{"location":"user-guide/tuning/#collective-operations","title":"Collective operations","text":"

Many of the collective operations that are commonly required by parallel scientific programs, i.e. operations that involve a group of processes, are already implemented in MPI. The canonical operation is perhaps adding up a double precision number across all MPI processes, which is best achieved by a reduction operation:

MPI_Allreduce(&x, &xsum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);\n

This will be implemented using an efficient algorithm, for example based on a binary tree. Using such divide-and-conquer approaches typically results in an algorithm whose execution time on P processes scales as log_2(P); compare this to a naive approach where every process sends its input to rank 0 where the time will scale as P. This might not be significant on your laptop, but even on as few as 1000 processes the tree-based algorithm will already be around 100 times faster.
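The difference in scaling can be made concrete by counting communication steps. The sketch below is a simplified cost model (hypothetical function names; it ignores message size and network effects) comparing the steps of a binary-tree reduction with the naive approach of every process sending to rank 0.

```c
#include <assert.h>

/* Steps in a binary-tree reduction over nprocs processes: the number of
   partial results halves at each step, giving ceil(log2(nprocs)) steps. */
int tree_steps(int nprocs)
{
    assert(nprocs >= 1);
    int steps = 0;
    for (long reach = 1; reach < nprocs; reach *= 2)
        steps++;
    return steps;
}

/* Naive approach: rank 0 receives one message from every other process,
   one after another. */
int linear_steps(int nprocs)
{
    assert(nprocs >= 1);
    return nprocs - 1;
}
```

For P = 1000 this gives 10 tree steps against 999 sequential receives at rank 0, a ratio of roughly 100, consistent with the estimate above.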

So, the basic advice is always use a collective routine to implement your communications pattern if at all possible.

In real MPI applications, collective operations are often called on a small amount of data, for example a global reduction of a single variable. In these cases, the time taken will be dominated by message latency and the first port of call when looking at performance optimisation is to call them as infrequently as possible!

Sometimes, the collective routines available may not appear to do exactly what you want. However, they can sometimes be used with a small amount of additional programming work:

Many MPI programs call MPI_Barrier to explicitly synchronise all the processes. Although this can be useful for getting reliable performance timings, it is rare in practice to find a program where the call is actually needed for correctness. For example, you may see:

// Ensure the input x is available on all processes\nMPI_Barrier(MPI_COMM_WORLD);\n// Perform a global reduction operation\nMPI_Allreduce(&x, &xsum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);\n// Ensure the result xsum is available on all processes\nMPI_Barrier(MPI_COMM_WORLD);\n

Neither of these barriers is needed, as the reduction operation performs all the required synchronisation.

If removing a barrier from your MPI code makes it run incorrectly, then this should ring alarm bells -- it is often a symptom of an underlying bug that is simply being masked by the barrier.

For example, if you use non-blocking calls such as MPI_Irecv then it is the programmer's responsibility to ensure that these are completed at some later point, for example by calling MPI_Wait on the returned request object. A common bug is to forget to do this, in which case you might be reading the contents of the receive buffer before the incoming message has arrived (e.g. if the sender is running late).

Calling a barrier may mask this bug as it will make all the processes wait for each other, perhaps allowing the late sender to catch up. However, this is not guaranteed so the real solution is to call the non-blocking communications correctly.

One of the few times when a barrier may be required is if processes are communicating with each other via some other non-MPI method, e.g. via the file system. If you want processes to sequentially open, append to, then close the same file then barriers are a simple way to achieve this:

for (i=0; i < size; i++)\n{\n  if (rank == i) append_data_to_file(data, filename);\n  MPI_Barrier(comm);\n}\n

but this is really something of a special case.

Global synchronisation may be required if you are using more advanced techniques such as hybrid MPI/OpenMP or single-sided MPI communication with put and get, but typically you should be using specialised routines such as MPI_Win_fence rather than MPI_Barrier.

Tip

If you run a performance profiler on your code and it shows a lot of time being spent in a collective operation such as MPI_Allreduce, this is not necessarily a sign that the reduction operation itself is the bottleneck. This is often a symptom of load imbalance: even if a reduction operation is efficiently implemented, it may take a long time to complete if the MPI processes do not all call it at the same time. MPI_Allreduce synchronises across processes so will have to wait for all the processes to call it before it can complete. A single slow process will therefore adversely impact the performance of your entire parallel program.

"},{"location":"user-guide/tuning/#openmp","title":"OpenMP","text":"

There are a variety of possible issues that can result in poor performance of OpenMP programs. These include:

"},{"location":"user-guide/tuning/#sequential-code","title":"Sequential code","text":"

Code outside of parallel regions is executed sequentially by the master thread.

"},{"location":"user-guide/tuning/#idle-threads","title":"Idle threads","text":"

If different threads have different amounts of computation to do, then threads may be idle whenever a barrier is encountered, for example at the end of parallel regions or the end of worksharing loops. For worksharing loops, choosing a suitable schedule kind may help. For more irregular computation patterns, using OpenMP tasks might offer a solution: the runtime will try to load balance tasks across the threads in the team.

Synchronisation mechanisms that enforce mutual exclusion, such as critical regions, atomic statements and locks can also result in idle threads if there is contention - threads have to wait their turn for access.

"},{"location":"user-guide/tuning/#synchronisation","title":"Synchronisation","text":"

The act of synchronising threads comes at some cost, even if the threads are never idle. In OpenMP, the most common source of synchronisation overheads is the implicit barriers at the end of parallel regions and worksharing loops. The overhead of these barriers depends on the OpenMP implementation being used as well as on the number of threads, but is typically in the range of a few microseconds. This means that for a simple parallel loop such as

#pragma omp parallel for reduction(+:sum)\nfor (i=0;i<n;i++){\n   sum += a[i];\n}\n

the number of iterations required to make parallel execution worthwhile may be of the order of 100,000. On ARCHER2, benchmarking has shown that for the AOCC compiler, OpenMP barriers have significantly higher overhead than for either the Cray or GNU compilers.

It is possible to suppress the implicit barrier at the end of a worksharing loop using a nowait clause, taking care that this does not introduce any race conditions.
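A minimal sketch of using nowait, assuming two loops that update independent arrays inside a single parallel region (the function name is hypothetical). Removing the barrier after the first loop is safe here only because the second loop never reads the array written by the first.

```c
/* Scales two independent arrays inside one parallel region.
   Assumes a and b do not overlap in memory. */
void scale_two_arrays(double *a, double *b, long n, double factor)
{
    #pragma omp parallel
    {
        /* nowait: a thread that finishes its share of a[] moves straight on
           to b[] instead of waiting for the other threads. */
        #pragma omp for nowait
        for (long i = 0; i < n; i++)
            a[i] *= factor;

        /* This loop keeps its implicit barrier; the barrier at the end of the
           parallel region guarantees both arrays are complete on return. */
        #pragma omp for
        for (long i = 0; i < n; i++)
            b[i] *= factor;
    }
}
```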

Atomic statements are designed to be capable of more efficient implementation than the equivalent critical region or lock/unlock pair, so they should be used where applicable.
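As an illustration, the sketch below builds a histogram in which several threads may increment the same bin at the same time (hypothetical function; keys outside the valid range are simply skipped). Each update is a single read-modify-write of one memory location, which is the case an atomic statement is designed for.

```c
/* Counts occurrences of integer keys in the range [0, nbins). */
void histogram(const int *keys, long n, long *bins, int nbins)
{
    for (int b = 0; b < nbins; b++)
        bins[b] = 0;

    #pragma omp parallel for
    for (long i = 0; i < n; i++) {
        int k = keys[i];
        if (k >= 0 && k < nbins) {
            /* Protects only this one update, rather than wrapping it in a
               critical region or a lock/unlock pair. */
            #pragma omp atomic
            bins[k]++;
        }
    }
}
```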

"},{"location":"user-guide/tuning/#scheduling","title":"Scheduling","text":"

Whenever we rely on the OpenMP runtime to dynamically assign computation to threads (e.g. dynamic or guided loop schedules, tasks), there is some overhead incurred (some of this cost may actually be internal synchronisation in the runtime). It is often necessary to adjust the granularity of the computation to find a compromise between too many small units (and high scheduling cost) and too few large units (where load imbalance may dominate). For example, we can choose a non-default chunksize for the dynamic schedule, or adjust the amount of computation within each OpenMP task construct.
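A minimal sketch of choosing a non-default chunksize for a dynamic schedule, assuming a loop whose iterations have very different costs (here iteration i does work proportional to i; the function name is hypothetical). The chunk size of 16 is an arbitrary starting value, not a recommendation; the right value should be found by experiment.

```c
/* Iteration i does i+1 units of work, so a plain static schedule would give
   the thread holding the last block of iterations the most work. */
long long triangular_work(long n)
{
    long long total = 0;

    /* Chunks of 16 iterations are handed out on demand: larger chunks cut
       scheduling overhead, smaller chunks improve load balance. */
    #pragma omp parallel for schedule(dynamic, 16) reduction(+:total)
    for (long i = 0; i < n; i++) {
        long long row = 0;
        for (long j = 0; j <= i; j++)
            row += j;
        total += row;
    }
    return total;
}
```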

"},{"location":"user-guide/tuning/#communication","title":"Communication","text":"

Communication between threads in OpenMP takes place via the cache coherency mechanism. In brief, whenever a thread writes a memory location, all copies of this location which are in a cache belonging to a different core have to be marked as invalid. Subsequent accesses to this location by other threads will result in the up-to-date value being retrieved from the cache where the last write occurred (or possibly from main memory).

Due to the fine granularity of memory accesses, these overheads are difficult to analyse or monitor. To minimise communication, we need to write code with good data affinity - i.e. each thread should access the same subset of program data as much as possible.

"},{"location":"user-guide/tuning/#numa-effects","title":"NUMA effects","text":"

On modern CPU nodes, main memory is often organised in NUMA regions - sections of main memory associated with a subset of the cores on a node. On ARCHER2 nodes, there are 8 NUMA regions per node, each associated with 16 CPU cores. On such systems the location of data in main memory with respect to the cores that are accessing it can be important. The default OS policy is to place data in the NUMA region which first accesses it (first touch policy). For OpenMP programs this can be the worst possible option: if the data is initialised by the master thread, it is all allocated in one NUMA region, and having all threads access it there becomes a bandwidth bottleneck.

This default policy can be changed using the numactl command, but it is probably better to make use of the first touch policy by explicitly parallelising the data initialisation in the application code. This may be straightforward for large multidimensional arrays, but more challenging for irregular data structures.
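A minimal sketch of exploiting the first touch policy, assuming the array is later processed by loops that use the same static schedule and the same number of threads (function names are hypothetical). For large allocations on Linux, malloc typically only reserves address space, so each page is placed in the NUMA region of the thread that first writes to it in the initialisation loop.

```c
#include <stdlib.h>

/* Allocates an array and initialises it in parallel, so that each thread
   first touches the elements it will later work on. */
double *alloc_first_touch(long n, double value)
{
    double *x = malloc((size_t)n * sizeof *x);
    if (x == NULL)
        return NULL;

    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; i++)
        x[i] = value;   /* the first write determines page placement */

    return x;
}

/* Same static schedule: each thread reads mostly from its local NUMA region. */
double sum_array(const double *x, long n)
{
    double sum = 0.0;
    #pragma omp parallel for schedule(static) reduction(+:sum)
    for (long i = 0; i < n; i++)
        sum += x[i];
    return sum;
}
```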

"},{"location":"user-guide/tuning/#false-sharing","title":"False sharing","text":"

The cache coherency mechanism described above operates on units of data corresponding to the size of cache lines - for ARCHER2 CPUs this is 64 bytes. This means that if different threads are accessing neighbouring words in memory, and at least some of the accesses are writes, then communication may be happening even if no individual word is actually being accessed by more than one thread. This means that patterns such as

#pragma omp parallel shared(count) private(myid) \n{\n  myid = omp_get_thread_num();\n  ....\n  count[myid]++;\n  ....\n}\n

may give poor performance if the updates to the count array are sufficiently frequent.
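One common way to avoid false sharing is to pad per-thread data so that each thread's counter occupies its own cache line. The sketch below (hypothetical names; it assumes the 64-byte line size quoted above, a C11 compiler for _Alignas, and an arbitrary cap of 256 threads) counts negative values with one padded counter per thread.

```c
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define CACHE_LINE_BYTES 64   /* ARCHER2 CPU cache-line size, as stated above */
#define MAX_THREADS 256       /* hypothetical upper bound for this sketch */

/* _Alignas starts each element on a cache-line boundary and rounds
   sizeof(padded_count) up to 64 bytes, so counters never share a line. */
typedef struct {
    _Alignas(CACHE_LINE_BYTES) long value;
} padded_count;

long count_negative(const double *data, long n)
{
    padded_count count[MAX_THREADS];
    int nthreads = 1;
#ifdef _OPENMP
    nthreads = omp_get_max_threads();
#endif
    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;
    for (int t = 0; t < nthreads; t++)
        count[t].value = 0;

    /* num_threads caps the team size, so myid < nthreads always holds. */
    #pragma omp parallel num_threads(nthreads)
    {
        int myid = 0;
#ifdef _OPENMP
        myid = omp_get_thread_num();
#endif
        #pragma omp for
        for (long i = 0; i < n; i++)
            if (data[i] < 0.0)
                count[myid].value++;
    }

    long total = 0;
    for (int t = 0; t < nthreads; t++)
        total += count[t].value;
    return total;
}
```

In many cases an OpenMP reduction clause is simpler still, since each thread then accumulates into a private copy and the copies are combined once at the end.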

"},{"location":"user-guide/tuning/#hardware-resource-contention","title":"Hardware resource contention","text":"

Whenever there are multiple threads (or processes) executing inside a node, they may contend for some hardware resources. The most important of these for many HPC applications is memory bandwidth. This effect is very evident on ARCHER2 CPUs: it is possible for just 2 threads to almost saturate the available memory bandwidth in a NUMA region which has 16 cores associated with it. For very bandwidth-intensive applications, running more than 2 threads per NUMA region may gain little additional performance. If an OpenMP code is not using all the cores on a node, by default Slurm will spread the threads out across NUMA regions to maximise the available bandwidth.

Another resource that threads may contend for is space in shared caches. On ARCHER2, every set of 4 cores shares 16MB of L3 cache.
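As a rough worked example, the figures above give the following L3 capacities. The node has 128 cores, from 8 NUMA regions of 16 cores, and each group of 4 cores shares 16 MB of L3. This is simple arithmetic, not a measured per-thread cache size:

$$
\frac{16\ \text{MB}}{4\ \text{cores}} = 4\ \text{MB per core (all 4 cores busy)}, \qquad
\frac{8 \times 16\ \text{cores}}{4\ \text{cores per L3}} \times 16\ \text{MB} = 32 \times 16\ \text{MB} = 512\ \text{MB per node}
$$

So a thread running alone on its group of 4 cores can use up to 16 MB of L3. Once all 4 cores are busy, each core has about 4 MB on average.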

"},{"location":"user-guide/tuning/#compiler-non-optimisation","title":"Compiler non-optimisation","text":"

In rare cases, adding OpenMP directives can adversely affect the compiler's optimisation process. The symptom of this is that the OpenMP code running on 1 thread is slower than the same code compiled without the OpenMP flag. It can be difficult to find a workaround. Using the compiler's diagnostic flags to find out which optimisation (e.g. vectorisation or loop unrolling) is being affected, and then adding compiler-specific directives, may help.

"},{"location":"user-guide/tuning/#hybrid-mpi-and-openmp","title":"Hybrid MPI and OpenMP","text":"

There are two main motivations for using both MPI and OpenMP in the same application code: reducing memory requirements and improving performance. At low core counts, where the pure MPI version of the code is still scaling well, adding OpenMP is unlikely to improve performance. In fact, it can introduce additional overheads which make performance worse! The benefit is likely to come in the regime where the pure MPI version starts to lose scalability. Here, adding OpenMP can reduce communication costs, make load balancing easier, or be an effective way of exploiting additional parallelism without excessive code rewriting.

An important performance consideration for MPI + OpenMP applications is the choice of the number of OpenMP threads per MPI process. The optimum value will depend on the application, the input data, the number of nodes requested and the choice of compiler, and is hard to predict without experimentation. However, there are some considerations that apply to ARCHER2:

"}]} \ No newline at end of file diff --git a/sitemap.xml.gz b/sitemap.xml.gz index 021ffc67ea0d00372aae0c04b9f6f4fd728ab3b3..2c84ee3e9a241291be01f7306eedacddb3a3c9f8 100644 GIT binary patch delta 13 Ucmb=gXP58h;9w|YnaExN02k>4N&o-= delta 13 Ucmb=gXP58h;Al``p2%JS02xaIcK`qY diff --git a/user-guide/gpu/index.html b/user-guide/gpu/index.html index 2add403b0..1bc8f9fd0 100644 --- a/user-guide/gpu/index.html +++ b/user-guide/gpu/index.html @@ -4082,9 +4082,6 @@

Multiple #SBATCH --partition=gpu #SBATCH --qos=gpu-exc -# Enable GPU-aware MPI -export MPICH_GPU_SUPPORT_ENABLED=1 - # Check assigned GPU srun --ntasks=1 rocm-smi @@ -4094,6 +4091,9 @@

Multiple --hint=nomultithread --distribution=block:block \ xthi +# Enable GPU-aware MPI +export MPICH_GPU_SUPPORT_ENABLED=1 + srun --ntasks=4 --cpus-per-task=8 \ --hint=nomultithread --distribution=block:block \ ./my_gpu_program.x @@ -4123,31 +4123,31 @@

Multiple #SBATCH --partition=gpu #SBATCH --qos=gpu-exc -# Enable GPU-aware MPI -export MPICH_GPU_SUPPORT_ENABLED=1 - # Check assigned GPU nodelist=$(scontrol show hostname $SLURM_JOB_NODELIST) for nodeid in $nodelist do echo $nodeid - srun --ntasks=1 --nodelist=$nodeid rocm-smi + srun --ntasks=1 --gpus=4 --nodes=1 --ntasks-per-node=1 --nodelist=$nodeid rocm-smi done # Check process/thread pinning module load xthi -srun --ntasks=8 --cpus-per-task=8 \ +srun --ntasks-per-node=4 --cpus-per-task=8 \ --hint=nomultithread --distribution=block:block \ xthi -srun --ntasks=8 --cpus-per-task=8 \ +# Enable GPU-aware MPI +export MPICH_GPU_SUPPORT_ENABLED=1 + +srun --ntasks-per-node=4 --cpus-per-task=8 \ --hint=nomultithread --distribution=block:block \ ./my_gpu_program.x

Note

When you use the --qos=gpu-exc QoS you must also add the --exclusive flag -and then specify the number of nodes you want with --nodes=1.

+and then specify the number of nodes you want with, for example, --nodes=2.

Interactive jobs

Using salloc