From 1cf64d55c5d19e2a344a33499cfb437ff6a9be55 Mon Sep 17 00:00:00 2001
From: Olivia Hull
Date: Mon, 29 Jan 2024 11:23:58 -0700
Subject: [PATCH 1/2] add openmpi performance notes

---
 .../Systems/Kestrel/Environments/index.md     |  5 +++++
 docs/Documentation/Systems/Kestrel/running.md | 14 ++++++++++----
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/docs/Documentation/Systems/Kestrel/Environments/index.md b/docs/Documentation/Systems/Kestrel/Environments/index.md
index 6ddaf62a0..cddd2039e 100644
--- a/docs/Documentation/Systems/Kestrel/Environments/index.md
+++ b/docs/Documentation/Systems/Kestrel/Environments/index.md
@@ -20,6 +20,11 @@ The NREL-built environments function similarly to those on Eagle, and it is up t
 NREL-built environments can make use of Cray MPICH via the `cray-mpich-abi`. As long as a program is compiled with an MPICH-based MPI (e.g., Intel MPI but *not* Open MPI), the `cray-mpich-abi` can be loaded at runtime, which causes the program to use Cray MPICH for dynamically built binaries.
 
+
+## A note on OpenMPI
+
+Currently, OpenMPI does not run performantly or stably on Kestrel; avoid it whenever possible. Please reach out to hpc-help@nrel.gov if you need help working around OpenMPI.
+
 ## Summary of available compiler environments
 
 Note: to access compilers not included in the default Cray modules (i.e., compilers within the NREL-built environment), you must run the command `source /nopt/nrel/apps/env.sh`.

diff --git a/docs/Documentation/Systems/Kestrel/running.md b/docs/Documentation/Systems/Kestrel/running.md
index 3e04ff7a6..0f0b4719e 100644
--- a/docs/Documentation/Systems/Kestrel/running.md
+++ b/docs/Documentation/Systems/Kestrel/running.md
@@ -97,11 +97,17 @@ You may need to export these variables even if you are not running your job with
 
 #### Scaling
 
-Currently, some applications on Kestrel are not scaling with the expected performance. For these applications, we recommend:
+Currently, some applications on Kestrel are not scaling with the expected performance. We are actively working with the vendor's engineers to resolve these issues. For now, we recommend the following for these applications:
 
-1. Submitting jobs with the fewest number of nodes possible.
+1. Setting the following envrionment variables:
+```
+export MPICH_SHARED_MEM_COLL_OPT=mpi_bcast,mpi_barrier
+export MPICH_COLL_OPT_OFF=mpi_allreduce
+```
+
+2. Submitting jobs with the fewest number of nodes possible.
 
-1. For hybrid MPI/OpenMP codes, requesting more threads per task than you tend to request on Eagle. This may yield performance improvements.
-1. Building and running with Intel MPI or Cray MPICH, rather than OpenMPI.
+3. For hybrid MPI/OpenMP codes, requesting more threads per task than you tend to request on Eagle. This may yield performance improvements.
+4. Building and running with Cray MPICH (or Intel MPI/cray-mpich-abi), rather than OpenMPI.
 
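For illustration, the scaling recommendations added above could combine into a Slurm batch script along the following lines. This is a minimal sketch, not part of the patch: the node, task, and thread counts, the allocation handle, and the `my_app` binary are placeholders to adapt per application, and the core layout assumes a 104-core Kestrel CPU node.

```
#!/bin/bash
#SBATCH --nodes=2                  # fewest nodes the problem fits on
#SBATCH --ntasks-per-node=26       # fewer MPI ranks per node than on Eagle...
#SBATCH --cpus-per-task=4          # ...with more OpenMP threads per rank (26 x 4 = 104 cores)
#SBATCH --time=1:00:00
#SBATCH --account=<allocation>     # placeholder allocation handle

# Collective-tuning variables recommended above for affected applications
export MPICH_SHARED_MEM_COLL_OPT=mpi_bcast,mpi_barrier
export MPICH_COLL_OPT_OFF=mpi_allreduce

# Give each MPI rank its full set of allocated cores as OpenMP threads
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# my_app is a placeholder for a binary built against Cray MPICH
srun ./my_app
```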
From f40ef0a43d6754e369952f8c37548fd7a13d6be2 Mon Sep 17 00:00:00 2001
From: Haley Yandt <46908710+yandthj@users.noreply.github.com>
Date: Mon, 29 Jan 2024 11:34:59 -0700
Subject: [PATCH 2/2] Update docs/Documentation/Systems/Kestrel/running.md

---
 docs/Documentation/Systems/Kestrel/running.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/Documentation/Systems/Kestrel/running.md b/docs/Documentation/Systems/Kestrel/running.md
index 0f0b4719e..d9dbe6eb6 100644
--- a/docs/Documentation/Systems/Kestrel/running.md
+++ b/docs/Documentation/Systems/Kestrel/running.md
@@ -99,7 +99,7 @@ You may need to export these variables even if you are not running your job with
 
 Currently, some applications on Kestrel are not scaling with the expected performance. We are actively working with the vendor's engineers to resolve these issues. For now, we recommend the following for these applications:
 
-1. Setting the following envrionment variables:
+1. Setting the following environment variables:
 ```
 export MPICH_SHARED_MEM_COLL_OPT=mpi_bcast,mpi_barrier
 export MPICH_COLL_OPT_OFF=mpi_allreduce
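As a companion to the `cray-mpich-abi` note in the first patch, the runtime substitution could look roughly like the sketch below. The Intel MPI module name and the source/binary names are assumptions for illustration; check actual module names with `module avail`.

```
# Expose the NREL-built environments (command taken from the docs above)
source /nopt/nrel/apps/env.sh

# Build with an MPICH-ABI-compatible MPI, e.g., Intel MPI (module name assumed)
module load intel-oneapi-mpi
mpiicc -o my_app my_app.c          # dynamically linked against Intel MPI

# At run time, swap in Cray MPICH through the ABI compatibility layer
module unload intel-oneapi-mpi
module load cray-mpich-abi
srun ./my_app                      # MPI calls now resolve to Cray MPICH
```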