Merge branch 'gh-pages' into vs_code-update
yandthj authored Jan 21, 2025
2 parents 43762df + 4a1c79b commit 4af1f02
Showing 9 changed files with 58 additions and 62 deletions.
12 changes: 6 additions & 6 deletions docs/Documentation/Applications/starccm.md
@@ -37,8 +37,8 @@ Then you need to create a Slurm script `<your_scriptfile>` as shown below to sub
#!/bin/bash -l
#SBATCH --time=2:00:00 # walltime limit of 2 hours
#SBATCH --nodes=2 # number of nodes
-#SBATCH --ntasks-per-node=104 # number of tasks per node (<=104 on Kestrel)
-#SBATCH --ntasks=72 # total number of tasks
+#SBATCH --ntasks-per-node=96 # number of tasks per node (<=104 on Kestrel)
+#SBATCH --ntasks=192 # total number of tasks
#SBATCH --job-name=your_simulation # name of job
#SBATCH --account=<allocation-id> # name of project allocation
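A quick way to catch mismatched task counts like the one corrected in this hunk is to compute the product yourself: with 2 nodes at 96 tasks per node, `--ntasks` should be 192. A minimal shell sketch (variable names are illustrative, not Slurm options):

```shell
# Sanity check: total tasks should equal nodes * tasks per node
nodes=2
ntasks_per_node=96   # <= 104 cores per Kestrel CPU node
total_tasks=$(( nodes * ntasks_per_node ))
echo "$total_tasks"  # 192
```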

@@ -68,8 +68,8 @@ STAR-CCM+ comes with its own Intel MPI. To use the Intel MPI, the Slurm script s
#!/bin/bash -l
#SBATCH --time=2:00:00 # walltime limit of 2 hours
#SBATCH --nodes=2 # number of nodes
-#SBATCH --ntasks-per-node=104 # number of tasks per node (<=104 on Kestrel)
-#SBATCH --ntasks=72 # total number of tasks
+#SBATCH --ntasks-per-node=96 # number of tasks per node (<=104 on Kestrel)
+#SBATCH --ntasks=192 # total number of tasks
#SBATCH --job-name=your_simulation # name of job
#SBATCH --account=<allocation-id> # name of project allocation

@@ -102,8 +102,8 @@ STAR-CCM+ can run with Cray MPI. The following Slurm script submits STAR-CCM+ jo
#!/bin/bash -l
#SBATCH --time=2:00:00 # walltime limit of 2 hours
#SBATCH --nodes=2 # number of nodes
-#SBATCH --ntasks-per-node=104 # number of tasks per node (<=104 on Kestrel)
-#SBATCH --ntasks=72 # total number of tasks
+#SBATCH --ntasks-per-node=96 # number of tasks per node (<=104 on Kestrel)
+#SBATCH --ntasks=192 # total number of tasks
#SBATCH --job-name=your_simulation # name of job
#SBATCH --account=<allocation-id> # name of project allocation

17 changes: 6 additions & 11 deletions docs/Documentation/Applications/vasp.md
@@ -49,23 +49,18 @@ NREL offers modules for VASP 5 and VASP 6 on CPUs as well as GPUs on certain sys

#### CPU

-There are several modules for CPU builds of VASP 5 and VASP 6. As of 08/09/2024 we have released new modules for VASP on Kestrel CPUs:
+There are several modules for CPU builds of VASP 5 and VASP 6.

```
CPU $ module avail vasp
-------------- /nopt/nrel/apps/cpu_stack/modules/default/application -------------
-#new modules:
-vasp/5.4.4+tpc vasp/6.3.2_openMP+tpc vasp/6.4.2_openMP+tpc
-vasp/5.4.4_base vasp/6.3.2_openMP vasp/6.4.2_openMP
-# Legacy modules will be removed during system time in December!
-vasp/5.4.4 vasp/6.3.2 vasp/6.4.2 (D)
+------------- /nopt/nrel/apps/cpu_stack/modules/default/application -------------
+vasp/5.4.4+tpc vasp/6.3.2_openMP+tpc vasp/6.4.2_openMP+tpc
+vasp/5.4.4 vasp/6.3.2_openMP vasp/6.4.2_openMP (D)
```

-What’s new:
+Notes:

-* New modules have been rebuilt with the latest Cray Programming Environment (cpe23), updated compilers, and math libraries.
+* These modules have been built with the latest Cray Programming Environment (cpe23), updated compilers, and math libraries.
* OpenMP capability has been added to VASP 6 builds.
* Modules that include third-party codes (e.g., libXC, libBEEF, VTST tools, and VASPsol) are now denoted with +tpc. Use `module show vasp/<version>` to see details of a specific version.
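For example, a `+tpc` build can be inspected before loading it; a usage sketch (Lmod commands, with module names taken from the listing above):

```shell
# Sketch: inspect what a +tpc build bundles, then load it
module show vasp/5.4.4+tpc   # lists paths and bundled third-party codes
module load vasp/5.4.4+tpc
```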

8 changes: 8 additions & 0 deletions docs/Documentation/Development/VSCode/vscode.md
@@ -19,6 +19,14 @@ Enter your HPC password (or password and OTP code if external) and you will be c
!!! bug "VS Code Remote-SSH Bug"
If you are no longer able to connect to Kestrel with VS Code, open your settings for the Remote-SSH extension and set "Use Exec Server" to False by unchecking the box. This issue is due to a bug introduced in an update to the Remote-SSH plugin or VS Code itself.

+!!! bug "Windows SSH 'Corrupted MAC on input' Error"
+Some users on Windows 10/11 computers who ssh to Kestrel via Visual Studio Code's SSH extension might receive an error message about a "Corrupted MAC on input" or "message authentication code incorrect." To work around this issue, you will need to create an ssh config file on your local computer, `~/.ssh/config`, with a host entry for Kestrel that specifies a new message authentication code:
+```
+Host kestrel
+    HostName kestrel.hpc.nrel.gov
+    MACs hmac-sha2-512
+```
+This [Visual Studio Code blog post](https://code.visualstudio.com/blogs/2019/10/03/remote-ssh-tips-and-tricks) has further instructions on how to create the ssh configuration file for Windows and VS Code.
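A quick way to confirm an entry like the one above is in place is to grep for it; the sketch below writes to a temporary file so it is safe to run anywhere (on your machine the real target is `~/.ssh/config`):

```shell
# Sketch: write the Kestrel host entry and confirm it is present
# (temp file used for illustration; the real target is ~/.ssh/config)
cfg="$(mktemp)"
cat >> "$cfg" <<'EOF'
Host kestrel
    HostName kestrel.hpc.nrel.gov
    MACs hmac-sha2-512
EOF
grep -c 'MACs hmac-sha2-512' "$cfg"   # prints 1 when the entry exists
```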

## Caution About VS Code Processes

6 changes: 3 additions & 3 deletions docs/Documentation/Environment/Customization/conda.md
@@ -216,14 +216,14 @@ python my_main.py

### Where to store Conda environments

-By default, the conda module uses the home directory for package caches and named environments. This can cause problems on the HPC systems because conda environments can require a lot of storage space, and home directories have a quota of 50GB. Additionally, the home filesystem is not designed to handle heavy I/O loads, so if you're running a lot of jobs or large multi-node jobs calling conda environments that are stored in home, it can strain the filesystem.
+By default, the conda module uses the home directory for named environments. This can cause problems on the HPC systems because conda environments can require a lot of storage space, and home directories have a quota of 50GB. Additionally, the home filesystem is not designed to handle heavy I/O loads, so if you're running a lot of jobs or large multi-node jobs calling conda environments that are stored in home, it can strain the filesystem.

+The conda module uses `/scratch/$USER/.conda-pkgs` for package caches by default to avoid filling up the home directory with cached conda data. If you would like to change this location, you can call `export CONDA_PKGS_DIRS=PATH_NAME` to specify somewhere to store downloads and cached files such as `/projects/<allocation handle>/$USER/.conda-pkgs`. We don't recommend using your home directory since this data can use a lot of space.

Some ways to change the default storage location for conda environments and packages:

* Use the `-p PATH_NAME` switch when creating or updating your environment. Make sure `PATH_NAME` isn't in the home directory. Keep in mind files in /scratch are deleted after about a month of inactivity.

-* Change the directory used for caching. This location is set by the module file to `~/.conda-pkgs`. A simple way to avoid filling up the home directory with cached conda data is to soft link a location on scratch to `~/.conda-pkgs`, for example `ln -s /scratch/$USER/.conda-pkgs /home/$USER/.conda-pkgs`. Alternatively, you can call `export CONDA_PKGS_DIRS=PATH_NAME` to specify somewhere to store downloads and cached files such as `/projects/<allocation handle>/$USER/.conda-pkgs`.

* Similarly, you can specify the directory in which environments are stored by default. To do this, either set the `CONDA_ENVS_PATH` environment variable, or use the `--prefix` option as [described above](./conda.md#creating-environments-by-location).
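Putting the relocation options above together, a minimal shell sketch (paths are illustrative; remember that files in /scratch are purged after about a month of inactivity):

```shell
# Sketch: keep conda package caches and named environments off /home
export CONDA_PKGS_DIRS=/scratch/$USER/.conda-pkgs
export CONDA_ENVS_PATH=/scratch/$USER/.conda-envs
echo "$CONDA_PKGS_DIRS"
```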

!!! warning
17 changes: 16 additions & 1 deletion docs/Documentation/Systems/Kestrel/Running/index.md
@@ -11,7 +11,7 @@ There are two general types of compute nodes on Kestrel: CPU nodes and GPU nodes


### CPU Nodes
-Standard CPU-based compute nodes on Kestrel have 104 cores and 240G of usable RAM. 256 of those nodes have a 1.7TB NVMe local disk. There are also 10 bigmem nodes with 2TB of RAM and 5.6TB NVMe local disk.
+Standard CPU-based compute nodes on Kestrel have 104 cores and 240G of usable RAM. 256 of those nodes have a 1.7TB NVMe local disk. There are also 10 bigmem nodes with 2TB of RAM and 5.6TB NVMe local disk. Two racks of the CPU compute nodes have dual network interface cards (NICs) which may increase performance for certain types of multi-node jobs.


### GPU Nodes
@@ -45,6 +45,7 @@ The following table summarizes the partitions on Kestrel:
| ```long``` | Nodes that prefer jobs with walltimes > 2 days.<br>*Maximum walltime of any job is 10 days*| 525 nodes total.<br> 262 nodes per user.| ```--time <= 10-00```<br>```--mem <= 246064```<br>```--tmp <= 1700000 (256 nodes)```|
|```bigmem``` | Nodes that have 2 TB of RAM and 5.6 TB NVMe local disk. | 8 nodes total.<br> 4 nodes per user. | ```--mem > 246064```<br> ```--time <= 2-00```<br>```--tmp > 1700000 ``` |
|```bigmeml``` | Bigmem nodes that prefer jobs with walltimes > 2 days.<br>*Maximum walltime of any job is 10 days.* | 4 nodes total.<br> 3 nodes per user. | ```--mem > 246064```<br>```--time > 2-00```<br>```--tmp > 1700000 ``` |
+|```hbw``` | CPU compute nodes with dual network interface cards. | 512 nodes total.<br> 256 nodes per user. <br> Minimum 2 nodes per job. | ```-p hbw``` <br>```--time <= 10-00``` <br> ```--nodes >= 2```|
| ```shared```| Nodes that can be shared by multiple users and jobs. | 64 nodes total. <br> Half of partition per user. <br> 2 days max walltime. | ```-p shared``` <br> or<br> ```--partition=shared```|
| ```sharedl```| Nodes that can be shared by multiple users and prefer jobs with walltimes > 2 days. | 16 nodes total. <br> 8 nodes per user. | ```-p sharedl``` <br> or<br> <nobr>```--partition=sharedl```</nobr>|
| ```gpu-h100```| Shareable GPU nodes with 4 NVIDIA H100 SXM 80GB Computational Accelerators. | 130 nodes total. <br> 65 nodes per user. | ```1 <= --gpus <= 4``` <br> ```--time <= 2-00```|
@@ -81,6 +82,20 @@ Currently, there are 64 standard compute nodes available in the shared partition
srun ./my_progam # Use your application's commands here
```

+### High Bandwidth Partition
+
+In December 2024, Kestrel had two racks of CPU nodes reconfigured with an extra network interface card, which can greatly benefit communication-bound HPC software.
+A NIC is a hardware component that enables inter-node (i.e., *network*) communication as multi-node jobs run.
+On Kestrel, most CPU nodes include a single NIC. Although having one NIC per node is acceptable for the majority of workflows run on Kestrel, it can lead to communication congestion
+when running multi-node applications that send significant amounts of data over Kestrel's network. When this issue is encountered, increasing the number of available NICs
+can alleviate such congestion during runtime. Some common examples of communication-bound HPC software are AMRWind and LAMMPS.
+
+To request nodes with two NICs, specify `--partition=hbw` in your job submissions. Because the purpose of the high bandwidth nodes is to optimize communication in multi-node jobs, it is not permitted to submit single-node jobs to the `hbw` partition.
+If you would like assistance with determining whether your workflow could benefit from running in the `hbw` partition, please reach out to [HPC-Help@nrel.gov](mailto:HPC-Help@nrel.gov).
+
+!!! info
+We'll be continuing to update documentation with use cases and recommendations for the dual NIC nodes, including specific examples on the LAMMPS and AMRWind pages.
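A minimal batch header for the dual-NIC partition might look like the following sketch (account name and executable are placeholders, consistent with the other Slurm examples in these docs):

```shell
#!/bin/bash
#SBATCH --partition=hbw           # dual-NIC (high bandwidth) nodes
#SBATCH --nodes=2                 # hbw requires at least 2 nodes per job
#SBATCH --ntasks-per-node=104
#SBATCH --time=2:00:00
#SBATCH --account=<allocation-id>

srun ./my_program                 # replace with your application's launch command
```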


### GPU Jobs

1 change: 0 additions & 1 deletion docs/Documentation/Systems/Swift/running.md
@@ -31,7 +31,6 @@ The most up to date list of partitions can always be found by running the `sinfo`
| long | jobs up to ten days of walltime |
| standard | jobs up to two days of walltime |
| gpu | Nodes with four NVIDIA A100 40 GB Computational Accelerators, up to two days of walltime |
-| parallel | optimized for large parallel jobs, up to two days of walltime |
| debug | two nodes reserved for short tests, up to four hours of walltime |

Each partition also has a matching `-standby` partition. Allocations which have consumed all awarded AUs for the year may only submit jobs to these partitions, and their default QoS will be set to `standby`. Jobs in standby partitions will be scheduled when there are otherwise idle cycles and no other non-standby jobs are available. Jobs that run in the standby queue will not be charged any AUs.
13 changes: 0 additions & 13 deletions docs/Documentation/Viz_Analytics/index.md
@@ -2,10 +2,6 @@

*Learn about the available visualization and analytics software tools*

-???+ example "Note:"
-The instructions shown on this page are given in the context of Eagle supercomputer.
-

## VirtualGL/FastX

Provides remote visualization for OpenGL-based applications. For more information, see [using VirtualGL and FastX ](virtualgl_fastx.md).
@@ -15,12 +11,6 @@ Provides remote visualization for OpenGL-based applications. For more informatio
An open-source, multi-platform data analysis and visualization application.
For information, see [using ParaView](paraview.md).

-## VAPOR
-
-VAPOR (Visualization and Analysis Platform for Ocean, Atmosphere, and Solar Researchers) enables interactive exploration of terascale gridded data sets that are large in both the spatial and temporal domains. Wavelet-based multiresolution data representation permits users to make speed/quality trade-offs for visual as well as non-visual data exploration tasks.
-
-For more information see the [VAPOR website](https://www.vapor.ucar.edu/).

## R Statistical Computing Environment

R is a language and environment for statistical computing and graphics. For more information, see [running R](../Development/Languages/r.md).
@@ -33,9 +23,6 @@ The name MATLAB stands for Matrix Laboratory. MATLAB was originally written to p

For more information, see [using MATLAB software](../Applications/Matlab/index.md).

-Interactive Data Language
-IDL, the Interactive Data Language, is an interactive application used for data analysis, visualization and cross-platform application development.

## VisIt

VisIt is a free interactive parallel visualization and graphical analysis tool for viewing scientific data on Unix and PC platforms. VisIt features a robust remote visualization capability. VisIt can be started on a local machine and used to visualize data on a remote compute cluster.
