Merge pull request #710 from sinolonghai/patch-9

Update lammps.md
NREL · Dec 16, 2024 · bc63dc4
2 parents c7384e8 + 83229d2
commit bc63dc4
Showing 1 changed file with 72 additions and 21 deletions.
diff --git a/docs/Documentation/Applications/lammps.md b/docs/Documentation/Applications/lammps.md
@@ -4,33 +4,84 @@
 
 LAMMPS has numerous built-in potentials for simulations of solid-state, soft matter, and coarse-grained systems. It can be run on a single processor or in parallel using MPI. To learn more, see the [LAMMPS website](https://www.lammps.org/#gsc.tab=0). 
 
-The most recent version of LAMMPS on Eagle and Swift at the time of this page being published is the 23Jun22 version. The following packages have been installed in this version: asphere, body, bocs, class2, colloid, dielectric, diffraction, dipole, dpd-basic, drude, eff, electrode, extra-fix, extra-pair, fep, granular, h5md, intel, interlayer, kspace, manifold, manybody, mc, meam, misc, molecule, mpiio, openmp, opt, python, phonon, qep, qmmm, reaction, reaxff, replica, rigid, shock, spin, voronoi.
+The versions of LAMMPS on Kestrel, Swift, and Vermilion can be checked by running `module avail lammps`. Usually two recent stable versions are available, compiled with different compiler and MPI toolchains. The following packages have been installed: asphere, body, bocs, class2, colloid, dielectric, diffraction, dipole, dpd-basic, drude, eff, electrode, extra-fix, extra-pair, fep, granular, h5md, intel, interlayer, kspace, manifold, manybody, mc, meam, misc, molecule, mpiio, openmp, opt, python, phonon, qeq, qmmm, reaction, reaxff, replica, rigid, shock, spin, voronoi.
 
-## Sample Slurm Script 
-A sample Slurm script for LAMMPS is given below:
+If you need other packages or a certain LAMMPS version, please [contact us](mailto:HPC-Help@nrel.gov). 
 
-??? example "Sample Slurm script"
+## Sample CPU Slurm Script 
+A sample Slurm script for running LAMMPS on Kestrel CPU nodes is given below:
 
-    ``` bash
-    #!/bin/bash
-    #SBATCH --time=48:00:00 
-    #SBATCH --nodes=4
-    #SBATCH --job-name=lammps_test
-    #SBATCH --output=std.out
-    #SBATCH --error=std.err
+```
+#!/bin/bash
+#SBATCH --job-name cpu-test
+#SBATCH --nodes=2 #Request two CPU nodes
+#SBATCH --time=1:00:00
+#SBATCH --account=[your_allocation_name]
+#SBATCH --error=std.err
+#SBATCH --output=std.out
+#SBATCH --tasks-per-node=104
+#SBATCH --exclusive
+#SBATCH -p debug
 
-    module purge
-    module load lammps/20220623 
-    cd $SLURM_SUBMIT_DIR
+module load lammps/080223-intel-mpich
+module list
 
-    srun -n 144 lmp -in lmp.in -l lmp.out
-    ```
+run_cmd="srun --mpi=pmi2 "
+lmp_path=lmp
+name=my_job
+$run_cmd $lmp_path -in $name.in >& $name.log
+```
 
-where `lmp.inp` is the input and `lmp.out` is the output. This runs LAMMPS using four nodes with 144 cores. 
+where `my_job.in` is the input file and `my_job.log` is the output file. This runs LAMMPS on two nodes with 208 MPI ranks (104 per node). 
+
+## Sample GPU Slurm Script 
+A sample Slurm script for running LAMMPS on Kestrel GPU nodes is given below:
+
+```
+#!/bin/bash
+#SBATCH --job-name gpu-test
+#SBATCH --nodes=1 #Request one GPU node
+#SBATCH --time=1:00:00
+#SBATCH --account=[your_allocation_name]
+#SBATCH --error=std.err
+#SBATCH --output=std.out
+#SBATCH --tasks-per-node=8 #Running 8 MPI tasks per node
+#SBATCH --mem=16G #Request memory
+#SBATCH --gres=gpu:2 #Request 2 GPUs per node
+#SBATCH -p debug
+
+module load lammps/080223-gpu
+module list
+
+export MPICH_GPU_SUPPORT_ENABLED=1
+#Make GPUs 0 and 1 visible to the job
+export CUDA_VISIBLE_DEVICES=0,1 
+
+run_cmd="srun --mpi=pmi2 "
+lmp_path=lmp
+name=medium
+#Use the GPU package with 2 GPUs per node
+gpu_opt="-sf gpu -pk gpu 2"
+$run_cmd $lmp_path $gpu_opt -in $name.in >& $name.gpu.log
+```
+
+This runs LAMMPS on one node with 8 MPI ranks and 2 GPUs. The following information will be printed in the `medium.gpu.log` file:
+```
+--------------------------------------------------------------------------
+- Using acceleration for pppm:
+-  with 4 proc(s) per device.
+-  Horizontal vector operations: ENABLED
+-  Shared memory system: No
+--------------------------------------------------------------------------
+Device 0: NVIDIA H100 80GB HBM3, 132 CUs, 77/79 GB, 2 GHZ (Mixed Precision)
+Device 1: NVIDIA H100 80GB HBM3, 132 CUs, 2 GHZ (Mixed Precision)
+--------------------------------------------------------------------------
+```
+
+## Hints and Additional Resources
+1. For calculations requesting more than ~10 nodes, the Cray MPICH stall library is recommended. Details are described at [MPI Stall Library](https://nrel.github.io/HPC/Documentation/Systems/Kestrel/Running/performancerecs/#mpi-stall-library) and [Improvement of LAMMPS Performance by Using CQ STALL Feature](https://github.nrel.gov/hlong/lammps_stall).
+2. For CPU runs, especially multi-node runs, the optimal performance for a particular job may be at a tasks-per-node value less than 104. For GPU runs, the number of GPUs should also be varied to achieve optimal performance. Users should investigate these parameters for large jobs by performing some short test runs.
+3. For instructions on running LAMMPS with OpenMP, see the [HPC GitHub code repository](https://github.com/NREL/HPC/tree/master/applications/lammps).
 
-## Additional Resources
-For instructions on running LAMMPS with OpenMP, see the [HPC Github code repository](https://github.com/NREL/HPC/tree/master/applications/lammps).
 
-## Contact 
-If you need other packages, please [contact us](mailto:HPC-Help@nrel.gov).
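
The rank counts quoted in the updated page follow from simple arithmetic: 2 nodes × 104 tasks per node gives the 208 MPI ranks of the CPU sample, and 8 tasks shared by 2 GPUs gives the `4 proc(s) per device` line in the GPU log. A minimal shell sketch of that arithmetic, together with a dry run of the tasks-per-node scan suggested in hint 2, might look like the following. The function names, the script name `cpu-test.sh`, and the candidate values 104, 96, and 64 are hypothetical and do not come from the page. Treating an uneven split of ranks across GPUs as an error is a simplifying assumption of this sketch, not a documented LAMMPS requirement.

```shell
#!/bin/bash
# Hypothetical helpers for sanity-checking a LAMMPS job layout.
# None of these functions are part of LAMMPS or Slurm.

# Total MPI ranks = nodes * tasks per node.
total_ranks() {
    local nodes=$1 tasks_per_node=$2
    echo $(( nodes * tasks_per_node ))
}

# MPI ranks sharing each GPU on a node.
# Assumption: an uneven split is rejected to keep the sketch simple.
ranks_per_gpu() {
    local tasks_per_node=$1 gpus_per_node=$2
    if (( gpus_per_node <= 0 || tasks_per_node % gpus_per_node != 0 )); then
        echo "tasks-per-node must be a positive multiple of the GPU count" >&2
        return 1
    fi
    echo $(( tasks_per_node / gpus_per_node ))
}

# Dry run: print (do not submit) one sbatch command per candidate
# tasks-per-node value. Options given on the sbatch command line take
# precedence over the #SBATCH directives inside the script.
print_scan_commands() {
    local script=$1
    shift
    local n
    for n in "$@"; do
        echo "sbatch --ntasks-per-node=${n} --job-name=scan-${n} ${script}"
    done
}

total_ranks 2 104                            # CPU sample: 2 nodes, 104 tasks each
ranks_per_gpu 8 2                            # GPU sample: 8 tasks, 2 GPUs
print_scan_commands cpu-test.sh 104 96 64    # candidate values are illustrative
```

Because the sample scripts write to fixed file names (`std.out`, `std.err`, and `$name.log`), each test run of a scan would need its own working directory or distinct output names to avoid overwriting the others.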