Merge pull request #564 from aturner-epcc/aturner-epcc/darshan-docs

Adds Darshan documentation
ARCHER2-HPC · Dec 11, 2023 · 499000a
2 parents e170705 + 2294afb
commit 499000a
Showing 4 changed files with 111 additions and 1 deletion.
diff --git a/docs/data-tools/darshan.md b/docs/data-tools/darshan.md
@@ -0,0 +1,98 @@
+# Darshan
+
+Darshan is a scalable HPC I/O characterization tool. Darshan is designed to capture
+an accurate picture of application I/O behavior, including properties such as patterns
+of access within files, with minimum overhead.  The name is taken from a Sanskrit
+word for "sight" or "vision". 
+
+Darshan is developed at the [Argonne Leadership Computing Facility (ALCF)](https://www.alcf.anl.gov/).
+
+Useful links:
+
+- [Darshan home page](https://www.mcs.anl.gov/research/projects/darshan/)
+- [Darshan documentation](https://www.mcs.anl.gov/research/projects/darshan/documentation/)
+
+## Using Darshan on ARCHER2
+
+Using Darshan generally consists of two stages:
+
+1. Collect IO profile data using the Darshan runtime
+2. Analyse the Darshan log files using the Darshan utility software
+
+### Collecting IO profile data
+
+To collect IO profile data you add the command:
+
+
+```
+module load darshan
+```
+
+to your job submission script as the **last** `module` command before you run your program. As Darshan
+does not distinguish between different software run in your job submission script, we typically 
+recommend that you use a structure like:
+
+```
+module load darshan
+srun ...usual software launch options...
+module remove darshan
+```
+
+This will avoid Darshan profiling IO for operations that are not part of your main parallel program.
+
+!!! important
+    The `darshan` module is dependent on the compiler environment you are using and you should ensure
+    that you load the `darshan` module that matches the compiler environment you used to compile the
+    program you are analysing. For example, if your software was compiled using `PrgEnv-gnu`, then you
+    would need to activate the GCC compiler environment before loading the `darshan` module to ensure you
+    get the GCC version of Darshan. This means loading the correct `PrgEnv-` module before you load the
+    `darshan` module:
+
+    ```
+    module load PrgEnv-gnu
+    module load darshan
+    srun ...usual software launch options...
+    module remove darshan
+    ```
+
+### Location of Darshan profile logs
+
+Darshan writes all profile logs to a shared location on the ARCHER2 NVMe Lustre file system. You can
+find your profile logs at:
+
+```
+/mnt/lustre/a2fs-nvme/system/darshan/YYYY/MM/DD
+```
+
+where `YYYY/MM/DD` corresponds to the date on which your job ran.
+
+### Analysing Darshan profile logs
+
+The simplest way to analyse the profile log files is to use the `darshan-parser` utility on the 
+ARCHER2 login nodes. You make the Darshan analysis utilities available with the command:
+
+```
+module load darshan-util
+```
+
+Once this is loaded, you can produce an IO performance summary from a profile log file with:
+
+```
+darshan-parser --perf /path/to/darshan/log/file.darshan
+```
+
+You can get a dump of all data in the Darshan profile log by omitting the `--perf` option, e.g.:
+
+```
+darshan-parser /path/to/darshan/log/file.darshan
+```
+
+!!! tip
+    The `darshan-job-summary.pl` and `darshan-summary-per-file.sh` utilities do not work on ARCHER2
+    as the required graphical packages are not currently available.
+
+Documentation on the Darshan analysis utilities is available at:
+
+- [darshan-util documentation](https://www.mcs.anl.gov/research/projects/darshan/docs/darshan-util.html)
+- [PyDarshan module - Python interface to analyse Darshan profile logs](https://www.mcs.anl.gov/research/projects/darshan/docs/pydarshan/index.html)
+
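Note on the analysis steps in `docs/data-tools/darshan.md` (a sketch, not part of the committed file): finding your own logs in the shared directory and summarising each one could be wrapped in two small Bash helpers. The function names are hypothetical. The file-name pattern `<username>_*.darshan` is an assumption that the page does not state, so list a real date directory first and adjust the pattern if it differs. Pass the date path exactly as it appears on the file system.

```shell
#!/bin/bash
# List Darshan logs belonging to one user under a single date directory.
# ASSUMPTION: log file names start with "<username>_" and end in ".darshan".
list_darshan_logs() {
    local log_root="$1"       # e.g. /mnt/lustre/a2fs-nvme/system/darshan
    local log_date="$2"       # date path as it appears on disk, e.g. 2023/12/11
    local user="${3:-$USER}"  # defaults to the current user
    find "${log_root}/${log_date}" -maxdepth 1 -type f \
        -name "${user}_*.darshan" | sort
}

# Print an IO performance summary for each matching log.
# Requires darshan-parser on PATH (module load darshan-util).
summarise_darshan_logs() {
    local log
    list_darshan_logs "$@" | while IFS= read -r log; do
        echo "== ${log}"
        darshan-parser --perf "${log}"
    done
}
```

On a login node, after `module load darshan-util`, this might be called as `summarise_darshan_logs /mnt/lustre/a2fs-nvme/system/darshan 2023/12/11`. The log root is a parameter rather than a hard-coded path so the helpers can be tried against a scratch directory first.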
diff --git a/docs/data-tools/index.md b/docs/data-tools/index.md
@@ -9,6 +9,7 @@ by third-parties rather than the ARCHER2 service are marked with *):
 - [AMD &mu;Prof](amd-uprof.md): Profiling tools provided by AMD
 - [Arm Forge](arm-forge.md): Provides debugging and profiling tools for MPI parallel applications, and
 OpenMP or pthreads multi-threaded applications (and also hybrid MPI/OpenMP)
+- [Darshan](darshan.md): Lightweight IO characterisation and profiling tool
 - [Energy Counters](pm-mpi-lib.md): MPI-based library for reading energy counters
 - [Julia(*)](julia.md): The julia language
 - [ParaView](paraview.md): A data visualisation and analysis package 

diff --git a/docs/user-guide/profile.md b/docs/user-guide/profile.md
@@ -6,7 +6,10 @@ CrayPat-lite and CrayPat. We also show how to get usage data
 on currently running jobs from Slurm batch system.
 
 You can also use [the Arm Forge tool](../data-tools/arm-forge.md)
-to profile applications on ARCHER2
+to profile applications on ARCHER2.
+
+If you are specifically interested in profiling IO, then you
+may want to look at the [Darshan IO profiling tool](../data-tools/darshan.md).
 
 ## CrayPat-lite
 
@@ -571,3 +574,10 @@ The AMD &mu;Prof tool provides capabilities for low-level profiling on AMD proce
 The Arm Forge tool also provides profiling capabilities. See:
 
 - [ARCHER2 Arm Forge documentation](../data-tools/arm-forge.md)
+
+## Darshan IO profiling
+
+The Darshan lightweight IO profiling tool provides a quick way to profile the IO
+part of your software:
+
+- [Using Darshan on ARCHER2](../data-tools/darshan.md)
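Note on the collection step (a sketch, not part of the committed files): the load, run, unload structure recommended in `docs/data-tools/darshan.md` could be wrapped in a Bash function so that Darshan is always unloaded again, even when the profiled command fails. The name `run_with_darshan` is hypothetical, and the sketch assumes the `module` command is available in the job script's shell and that the matching `PrgEnv-` module has already been loaded.

```shell
#!/bin/bash
# Run one command with the darshan module loaded, then unload it so that
# later commands in the same job script are not profiled.
# ASSUMPTION: `module` is defined in this shell (as in an ARCHER2 job script).
run_with_darshan() {
    local rc=0
    module load darshan
    "$@" || rc=$?          # run the wrapped command and keep its exit status
    module remove darshan
    return "$rc"
}
```

In a job submission script this might be used as `run_with_darshan srun ...usual software launch options...`, with any pre- and post-processing commands left outside the wrapper.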
diff --git a/mkdocs.yml b/mkdocs.yml
@@ -119,6 +119,7 @@ nav:
     - "Overview": data-tools/index.md
     - "AMD uProf": data-tools/amd-uprof.md
     - "Arm Forge": data-tools/arm-forge.md
+    - "Darshan": data-tools/darshan.md
     - "Energy Counters": data-tools/pm-mpi-lib.md
     - "Julia": data-tools/julia.md
     - "ParaView": data-tools/paraview.md