Merge pull request #564 from aturner-epcc/aturner-epcc/darshan-docs
Adds Darshan documentation
juanfrh authored Dec 11, 2023
2 parents e170705 + 2294afb commit 499000a
Showing 4 changed files with 111 additions and 1 deletion.
98 changes: 98 additions & 0 deletions docs/data-tools/darshan.md
@@ -0,0 +1,98 @@
# Darshan

Darshan is a scalable HPC I/O characterization tool. Darshan is designed to capture
an accurate picture of application I/O behavior, including properties such as patterns
of access within files, with minimum overhead. The name is taken from a Sanskrit
word for "sight" or "vision".

Darshan is developed at the [Argonne Leadership Computing Facility (ALCF)](https://www.alcf.anl.gov/).

Useful links:

- [Darshan home page](https://www.mcs.anl.gov/research/projects/darshan/)
- [Darshan documentation](https://www.mcs.anl.gov/research/projects/darshan/documentation/)

## Using Darshan on ARCHER2

Using Darshan generally consists of two stages:

1. Collect IO profile data using the Darshan runtime
2. Analyse the Darshan log files using the Darshan utility software

### Collecting IO profile data

To collect IO profile data you add the command:


```
module load darshan
```

to your job submission script as the **last** `module` command before you run your program. As Darshan
does not distinguish between the different programs run in your job submission script, we typically
recommend that you use a structure like:

```
module load darshan
srun ...usual software launch options...
module remove darshan
```

This will avoid Darshan profiling IO for operations that are not part of your main parallel program.

!!! important
    The `darshan` module is dependent on the compiler environment you are using and you should ensure
    that you load the `darshan` module that matches the compiler environment you used to compile the
    program you are analysing. For example, if your software was compiled using `PrgEnv-gnu`, then you
    would need to activate the GCC compiler environment before loading the `darshan` module to ensure you
    get the GCC version of Darshan. This means loading the correct `PrgEnv-` module before you load the
    `darshan` module:

```
module load PrgEnv-gnu
module load darshan
srun ...usual software launch options...
module remove darshan
```
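
The load, run, unload pattern shown above can be wrapped in a small shell function so that Darshan is always unloaded again and the exit status of the launch command is not lost. This is only a minimal sketch: `run_with_darshan` is a hypothetical helper name, not something provided by ARCHER2 or Darshan, and it assumes the matching `PrgEnv-` module has already been loaded.

```shell
# Hypothetical helper: run one command with the darshan module loaded,
# then unload the module and report the command's own exit status.
run_with_darshan() {
    module load darshan || return 1

    # Remember the exit status of the profiled command.
    status=0
    "$@" || status=$?

    # Unload Darshan so later commands in the script are not profiled.
    module remove darshan
    return "$status"
}
```

In a job submission script this could be called as `run_with_darshan srun ...usual software launch options...`.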

### Location of Darshan profile logs

Darshan writes all profile logs to a shared location on the ARCHER2 NVMe Lustre file system. You can
find your profile logs at:

```
/mnt/lustre/a2fs-nvme/system/darshan/YYYY/MM/DD
```

where `YYYY/MM/DD` corresponds to the date on which your job ran.
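
As an illustration, the log directory for a given date can be built with a short shell function. This is a sketch under stated assumptions: `darshan_log_dir` and the `DARSHAN_LOG_ROOT` variable are hypothetical names introduced here (Darshan itself does not read that variable). It also assumes that some Darshan installations may name the month and day directories without leading zeros, so it checks for that form as well. Please verify the actual layout on the system.

```shell
# Root of the shared Darshan log area (path taken from the text above).
# DARSHAN_LOG_ROOT is a hypothetical override used only by this sketch.
: "${DARSHAN_LOG_ROOT:=/mnt/lustre/a2fs-nvme/system/darshan}"

# Print the log directory for a date given as YYYY-MM-DD.
# The body runs in a subshell so its variables do not leak into the caller.
darshan_log_dir() (
    year=${1%%-*}
    rest=${1#*-}
    month=${rest%%-*}
    day=${rest#*-}

    padded="$DARSHAN_LOG_ROOT/$year/$month/$day"
    # Assumption: the directories may be named without leading zeros.
    unpadded="$DARSHAN_LOG_ROOT/$year/${month#0}/${day#0}"

    # Prefer the documented YYYY/MM/DD form unless only the other form exists.
    if [ -d "$unpadded" ] && [ ! -d "$padded" ]; then
        echo "$unpadded"
    else
        echo "$padded"
    fi
)
```

For example, `ls "$(darshan_log_dir 2023-12-11)"` would list the profile logs written on 11 December 2023.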

### Analysing Darshan profile logs

The simplest way to analyse the profile log files is to use the `darshan-parser` utility on the
ARCHER2 login nodes. You make the Darshan analysis utilities available with the command:

```
module load darshan-util
```

Once this is loaded, you can produce an IO performance summary from a profile log file with:

```
darshan-parser --perf /path/to/darshan/log/file.darshan
```
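
If a job produced several profile logs, the same `--perf` summary can be generated for each of them in a loop. The following is a minimal sketch rather than an official utility: `summarise_darshan_logs` is a hypothetical function name and the `.perf.txt` suffix is an arbitrary choice. It assumes `darshan-util` is loaded so that `darshan-parser` is on your `PATH`.

```shell
# Hypothetical helper: write a `darshan-parser --perf` summary for every
# .darshan file in LOG_DIR into OUT_DIR.
# Usage: summarise_darshan_logs LOG_DIR OUT_DIR
summarise_darshan_logs() {
    log_dir=$1
    out_dir=$2
    mkdir -p "$out_dir" || return 1

    for log in "$log_dir"/*.darshan; do
        # Skip the unexpanded pattern when the directory holds no logs.
        [ -e "$log" ] || continue
        name=$(basename "$log" .darshan)
        darshan-parser --perf "$log" > "$out_dir/$name.perf.txt"
    done
}
```

Each summary is written to its own text file, so results from different runs can be compared with standard tools such as `diff` or `grep`.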

You can get a dump of all data in the Darshan profile log by omitting the `--perf` option, e.g.:

```
darshan-parser /path/to/darshan/log/file.darshan
```
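
The full dump is plain text, so it can be post-processed with standard tools. As a hedged example, the function below totals one counter across all records. It assumes the column layout used by recent Darshan releases (module, rank, record id, counter name, value, then file details), which you should check against the header comments in your own output. `sum_posix_bytes_written` is a hypothetical name.

```shell
# Hypothetical helper: read darshan-parser output on stdin and print the
# sum of all POSIX_BYTES_WRITTEN values.
# Assumed columns: 1 = module, 4 = counter name, 5 = counter value.
sum_posix_bytes_written() {
    awk '$1 == "POSIX" && $4 == "POSIX_BYTES_WRITTEN" { sum += $5 }
         END { printf "%.0f\n", sum }'
}
```

It could be used as `darshan-parser /path/to/darshan/log/file.darshan | sum_posix_bytes_written`.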

!!! tip
    The `darshan-job-summary.pl` and `darshan-summary-per-file.sh` utilities do not work on ARCHER2
    as the required graphical packages are not currently available.

Documentation on the Darshan analysis utilities is available at:

- [darshan-util documentation](https://www.mcs.anl.gov/research/projects/darshan/docs/darshan-util.html)
- [PyDarshan module - Python interface to analyse Darshan profile logs](https://www.mcs.anl.gov/research/projects/darshan/docs/pydarshan/index.html)

1 change: 1 addition & 0 deletions docs/data-tools/index.md
@@ -9,6 +9,7 @@ by third-parties rather than the ARCHER2 service are marked with *):
- [AMD μProf](amd-uprof.md): Profiling tools provided by AMD
- [Arm Forge](arm-forge.md): Provides debugging and profiling tools for MPI parallel applications, and
  OpenMP or pthreads multi-threaded applications (and also hybrid MPI/OpenMP)
- [Darshan](darshan.md): Lightweight IO characterisation and profiling tool
- [Energy Counters](pm-mpi-lib.md): MPI-based library for reading energy counters
- [Julia(*)](julia.md): The julia language
- [ParaView](paraview.md): A data visualisation and analysis package
12 changes: 11 additions & 1 deletion docs/user-guide/profile.md
@@ -6,7 +6,10 @@ CrayPat-lite and CrayPat. We also show how to get usage data
on currently running jobs from Slurm batch system.

You can also use [the Arm Forge tool](../data-tools/arm-forge.md)
to profile applications on ARCHER2.

If you are specifically interested in profiling IO, then you
may want to look at the [Darshan IO profiling tool](../data-tools/darshan.md).

## CrayPat-lite

@@ -571,3 +574,10 @@ The AMD μProf tool provides capabilities for low-level profiling on AMD proce
The Arm Forge tool also provides profiling capabilities. See:

- [ARCHER2 Arm Forge documentation](../data-tools/arm-forge.md)

## Darshan IO profiling

The Darshan lightweight IO profiling tool provides a quick way to profile the IO
part of your software:

- [Using Darshan on ARCHER2](../data-tools/darshan.md)
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -119,6 +119,7 @@ nav:
- "Overview": data-tools/index.md
- "AMD uProf": data-tools/amd-uprof.md
- "Arm Forge": data-tools/arm-forge.md
- "Darshan": data-tools/darshan.md
- "Energy Counters": data-tools/pm-mpi-lib.md
- "Julia": data-tools/julia.md
- "ParaView": data-tools/paraview.md