Commit

Merge branch 'docs' of https://github.com/EPCCed/SiMLInt into docs
aroubickova committed Nov 1, 2023
2 parents 0bf0d7a + 8ae998c commit e7d4e1e
Showing 4 changed files with 90 additions and 27 deletions.
2 changes: 1 addition & 1 deletion docs/ML_training.md
@@ -5,4 +5,4 @@ Both the data generation as well as the model construction and training is very

*Work in progress.*

[back](./)
[< Back](./)
29 changes: 14 additions & 15 deletions docs/README.md
@@ -1,50 +1,49 @@
SiMLInt is an [ExCALIBUR](https://excalibur.ac.uk/) project demonstrating how to integrate Machine Learning (ML) to physics simulations. It combines commonly used, open-source tools and few in-house Python scripts to execute ML-aided computational fluid dynamics simulations. This page explains how to set-up the workflow to apply the same techniques to other simulations.
**SiMLInt is an [ExCALIBUR](https://excalibur.ac.uk/) project demonstrating how to integrate Machine Learning (ML) into physics simulations. It combines commonly used, open-source tools with in-house Python scripts to execute ML-aided computational fluid dynamics simulations.**


## Codes and Dependencies
The workflow consists of a simulation code implemented in a suitable domain solver, a trained ML model that supplements or adjusts part of the computation, and a layer that orchestrates the former two tools.

Our example workflow uses the following tools:
* [BOUT++](https://boutproject.github.io), written in C++ and Python, as the fluid dynamics simulation code
* [TensorFlow](https://www.tensorflow.org/) (through [Keras](https://keras.io)) to develop and train the ML model, as well as for ML inference
* [SmartSim](https://github.com/CrayLabs/SmartSim), using SmartRedis in-memory database, handles the communication between the simulation code and the ML model
* [SmartSim](https://github.com/CrayLabs/SmartSim), using SmartRedis in-memory database, to handle the coordination and communication between the simulation code and the ML model

In order to set up the workflow, you first need to install these tools in the [versions suitable for SmartSim](https://www.craylabs.org/docs/installation_instructions/basic.html#supported-versions).
To reproduce our work, these tools need to be installed on the system in [versions suitable for SmartSim](https://www.craylabs.org/docs/installation_instructions/basic.html#supported-versions).
For this step, it is best to follow each tool's installation instructions; however, we provide a step-by-step example, with the expected outcome at each stage, for installing them on [Cirrus](https://www.cirrus.ac.uk).

[Example installation on Cirrus](./example-installation.md)
[> Example installation on Cirrus](./example-installation.md)

## Workflow

SiMLInt workflow is currently based on [Learned Correction](https://www.pnas.org/doi/full/10.1073/pnas.2101784118) (LC).
SiMLInt workflow is currently based on [Learned Correction](https://www.pnas.org/doi/full/10.1073/pnas.2101784118) (LC):

The numerical solver is used to simulate a system, with adapted parameters so that the system is under-resolved due to the domain being decomposed to a coarser level than would be optimal. Beyond this, the execution of the simulation remains unchanged and can be parallelised as usual.

The coarse granularity of the domain decomposition means the simulation would diverge from the real evolution of the system. To prevent this, the workflow uses a pretrained ML model to adjust the grid at every step of the simulation, keeping the system on the right track.

The ML model is often based on a convolutional neural network (CNN), and is trained to predict the difference between the coarse simulation step and a fully-resolved state, coarsened to match the grid dimensions. Notably, the workflow can be used in a parallelised scenario, where the correction inference is performed in each parallel process separately, using the partial domain as the ML model's input, and using the model's prediction to correct only that slice.
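
The loop described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the SiMLInt implementation: `coarse_step` is a hypothetical stand-in for the BOUT++ solver step, `toy_model` is a hypothetical stand-in for the trained TensorFlow model, and the solver step is applied to the whole grid for brevity, whereas a real parallel run would also distribute the solver.

```python
import numpy as np


def coarse_step(grid, dt=0.1):
    """Hypothetical stand-in for one under-resolved solver step (periodic diffusion)."""
    laplacian = (
        np.roll(grid, 1, axis=0) + np.roll(grid, -1, axis=0)
        + np.roll(grid, 1, axis=1) + np.roll(grid, -1, axis=1)
        - 4.0 * grid
    )
    return grid + dt * laplacian


def toy_model(partial_domain):
    """Hypothetical stand-in for the trained model: returns a correction grid."""
    return -0.01 * partial_domain


def corrected_step(grid, model, n_slices):
    """Advance the coarse solver one step, then correct each slice independently."""
    stepped = coarse_step(grid)
    # Each slice plays the role of the partial domain owned by one parallel process.
    slices = np.array_split(stepped, n_slices, axis=0)
    # The model sees only its own slice and corrects only that slice.
    corrected = [part + model(part) for part in slices]
    return np.concatenate(corrected, axis=0)


rng = np.random.default_rng(seed=0)
state = rng.standard_normal((8, 8))
for _ in range(5):
    state = corrected_step(state, toy_model, n_slices=4)
print(state.shape)  # (8, 8)
```

Because `toy_model` acts point-wise, the result does not depend on `n_slices`. A CNN with a wider receptive field would see the slice boundaries, which this sketch does not address.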


The diagram below visualises the workflow. The numerical simulation, run in BOUT++, is represented by the black squares and grids, while the Learned Correction loop is realised in SmartSim by calling a TensorFlow model, which returns the correction (orange grid).

![SiMLInt workflow](./assets/SiMLInt_workflow.pdf)


We demonstrate the workflow on the Hasegawa-Wakatani set of equations using a dummy ML-model which does not affect the simulation. This allows the users to test that the set-up works and returns the expected results.

[Detailed instructions](./workflow.md)
[> Detailed instructions](./workflow.md)

## Model training

The example workflow uses a model that returns always 0s for the correction, maintaining the simulation on the same trajectory it would follow without any ML adjustments. The idea of Learned Correction however requires a model that is trained to predict the difference between the fully resolved trajectory that runs over a sufficiently fine resolution of the domain and a trajectory that uses coarser domain decomposition (and coarser time steps). To obtain a suitable ML model, we need to generate training data and use it to train the model.
The example workflow uses a model that always returns 0s for the correction, keeping the simulation on the same trajectory it would follow without any ML adjustments. LC, however, requires a model that is trained to predict the difference between the fully resolved trajectory, which runs over a sufficiently fine resolution of the domain, and a trajectory that uses a coarser domain decomposition (and coarser time steps). To train such an ML model, we need to generate data matching this scenario, as detailed below.

The data generation schema below outlines the kind of data we need to collect for the model training --- we need to:
1. run a fully resolved simulation (denoted F)
2. coarsen some points on the fine trajectory (denoted C) -- these are *inputs* for the training process
3. make a coarse simulation step from C (denoted by the arrow labelled ∆t_c)
4. calculate the difference between the fully resolved, coarsened grid and the coarse grid (denoted Ĉ at the equivalent simulation step -- this is the *target* to train the model for
The training dataset can be generated as follows:
1. Run a fully resolved simulation (denoted F)
2. Coarsen some timepoints on this fine trajectory (denoted C) -- these are *inputs* for the ML training
3. Make a coarse simulation step from C (denoted by the arrow labelled ∆t_c)
4. Calculate the difference between the fully resolved, coarsened grid and the coarse grid (denoted Ĉ) at the equivalent simulation step -- this is the *target* in the ML training
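
The four steps above can be sketched as follows. This is a minimal illustration, not the project's data-generation code: `solver_step` is a hypothetical stand-in for the solver (explicit diffusion on a periodic grid), `coarsen` uses block averaging, and the sign convention for the target (coarsened fine state minus coarse state) is an assumption chosen so that adding the target to the coarse state recovers the coarsened fine state.

```python
import numpy as np


def solver_step(grid, dt, dx):
    """Hypothetical stand-in solver: one explicit diffusion step on a periodic grid."""
    laplacian = (
        np.roll(grid, 1, axis=0) + np.roll(grid, -1, axis=0)
        + np.roll(grid, 1, axis=1) + np.roll(grid, -1, axis=1)
        - 4.0 * grid
    )
    return grid + dt * laplacian / dx**2


def coarsen(grid, factor):
    """Block-average a 2D grid by an integer factor along each axis."""
    nx, ny = grid.shape
    blocks = grid.reshape(nx // factor, factor, ny // factor, factor)
    return blocks.mean(axis=(1, 3))


def generate_pairs(fine0, n_samples, factor=2, substeps=4, dt_fine=0.1):
    """Return (inputs, targets) arrays for training a correction model."""
    inputs, targets = [], []
    fine = fine0
    for _sample in range(n_samples):
        # Step 2: coarsen a point on the fine trajectory (the training input).
        coarse_in = coarsen(fine, factor)
        # Step 3: one coarse step, covering the same time as `substeps` fine steps.
        coarse_out = solver_step(coarse_in, substeps * dt_fine, dx=float(factor))
        # Step 1: advance the fully resolved simulation to the equivalent time.
        for _sub in range(substeps):
            fine = solver_step(fine, dt_fine, dx=1.0)
        # Step 4: the target is the coarsened fine state minus the coarse state.
        targets.append(coarsen(fine, factor) - coarse_out)
        inputs.append(coarse_in)
    return np.stack(inputs), np.stack(targets)


rng = np.random.default_rng(seed=0)
inputs, targets = generate_pairs(rng.standard_normal((16, 16)), n_samples=3)
print(inputs.shape, targets.shape)  # (3, 8, 8) (3, 8, 8)
```

The coarsening factor, the number of fine steps per coarse step, and the time step sizes are illustrative values only.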

![Data Generation](./assets/data_generation_schema.pdf)

The dataset we have created for the Hasegawa-Wakatani example, based on 32,000 fully resolved points and, in the coarsened state prepared for the ML training, taking XXX GB, is available on request.

[Implementation details](./ML_training.md)
[> Implementation details](./ML_training.md)
78 changes: 68 additions & 10 deletions docs/example-installation.md
@@ -1,22 +1,81 @@
This page shows 1. how to install BOUT++ on Cirrus, and 2. how to install and set-up SmartSim so that in can communicate with BOUT++.
This page shows
1. how to [install BOUT++](./example-installation.md#1-bout) on Cirrus, and
2. how to [install and set-up SmartSim](./example-installation.md#2-smartsim-with-bout) so that it can communicate with BOUT++.

[back](./)
[< Back](./)

# 1. BOUT++

(copy from gitlab)
Go to the `/work` filesystem:
```
export WORK=/work${HOME#/home}
cd $WORK
```
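
The expansion `${HOME#/home}` strips a leading `/home` from `$HOME`, so `$WORK` mirrors the home path under `/work`. A small Python sketch of the same string manipulation may make this easier to see; the function name `work_dir_for` is hypothetical, and the example path reuses the `/work/x01/x01/auser` layout that appears later on this page.

```python
def work_dir_for(home):
    """Mimic the shell expansion /work${HOME#/home}: drop a leading '/home', prepend '/work'."""
    prefix = "/home"
    remainder = home[len(prefix):] if home.startswith(prefix) else home
    return "/work" + remainder


print(work_dir_for("/home/x01/x01/auser"))  # /work/x01/x01/auser
```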

[back](./)
Download BOUT++ source code:
```
git clone https://github.com/boutproject/BOUT-dev.git
```

# 2. SmartSim with BOUT++
> We are using v5.0.0; you can download this specific version like this:
>
> ```wget https://github.com/boutproject/BOUT-dev/archive/refs/tags/v5.0.0.tar.gz```

Load required modules:
```
module load mpt
module load intel-compilers-19
module load fftw/3.3.10-intel19-mpt225
module load netcdf-parallel/4.6.2-intel19-mpt225
module load cmake
```

Create a Python `venv` extending the central `python/3.9.13` module, since the BOUT++ build requires Python with additional packages (Cython, zoidberg, boututils and others) which are not provided by `python/3.9.13`:
```
export HOME=$WORK # optional, see the note below
module load python/3.9.13
python -m venv --system-site-packages bout
extend-venv-activate bout
source bout/bin/activate
python -m pip install cython
```

After the first run of the above, `source bout/bin/activate` alone is enough to re-enter the environment.

## Python/conda stuff
Follow [Cirrus docs](https://docs.cirrus.ac.uk/user-guide/python/#installing-your-own-python-packages-with-conda) to set up a python environment to which further packages can be added.
> **Note:** The Python `venv` module expects the venv parent directory to be `$HOME`, i.e. venv folders are in `$HOME/<venv name>`.
> If `export HOME=$WORK` is not used, full paths must be given to `venv`, for example, `python -m venv --system-site-packages $WORK/bout`.
> This isn't a big deal at this stage, but is more important when running SiMLInt Jupyter Notebooks.

Build:
```
cd $WORK/BOUT-dev
MPICXX_CXX=icpc MPICXX=mpicxx cmake . -B build -DBOUT_DOWNLOAD_NETCDF_CXX4=ON -DBOUT_USE_LAPACK=off -DCMAKE_CXX_FLAGS=-std=c++17 -DCMAKE_BUILD_TYPE=Release
export PYTHONPATH=$WORK/BOUT-dev/build/tools/pylib:$WORK/BOUT-dev/tools/pylib:$PYTHONPATH
# This may not be required
cmake --build build -j 6
```

## BOUT++ Hasegawa-Wakatani example
This will build a *pure* BOUT++ version of the Hasegawa-Wakatani example. A build with SmartSim connection capability is described on the [workflow page](./workflow.md#compile-hasegawa-wakatani-with-smartredis).

Still in `$WORK/BOUT-dev`:
```
MPICXX_CXX=icpc MPICXX=mpicxx cmake . -B build -DBOUT_BUILD_EXAMPLES=on
cmake --build build --target hasegawa-wakatani
```

## Add packages
[< Back](./)


# 2. SmartSim with BOUT++

## Python/conda environment
Follow [Cirrus docs](https://docs.cirrus.ac.uk/user-guide/python/#installing-your-own-python-packages-with-conda) to set up a Python environment to which further packages can be added. We refer to this environment as `myvenv`.

Add the following packages to install the SmartSim ML wrapper:
```
conda activate myvenv
conda install git-lfs
@@ -26,7 +85,6 @@ python -m pip install smartsim[ml]
```

Build:

```
module load mpt
module load intel-compilers-19
@@ -48,4 +106,4 @@ make lib
The install path is then available in `smartredis/install`. Modify the `CMakeLists.txt` file to point to this path on your system in place of `/work/x01/x01/auser/smartsim/smartredis/install/include` on line 12.


[back](./)
[< Back](./)
8 changes: 7 additions & 1 deletion docs/workflow.md
@@ -1,5 +1,11 @@
# SiMLInt Workflow

The system needs to have all the tools and packages (in suitable versions) installed. See the main page and the example of installation for help.

The example workflow described here does not require a pre-trained ML model: to showcase the framework, we use a placeholder model that always returns 0s, and the script for it is provided here. Any other model can be exported in the desired format and used in the workflow.
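
One way to see why a zero-returning placeholder is a useful set-up test: a run with the placeholder must reproduce the uncorrected run exactly. The sketch below illustrates this check with NumPy only; `toy_step`, `placeholder_model`, and `run` are hypothetical names for illustration and are not the script referred to above.

```python
import numpy as np


def toy_step(grid):
    """Hypothetical stand-in for one simulation step."""
    return 0.5 * (grid + np.roll(grid, 1, axis=0))


def placeholder_model(grid):
    """Placeholder correction: always returns 0s with the same shape as the input."""
    return np.zeros_like(grid)


def run(initial, n_steps, step, model=None):
    """Run n_steps of the simulation, adding the model's correction after each step."""
    state = initial.copy()
    for _ in range(n_steps):
        state = step(state)
        if model is not None:
            state = state + model(state)
    return state


initial = np.linspace(0.0, 1.0, 16).reshape(4, 4)
baseline = run(initial, 10, toy_step)
with_placeholder = run(initial, 10, toy_step, placeholder_model)
print(np.array_equal(baseline, with_placeholder))  # True
```

Any difference between the two runs would point to a problem in the coupling rather than in the model.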

[< Back](./)

## Export the ML model

Activate the conda environment with SmartSim (see Cirrus example to make sure it has all relevant packages)
@@ -73,4 +79,4 @@ cd my-bout-smartsim-hw
An example script that can be used on Cirrus can be found in [files/run_SmartSim/submit-hw.sh](https://github.com/EPCCed/SiMLInt/blob/docs/files/run_SmartSim/submit-hw.sh).
This Slurm job file starts the SmartSim orchestrator (in Python) with a Redis database and RedisAI communication layer. In this example, the Redis DB runs on the same node since the simulation only runs in one process.

[back](./)
[< Back](./)
