Tobias Krug, Tobias Klama, Till HĂĽlder
This project is part of the course High Performance Computing for Machine Intelligence. It is used to evaluate different Open MPI communication schemes. Each scheme implements a different way of obtaining an optimal solution to a space navigation problem with asynchronous value iteration when more than one processor is involved.
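Asynchronous value iteration repeatedly applies the Bellman backup to a rank's share of the states while reusing the most recently received, possibly stale, values of the remaining states. The sketch below illustrates one such sweep; the `Mdp` layout and all identifiers are assumptions for illustration, not the project's actual data structures.

```cpp
// Minimal sketch of one asynchronous value-iteration sweep over the states
// owned by a rank. The Mdp layout and all names are illustrative assumptions.
#include <algorithm>
#include <cmath>
#include <limits>
#include <vector>

struct Transition { int nextState; float probability; float reward; };

struct Mdp {
    int numActions = 0;
    // transitions[s][a] = possible outcomes of taking action a in state s
    std::vector<std::vector<std::vector<Transition>>> transitions;
};

// Updates values[firstOwned, lastOwned) in place; entries owned by other
// ranks may be stale, which is what makes the iteration asynchronous.
float asyncSweep(const Mdp& mdp, std::vector<float>& values,
                 int firstOwned, int lastOwned, float gamma) {
    float maxDiff = 0.0f;
    for (int s = firstOwned; s < lastOwned; ++s) {
        float best = -std::numeric_limits<float>::infinity();
        for (int a = 0; a < mdp.numActions; ++a) {
            float q = 0.0f;
            for (const Transition& t : mdp.transitions[s][a])
                q += t.probability * (t.reward + gamma * values[t.nextState]);
            best = std::max(best, q);
        }
        maxDiff = std::max(maxDiff, std::fabs(best - values[s]));
        values[s] = best;  // immediately visible to later updates in this sweep
    }
    return maxDiff;  // compared against a tolerance to detect convergence
}
```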
To ease the implementation burden for new schemes, a scheme base class is introduced. The actual scheme implementations inherit from it, as depicted in the following UML diagram.
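The UML diagram defines the actual interface; purely as an illustration of the idea (class and method names below are hypothetical), such a base class can fix a common driver loop and leave the communication step to the concrete schemes:

```cpp
// Rough illustration of a scheme base class; class and method names are
// hypothetical and do not mirror the project's actual interface.
class SchemeBase {
public:
    virtual ~SchemeBase() = default;

    // Common driver loop shared by all schemes.
    void run() {
        while (!hasConverged()) {
            iterate();      // local value-iteration cycles
            communicate();  // scheme-specific MPI exchange
        }
    }

protected:
    virtual void iterate() = 0;
    virtual void communicate() = 0;  // each scheme overrides the MPI layout
    virtual bool hasConverged() const = 0;
};

// A concrete scheme only has to supply the three steps above.
class ExampleScheme : public SchemeBase {
protected:
    void iterate() override { /* run local VI cycles */ }
    void communicate() override { /* e.g. exchange owned value ranges */ }
    bool hasConverged() const override { return true; /* placeholder */ }
};
```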
As of now, three schemes are implemented and can be selected via configuration. The following sections introduce the communication layout and mechanisms of each scheme. All schemes operate on a configuration specified as a .yaml file, whose path has to be passed as a command line parameter to the binary. The configuration file only needs to be available on the root node running the rank0 processor, which broadcasts the configuration to all other MPI nodes after successfully loading and parsing it.
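A minimal sketch of such a broadcast with `MPI_Bcast`, assuming rank0 serializes the file contents as a string before sending it; the helper names are illustrative, not the project's actual loading code:

```cpp
// Sketch of distributing the parsed configuration: rank 0 reads the .yaml
// file and broadcasts its contents; every rank then parses the received text.
// Only rank 0 needs the file on disk. Names are illustrative.
#include <mpi.h>
#include <algorithm>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

static std::string readFile(const std::string& path) {
    std::ifstream in(path);
    std::ostringstream out;
    out << in.rdbuf();
    return out.str();
}

std::string broadcastConfig(const std::string& path, int rank) {
    std::string text;
    if (rank == 0)
        text = readFile(path);  // only the root actually opens the file

    // Broadcast the length first, then the raw characters.
    int length = static_cast<int>(text.size());
    MPI_Bcast(&length, 1, MPI_INT, 0, MPI_COMM_WORLD);

    std::vector<char> buffer(static_cast<size_t>(length));
    if (rank == 0)
        std::copy(text.begin(), text.end(), buffer.begin());
    MPI_Bcast(buffer.data(), length, MPI_CHAR, 0, MPI_COMM_WORLD);

    return std::string(buffer.begin(), buffer.end());  // parse the YAML from this
}
```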
Some schemes rely on the local availability of the data sets; as referenced above, these schemes execute the following sub-scheme.
This project is implemented using a set of software tools, namely:
- CLion from JetBrains for C++/Python development and LaTeX documentation
- Sublime Merge from Sublime HQ for professional Git usage
Concerning infrastructure, the project depends on
- HiDrive from Strato for exchange of measurement files and easy distribution of data sets
and supports
- continuous integration, continuous testing and even continuous deployment via GitHub Actions.
Deployment is not activated by default, as it would require an SSH key for the TUM HPC cluster to be stored on GitHub. This is considered unsafe and is therefore not realised.
The team worked together in a Scrum-style fashion based on issues, with a branch and merge request per issue.
The project can be built and executed using the make targets listed below.
- all
  - Dummy target to prevent running make without a target
- setupToolchain
  - Set up the minimum target toolchain and install packages
- setupHostToolchain
  - Set up the complete host toolchain, install packages, retrieve the latest data set and prepare it for testing
- init
  - Initialize the data set on the host machine
- clean
  - Remove generated files, build output and related files
- rebuild
  - Run a clean build/rebuild of the project
- build
  - Run an incremental build of the project
- test
  - Execute a local test cycle with build and one iteration.
- testX
  - Execute a local test cycle with build and multiple iterations. Use as follows to run 5 cycles:
    make testX nruns=5
- generateDoxygen
  - Generate the Doxygen documentation for the project and the used libraries.
- documentation
  - Generate the PlantUML and measurement graphics used in this readme and the report. Generate the report. Stash all generated files.
- pack
  - Prepare a tarball for easy sharing of the project.
- unpack
  - Unpack a project tarball retrieved from somewhere else.
- runAllHpcTests
  - Execute all TUM HPC standard tests
- runHpcATests
  - Execute the TUM HPC Class A standard tests
- runHpcBTests
  - Execute the TUM HPC Class B standard tests
- runHpcMixedTests
  - Execute the TUM HPC Class Mixed standard tests
- runNucTests
  - Execute all NUC standard tests
- runRpiTests
  - Execute all Raspberry Pi standard tests
- runLocalTests
  - Execute all Local standard tests
- runCITests
  - Execute all CI standard tests
This project assumes certain infrastructure to be available on the targets used for testing, first and foremost make. To obtain a working installation of the project, you have to perform two steps:
- On your host machine:
  - make the complete project available
  - execute the following commands from the top-level directory of the project:
    sudo apt install make
    sudo make setupHostToolchain
- On all your target machines:
  - make the complete project available on the target
  - log in via SSH and execute the following commands from the top-level directory of the project:
    sudo apt install make
    sudo make setupToolchain
The following quantities and parameters are recorded for each measurement:
- execution time (total, VI)
- iterations until convergence
- memory usage (RAM) (max at rank0; sum, min, max over all nodes; see the sketch after this list)
- quality of the VI solution (max norm, l2 norm, MSE)
- data set
- MPI target (TUM HPC Class A, TUM HPC Class B, TUM HPC Class Mixed, NUC cluster, Raspberry Pi cluster)
- MPI scheme
- MPI parameters
  - MPI synchronization interval (cycles)
  - MPI processor count (world_size)
- VI parameters
  - asynchronous vs. synchronous VI with OpenMP
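A minimal sketch of how the per-rank memory figures can be gathered and aggregated at rank0 with `getrusage` and `MPI_Reduce`; whether the project measures memory exactly this way is an assumption, and the identifiers are illustrative.

```cpp
// Sketch: collect the maximum resident set size (max-RSS) of each rank and
// aggregate sum/min/max at rank 0. Whether the project measures memory
// exactly this way is an assumption.
#include <mpi.h>
#include <sys/resource.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    rusage usage{};
    getrusage(RUSAGE_SELF, &usage);
    long maxRssKiB = usage.ru_maxrss;  // on Linux: peak RSS of this rank in KiB

    long sum = 0, minRss = 0, maxRss = 0;
    MPI_Reduce(&maxRssKiB, &sum,    1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
    MPI_Reduce(&maxRssKiB, &minRss, 1, MPI_LONG, MPI_MIN, 0, MPI_COMM_WORLD);
    MPI_Reduce(&maxRssKiB, &maxRss, 1, MPI_LONG, MPI_MAX, 0, MPI_COMM_WORLD);

    if (rank == 0)
        std::printf("max-RSS [KiB]: rank0=%ld sum=%ld min=%ld max=%ld\n",
                    maxRssKiB, sum, minRss, maxRss);

    MPI_Finalize();
    return 0;
}
```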
The graphs below visualize the collected measurement files and their analysis per data set and target.
(Graph panels per data set and target: Runtime VI per com_interval | Steps per com_interval; Measurement count | Measurement duration; Max-RSS at rank0 per world_size | Sum of Max-RSS of all ranks per world_size.)