Interactive building for debugging #117

Merged: 23 commits merged on Nov 3, 2023
Changes from 12 commits

Commits (23):
- cca0183 Initial documentation on how to debug failed builds interactively (Oct 18, 2023)
- 91f5d62 Added build instructions after prep instructions. Added prerequisites… (Oct 19, 2023)
- 3dd417d Moved contributing software pages to their own top level header to ma… (Oct 20, 2023)
- f41362b Fixed typo (Oct 20, 2023)
- de32bbe Fix links (Oct 20, 2023)
- 2fbdfec We now get the EESSI version from the container environment, so this … (Oct 20, 2023)
- abe13f9 Swap order, as sourcing overwrites the EASYBUILD_INSTALLPATH otherwise (Oct 23, 2023)
- 59de2b8 Added instructions on using and flags for container in order to sav… (Oct 23, 2023)
- 187db42 Clarified description on why to use --save and --resume with eessi co… (Oct 23, 2023)
- 3b4f4ed Expand explaination on using --save, as it only saves if you actually… (Oct 23, 2023)
- c8c91f1 Fix installpath that should be displayed by --show-config to reflect … (Oct 23, 2023)
- e2a5cb7 Take out reference to my user login from the example (Oct 23, 2023)
- c487e66 Added instructions to add EASYBUILD_INSTALLPATH/modules/all to the MO… (Oct 23, 2023)
- 76d6ad7 Strip unnecessary quotes (Oct 23, 2023)
- 76d1056 Simplify instruction a bit by leveraging minimal_eessi_env to load se… (Oct 25, 2023)
- 1320c09 Forgot to remove one line in previous commit (Oct 25, 2023)
- 8d357f6 Processed most of Thomas' comments (Oct 25, 2023)
- 54034e1 Changed doc structure according to Thomas' comments (Oct 25, 2023)
- cf854e0 Fixed broken links due to moving stuff around (Oct 25, 2023)
- 4a80085 Indicate who these docs are for, just like the rest in this tree (Oct 26, 2023)
- 06ac2ea Took two more of Thomas' comments into account (Oct 30, 2023)
- 24fafa6 Make instructions more similar to what the bot does by installing in … (Oct 30, 2023)
- e497843 Changed based on Tim's review. Various typos. Moved the suggestion fo… (Nov 1, 2023)
4 changes: 2 additions & 2 deletions docs/bot.md
@@ -6,7 +6,7 @@ Building, testing, and deploying software is done by one or more *bot instances*
The EESSI build-test-deploy bot :robot: is implemented as a [GitHub App](https://docs.github.com/en/apps/overview)
in the [`eessi-bot-software-layer` repository](https://github.com/EESSI/eessi-bot-software-layer).

-It operates in the context of [pull requests](software_layer/adding_software.md#software_layer_pull_request) to
+It operates in the context of [pull requests](contributing_sw/adding_software.md#software_layer_pull_request) to
the [`compatibility-layer` repository](https://github.com/EESSI/compatibility-layer) or the
[`software-layer` repository](https://github.com/EESSI/software-layer),
and follows the instructions supplied by humans,
@@ -61,7 +61,7 @@ to trigger building of software, and to deploy software installations in to the
## Building { #building }

To instruct the bot :robot: to build software, one or more `build` instructions
-should be issued by posting a comment in the pull request (see also [here](software_layer/adding_software.md#bot_build)).
+should be issued by posting a comment in the pull request (see also [here](contributing_sw/adding_software.md#bot_build)).

The most basic build instruction that can be sent to the bot is:

@@ -5,7 +5,7 @@ To add software to EESSI, you should go through the semi-automatic software inst
* 1) Making a pull request to the [software-layer](https://github.com/EESSI/software-layer) repository
to (add or) update an [easystack file](https://docs.easybuild.io/easystack-files) :books: that is used by
[EasyBuild](https://docs.easybuild.io/) to install software;
-* 2) Instructing the [bot :robot:](../bot.md) to build the software on all [supported CPU microarchitectures](cpu_targets.md);
+* 2) Instructing the [bot :robot:](../bot.md) to build the software on all [supported CPU microarchitectures](../software_layer/cpu_targets.md);
* 3) Instructing the [bot :robot:](../bot.md) to deploy the built software for ingestion into the EESSI repository;
* 4) Merging the pull request once CI indicates that the software has been ingested. :white_check_mark:

@@ -108,7 +108,7 @@ For more information, see the [building section in the bot documentation](../bot

* If one of the builds failed, you can let the bot retry that specific build.

-* Make sure that the software has been built correctly for all [CPU targets](cpu_targets.md) before you deploy!
+* Make sure that the software has been built correctly for all [CPU targets](../software_layer/cpu_targets.md) before you deploy!

#### Checking the builds :mag:

180 changes: 180 additions & 0 deletions docs/contributing_sw/debugging_failed_builds.md
@@ -0,0 +1,180 @@
# Debugging failed builds

Unfortunately, software does not always build successfully. Since EESSI targets novel CPU architectures as well, build failures on such platforms are quite common, as the software and/or the software build systems have not always been adjusted to support these architectures yet.

In EESSI, builds are performed by a bot. This automation is great for builds that complete successfully, as it allows us to build a lot of software for a wide range of hardware. However, it also means that you, as a contributor, cannot easily access the build directory and build logs to figure out build issues.

This page describes how you can interactively reproduce failed builds, so that you can more easily debug the issue.

Throughout this page, we will use [this PR](https://github.com/EESSI/software-layer/pull/360) as an example. It builds LAMMPS and, among other things, failed on a [build issue for Plumed](https://github.com/EESSI/software-layer/pull/360#issuecomment-1765913105).

## Prerequisites
You will need to have:

- Access to a machine with the same hardware as the one on which the build you want to debug failed.
- A setup on that machine that meets the requirements for running the EESSI container, as described on [this page](../getting_access/eessi_container.md#prerequisites).

## Preparing the environment
A number of steps are needed to recreate the environment in which the bot builds software:

- Fetch the feature branch from which you want to replicate a build.
- Start a shell in the EESSI container.
- Start the Gentoo Prefix environment.
- Start the EESSI software environment.
- Configure EasyBuild.

### Fetching the feature branch
Looking at [the example PR](https://github.com/EESSI/software-layer/pull/360), we see the PR is created from [this fork](https://github.com/laraPPr/software-layer/). First, we clone the fork, then check out the feature branch (`LAMMPS_23Jun2022`):
```
git clone https://github.com/laraPPr/software-layer/
cd software-layer
git checkout LAMMPS_23Jun2022
```
Alternatively, if you already have a clone of the `software-layer` repository, you can add the fork as a new remote:
```
cd software-layer
git remote add laraPPr https://github.com/laraPPr/software-layer/
git fetch laraPPr
git checkout LAMMPS_23Jun2022
```
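
As an alternative that does not require knowing which fork a PR comes from, GitHub exposes the head of every pull request as the ref `pull/<number>/head` on the base repository. The minimal sketch below only builds the corresponding refspec; the function name `pr_refspec` and the local branch name `pr<number>` are hypothetical choices, not part of EESSI's tooling.

```shell
# Hypothetical helper: build a git refspec that fetches the head of a GitHub
# pull request into a local branch named pr<number>.
pr_refspec() {
    local pr_number="$1"
    # Only accept a positive integer as PR number
    if ! [[ "${pr_number}" =~ ^[0-9]+$ ]]; then
        echo "usage: pr_refspec <PR number>" >&2
        return 1
    fi
    echo "pull/${pr_number}/head:pr${pr_number}"
}
```

For the example PR, `git fetch https://github.com/EESSI/software-layer.git "$(pr_refspec 360)"` followed by `git checkout pr360` should give you the commits of the feature branch as they were at fetch time.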

### Starting a shell in the EESSI container
Run the EESSI container script (`eessi_container.sh`), which should be in the root of the `software-layer` repository:
```
./eessi_container.sh
```
!!! Note
    You may have to press Enter to see the prompt clearly, as some messages
    beginning with `CernVM-FS: ` may be printed after the first
    `Apptainer> ` prompt is shown.

If you want to debug an issue for which a lot of dependencies need to be built first, you may want to start the container with the `--save DIR/TGZ` flag (see `./eessi_container.sh --help`), so that you can resume the session later. For example:

```
./eessi_container.sh --save ${HOME}/pr370/
```
The tarball is saved when you exit the container. Note that the first `exit` command only takes you out of the Gentoo Prefix environment. The second one takes you out of the container and prints where the tarball is stored:
```
[EESSI pilot 2023.06] $ exit
logout
Leaving Gentoo Prefix with exit status 1
Apptainer> exit
exit
Saved contents of tmp directory '/tmp/eessi.VgLf1v9gf0' to tarball '${HOME}/pr370/EESSI-pilot-1698056784.tgz' (to resume session add '--resume ${HOME}/pr370//EESSI-pilot-1698056784.tgz')
```

The next time you want to continue investigating this issue, you can start the container with `--resume DIR/TGZ` and continue where you left off, with all dependencies already built and available:
```
./eessi_container.sh --resume ${HOME}/pr370//EESSI-pilot-1698056784.tgz
```
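
If you save sessions regularly, copying the tarball name by hand gets tedious. Below is a minimal sketch of a hypothetical helper, `latest_session_tarball`, that picks the most recently modified tarball in a `--save` directory. It assumes the tarballs follow the `EESSI-pilot-<timestamp>.tgz` naming shown in the example output.

```shell
# Hypothetical helper: print the newest session tarball in a --save directory.
# Assumes tarball names match EESSI-pilot-*.tgz, as in the example output.
latest_session_tarball() {
    local save_dir="$1"
    local latest
    # ls -t sorts by modification time, newest first; errors (no match) are hidden
    latest=$(ls -t "${save_dir}"/EESSI-pilot-*.tgz 2>/dev/null | head -n 1)
    if [ -z "${latest}" ]; then
        echo "no session tarball found in ${save_dir}" >&2
        return 1
    fi
    echo "${latest}"
}
```

You could then resume with `./eessi_container.sh --resume "$(latest_session_tarball ${HOME}/pr370)"`.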

For more info on using the EESSI container, see [here](../getting_access/eessi_container.md).

### Start the Gentoo Prefix environment
The next step is to start the Gentoo Prefix environment.

Before we start, note the current values of `${EESSI_CVMFS_REPO}` and `${EESSI_PILOT_VERSION}`, since you will need to reset them inside the prefix environment later:
```
echo ${EESSI_CVMFS_REPO}
echo ${EESSI_PILOT_VERSION}
```

To start the Gentoo Prefix environment, run the `startprefix` command. Since there are several compatibility layers, you need to run the one that matches the host machine. For example, on an aarch64 (ARM) Linux machine:
```
export EESSI_OS_TYPE=linux
export EESSI_CPU_FAMILY=aarch64
${EESSI_CVMFS_REPO}/versions/${EESSI_PILOT_VERSION}/compat/${EESSI_OS_TYPE}/${EESSI_CPU_FAMILY}/startprefix
```

If you are unsure which values to use, you can start the EESSI software environment (see next step) and check the values of `EESSI_OS_TYPE` and `EESSI_CPU_FAMILY` set by its initialization script. Note that you will then have to start over with a new shell (i.e. quit the container) and repeat the current step of starting the Gentoo Prefix environment, as the order of these two steps matters.
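
On x86-64 and 64-bit Arm Linux machines, `uname -m` prints `x86_64` and `aarch64` respectively, which match the `EESSI_CPU_FAMILY` values used on this page. Assuming that correspondence holds for your host (the EESSI initialization script remains the authoritative source), the hypothetical helper below sketches how to compose the `startprefix` path without hard-coding the architecture.

```shell
# Hypothetical helper: print the path to the compat layer's startprefix script.
# Assumption: EESSI_CPU_FAMILY equals `uname -m` (e.g. x86_64, aarch64).
startprefix_path() {
    local repo="$1" version="$2"
    local os_type="${3:-linux}"
    local cpu_family="${4:-$(uname -m)}"
    local path="${repo}/versions/${version}/compat/${os_type}/${cpu_family}/startprefix"
    # Warn, but still print the path, if the script is not available here
    [ -x "${path}" ] || echo "warning: ${path} not found or not executable" >&2
    echo "${path}"
}
```

Inside the container, `"$(startprefix_path "${EESSI_CVMFS_REPO}" "${EESSI_PILOT_VERSION}")"` would then take the place of the explicit path.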

Now, reset `${EESSI_CVMFS_REPO}` and `${EESSI_PILOT_VERSION}` in your prefix environment to the values you noted earlier:
```
export EESSI_CVMFS_REPO=...
export EESSI_PILOT_VERSION=...
```
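
Rather than copying the values by hand, you could write them to a file before running `startprefix` and source that file inside the prefix shell, since both shells run in the same container and see the same file system. The helper `save_eessi_vars` and the file location used below are hypothetical choices for this sketch.

```shell
# Hypothetical helper, run BEFORE startprefix: write the two variables as
# `export` statements to a file that can later be sourced in the prefix shell.
save_eessi_vars() {
    local file="$1"
    if [ -z "${EESSI_CVMFS_REPO:-}" ] || [ -z "${EESSI_PILOT_VERSION:-}" ]; then
        echo "EESSI_CVMFS_REPO and EESSI_PILOT_VERSION must be set first" >&2
        return 1
    fi
    # printf %q quotes the values so bash can read them back safely
    printf 'export EESSI_CVMFS_REPO=%q\n' "${EESSI_CVMFS_REPO}" > "${file}"
    printf 'export EESSI_PILOT_VERSION=%q\n' "${EESSI_PILOT_VERSION}" >> "${file}"
}
```

For example, call `save_eessi_vars /tmp/eessi_vars.sh` in the container shell, run `startprefix`, and then run `source /tmp/eessi_vars.sh` as the first command in the prefix environment.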

!!! Note
    By activating the Gentoo Prefix environment, the system tools (e.g. `ls`) you would normally use are now provided by Gentoo Prefix, instead of by the container OS. For example, on an x86_64 machine, running `which ls` after starting the prefix environment will return `/cvmfs/pilot.eessi-hpc.org/versions/2023.06/compat/linux/x86_64/bin/ls`. This makes the builds completely independent of the container OS.
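
A quick way to confirm that you are actually inside the prefix environment before building is to check where `ls` comes from. The hypothetical helper below is a sketch that looks for a `/compat/` component in that path, based on the example path in the note above. It uses the bash builtin `type -P` so that a shell alias for `ls` does not get in the way.

```shell
# Hypothetical helper: succeed if `ls` is provided by an EESSI compat layer,
# i.e. its path looks like .../compat/<os>/<cpu family>/bin/ls.
in_compat_layer() {
    local ls_path
    # type -P searches PATH for an executable, ignoring aliases and functions
    ls_path=$(type -P ls) || return 1
    case "${ls_path}" in
        */compat/*/bin/ls) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example: `in_compat_layer && echo "inside Gentoo Prefix" || echo "not inside Gentoo Prefix"`.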

### Starting the EESSI software environment
!!! Note
    If you want to replicate a build with `generic` optimization (i.e. in `$EESSI_CVMFS_REPO/versions/${EESSI_PILOT_VERSION}/software/${EESSI_OS_TYPE}/${EESSI_CPU_FAMILY}/generic`) you will need to set `export EESSI_SOFTWARE_SUBDIR_OVERRIDE=${EESSI_CPU_FAMILY}/generic` before starting the EESSI environment.

To activate the software environment, run:
```
source ${EESSI_CVMFS_REPO}/versions/${EESSI_PILOT_VERSION}/init/bash
```

!!! Note
    If you get the error `bash: /versions//init/bash: No such file or directory`, you forgot to reset the `${EESSI_CVMFS_REPO}` and `${EESSI_PILOT_VERSION}` environment variables at the end of the previous step.
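
To avoid this error altogether, you could guard the `source` command with a check that both variables are non-empty. The function name `check_eessi_vars` is hypothetical; this is only a sketch.

```shell
# Hypothetical helper: fail with a clear message if a required variable is empty,
# instead of letting bash try to source "/versions//init/bash".
check_eessi_vars() {
    local var missing=0
    for var in EESSI_CVMFS_REPO EESSI_PILOT_VERSION; do
        # ${!var} expands to the value of the variable whose name is stored in $var
        if [ -z "${!var:-}" ]; then
            echo "error: ${var} is empty; reset it as described in the previous step" >&2
            missing=1
        fi
    done
    return "${missing}"
}
```

Usage: `check_eessi_vars && source ${EESSI_CVMFS_REPO}/versions/${EESSI_PILOT_VERSION}/init/bash`.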


For more info on starting the EESSI software environment, see [here](../using_eessi/setting_up_environment.md).

### Configure EasyBuild
It is important that we configure EasyBuild in the same way as the bot does, with two small exceptions:

- Our working directory will be different
- Our installpath will be different

For both, any writable path will do. In this example, we will use `/tmp/easybuild` as our workdir and also as our installpath. We source the `configure_easybuild` script, which configures EasyBuild by setting environment variables, before setting `EASYBUILD_INSTALLPATH`, since sourcing the script would otherwise overwrite it.

```
export WORKDIR=/tmp/easybuild
source configure_easybuild
export EASYBUILD_INSTALLPATH="/tmp/easybuild"
```
Next, we need to determine the correct version of EasyBuild to load. Since [the example PR](https://github.com/EESSI/software-layer/pull/360) changes the file `eessi-2023.06-eb-4.8.1-2021b.yml`, we know that the bot used version `4.8.1` of EasyBuild for this build. Thus, we load that version of the EasyBuild module and check whether everything was configured correctly:
```
module load EasyBuild/4.8.1
eb --show-config
```
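
The EasyBuild version can also be read off the EasyStack file name programmatically. The sketch below assumes the `eessi-<EESSI version>-eb-<EasyBuild version>-...yml` naming seen in the example; `easybuild_version_from_easystack` is a hypothetical helper, not part of the `software-layer` repository.

```shell
# Hypothetical helper: extract the EasyBuild version from an EasyStack file name
# such as eessi-2023.06-eb-4.8.1-2021b.yml (assumed naming scheme).
easybuild_version_from_easystack() {
    local name re='-eb-([0-9]+(\.[0-9]+)*)-'
    name=$(basename "$1")
    if [[ "${name}" =~ ${re} ]]; then
        # BASH_REMATCH[1] holds the first capture group: the version number
        echo "${BASH_REMATCH[1]}"
    else
        echo "cannot determine EasyBuild version from ${name}" >&2
        return 1
    fi
}
```

For example: `module load EasyBuild/$(easybuild_version_from_easystack eessi-2023.06-eb-4.8.1-2021b.yml)`.
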
You should get something similar to:

```
#
# Current EasyBuild configuration
# (C: command line argument, D: default value, E: environment variable, F: configuration file)
#
buildpath (E) = /tmp/easybuild/easybuild/build
containerpath (E) = /tmp/easybuild/easybuild/containers
debug (E) = True
experimental (E) = True
filter-deps (E) = Autoconf, Automake, Autotools, binutils, bzip2, DBus, flex, gettext, gperf, help2man, intltool, libreadline, libtool, Lua, M4, makeinfo, ncurses, util-linux, XZ, zlib, Yasm
filter-env-vars (E) = LD_LIBRARY_PATH
hooks (E) = ${HOME}/software-layer/eb_hooks.py
ignore-osdeps (E) = True
installpath (E) = /tmp/easybuild/software/linux/aarch64/neoverse_n1
module-extensions (E) = True
packagepath (E) = /tmp/easybuild/easybuild/packages
prefix (E) = /tmp/easybuild/easybuild
read-only-installdir (E) = True
repositorypath (E) = /tmp/easybuild/easybuild/ebfiles_repo
robot-paths (D) = /cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/aarch64/neoverse_n1/software/EasyBuild/4.8.1/easybuild/easyconfigs
rpath (E) = True
sourcepath (E) = /tmp/easybuild/easybuild/sources:
sysroot (E) = /cvmfs/pilot.eessi-hpc.org/versions/2023.06/compat/linux/aarch64
trace (E) = True
zip-logs (E) = bzip2
```
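
When comparing your configuration with the one above, it can help to pull out individual settings, for example `installpath` and `buildpath`, instead of reading the whole listing. The sketch below is a hypothetical filter that assumes the `key (X) = value` layout shown above.

```shell
# Hypothetical filter: read `eb --show-config` output on stdin and print the
# value of the setting named in $1 (assumes "key (X) = value" lines).
show_config_value() {
    awk -v key="$1" '
        $1 == key && $3 == "=" {
            sub(/^[^=]*= /, "")   # drop "key (X) = " and keep the value
            print
            exit
        }'
}
```

Usage: `eb --show-config | show_config_value installpath`.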

!!! Note
    If you want to replicate a build with `generic` optimization (i.e. in `$EESSI_CVMFS_REPO/versions/${EESSI_PILOT_VERSION}/software/${EESSI_OS_TYPE}/${EESSI_CPU_FAMILY}/generic`) you will need to set `export EASYBUILD_OPTARCH=GENERIC`.

## Building the software
When the bot builds software, it loops over all EasyStack files that have been changed and builds them using EasyBuild. However, a single PR may add multiple items to a single EasyStack file, and the issue you are trying to debug is probably in _one_ of them. Building the full EasyStack file with EasyBuild creates the situation most similar to what the bot does, but you _may_ just want to build the individual software that has changed. Below, we describe both approaches.
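
To see which EasyStack files a feature branch changes, you can ask git directly. The sketch below is a hypothetical helper; it assumes EasyStack files sit in the repository root and follow the `eessi-*.yml` pattern of the example, and it takes the branch that the PR targets as an argument.

```shell
# Hypothetical helper: list EasyStack files changed on the current branch,
# compared to the merge base with the given base ref (e.g. a remote branch).
changed_easystacks() {
    local base_ref="$1"
    if [ -z "${base_ref}" ]; then
        echo "usage: changed_easystacks <base ref>" >&2
        return 1
    fi
    # A...B compares B against the merge base of A and B
    git diff --name-only "${base_ref}...HEAD" -- 'eessi-*.yml'
}
```

Run it from the root of your checkout, e.g. `changed_easystacks <remote>/<target branch>`, replacing the placeholders with the remote and branch the PR targets.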

### Building everything in the EasyStack file
In our [example PR](https://github.com/EESSI/software-layer/pull/360), the EasyStack file that was changed was `eessi-2023.06-eb-4.8.1-2021b.yml`. To build this, we run (in the directory that contains the checkout of this feature branch):
```
eb --easystack eessi-2023.06-eb-4.8.1-2021b.yml --robot
```
After some time, this build fails while trying to build `Plumed`, and we can access the build log to look for clues on why it failed.
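
EasyBuild normally reports the location of the log file when a build fails. If you lose track of it, the hypothetical helper below sketches how to find the most recent one. It assumes temporary logs are written as `eb-*/easybuild-*.log` under `${TMPDIR:-/tmp}`; this depends on how EasyBuild's temporary directory is configured, so adjust the pattern if needed.

```shell
# Hypothetical helper: print the newest EasyBuild log under a temporary directory.
# Assumption: logs are stored as <tmpdir>/eb-*/easybuild-*.log.
latest_eb_log() {
    local tmp_root="${1:-${TMPDIR:-/tmp}}"
    local latest
    # Newest first by modification time; hide the error if nothing matches
    latest=$(ls -t "${tmp_root}"/eb-*/easybuild-*.log 2>/dev/null | head -n 1)
    if [ -z "${latest}" ]; then
        echo "no EasyBuild log found under ${tmp_root}" >&2
        return 1
    fi
    echo "${latest}"
}
```

For example, `grep -n -i error "$(latest_eb_log)"` gives a first impression of what went wrong.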

### Building an individual package
In our [example PR](https://github.com/EESSI/software-layer/pull/360), the individual package that was added to `eessi-2023.06-eb-4.8.1-2021b.yml` was `LAMMPS-23Jun2022-foss-2021b-kokkos.eb`. We also need to take into account any options that are listed in the EasyStack file for `LAMMPS-23Jun2022-foss-2021b-kokkos.eb`, in this case the option `--from-pr 19000`. Thus, to build, we run:
```
eb LAMMPS-23Jun2022-foss-2021b-kokkos.eb --robot --from-pr 19000
```
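
If an EasyStack file lists several easyconfigs, you could generate one `eb` command per entry to build them individually. The sketch below is a hypothetical `awk`-based helper, not a YAML parser: it assumes entries look like `- NAME.eb:`, optionally followed by an `options:` block with a `from-pr:` key, and it ignores any other options, which you would need to add yourself.

```shell
# Hypothetical helper: print one `eb` command per easyconfig in an EasyStack file.
# Only the from-pr option is carried over; this is NOT a general YAML parser.
easystack_to_eb_commands() {
    awk '
        function print_cmd() {
            printf "eb %s --robot", ec
            if (pr != "") printf " --from-pr %s", pr
            printf "\n"
        }
        # A line mentioning NAME.eb starts a new entry: flush the previous one
        match($0, /[A-Za-z0-9._+-]+\.eb/) {
            if (ec != "") print_cmd()
            ec = substr($0, RSTART, RLENGTH)
            pr = ""
            next
        }
        # Remember the PR number of the current entry
        /from-pr:/ { pr = $NF }
        END { if (ec != "") print_cmd() }
    ' "$1"
}
```

Each printed line can then be run on its own, e.g. `easystack_to_eb_commands eessi-2023.06-eb-4.8.1-2021b.yml`.
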
After some time, this build fails while trying to build `Plumed`, and we can access the build log to look for clues on why it failed.
2 changes: 1 addition & 1 deletion docs/support.md
@@ -38,7 +38,7 @@ Note that we can only help with problems related to the software *installations*

We are open to software requests for software that is not included in EESSI yet.

-The quickest way to add additional software to EESSI is by contributing it yourself as a community contribution, please see the [documentation on adding software](software_layer/adding_software.md).
+The quickest way to add additional software to EESSI is by contributing it yourself as a community contribution, please see the [documentation on adding software](contributing_sw/adding_software.md).

Alternatively, you can send in a request to our support team. Please try to provide as much information on the software as possible: preferably use the [issue template](https://gitlab.com/eessi/support/-/issues/new?issuable_template=Software_request) (which requires you to log in to GitLab), or make sure to cover the items listed [here](https://gitlab.com/eessi/support/-/blob/main/.gitlab/issue_templates/Software_request.md).

8 changes: 7 additions & 1 deletion mkdocs.yml
@@ -25,7 +25,6 @@ nav:
- Overview: software_layer.md
- software_layer/cpu_targets.md
- software_layer/build_nodes.md
- software_layer/adding_software.md
- Test suite:
- Overview: test-suite/index.md
- Installation & configuration: test-suite/installation-configuration.md
@@ -41,6 +40,13 @@ nav:
- using_eessi/setting_up_environment.md
- using_eessi/basic_commands.md
- using_eessi/eessi_demos.md
- Contributing software to EESSI:
# Todo: insert an overview page with a flowchart showing the high level process
# - Overview: contributing_sw/overview.md
- contributing_sw/adding_software.md
- contributing_sw/debugging_failed_builds.md
# Todo: write on how to contribute to the EESSI test suite
# - Contributing software tests to the EESSI test suite:
- Getting support: support.md
- Meetings:
- Overview: meetings.md