Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

list broken modules #690

Draft
wants to merge 2 commits into
base: 2023.06-software.eessi.io
Choose a base branch
from

Conversation

MartinsNadia
Copy link

Add script to list broken modules using --module-only

Copy link

eessi-bot bot commented Sep 2, 2024

Instance eessi-bot-mc-aws is configured to build for:

  • architectures: x86_64/generic, x86_64/intel/haswell, x86_64/intel/skylake_avx512, x86_64/amd/zen2, x86_64/amd/zen3, aarch64/generic, aarch64/neoverse_n1, aarch64/neoverse_v1
  • repositories: eessi-hpc.org-2023.06-compat, eessi-hpc.org-2023.06-software, eessi.io-2023.06-software, eessi.io-2023.06-compat

Copy link

eessi-bot bot commented Sep 2, 2024

Instance eessi-bot-mc-azure is configured to build for:

  • architectures: x86_64/amd/zen4
  • repositories: eessi.io-2023.06-compat, eessi-hpc.org-2023.06-compat, eessi-hpc.org-2023.06-software, eessi.io-2023.06-software

@ocaisa
Copy link
Member

ocaisa commented Sep 11, 2024

@MartinsNadia This has two things inside, the missing modules script and the licencing PR, these need to be separated out.

declare -A checked_modules

# Locate all .eb files within the base dir
easyconfig_files=$(find $base_dir -name "*.eb")
Copy link
Member

@ocaisa ocaisa Sep 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite a hammer depending on how this is used. It will run on the entire stack, which can take a long time as sanity checks are also run when using --module-only (but not if coupled with --force I think)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better if we can selectively do this for easyconfigs that are introduced by a PR.

One way to do this is to gather that information using --missing based on the easy stack files touched by a PR.

As for implementation, to me this would have to be done as part of the build script:

  • Gather the easyconfigs to be installed using --missing on the easystack file touched by the PR (which is already known in the build script)
  • Do the builds as currently implemented
  • Go through the missing easyconfig list with --module-only and the other necessary options, compare the two modules and make sure that they are identical

Copy link
Member

@ocaisa ocaisa Sep 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifically, I mean targeting the loop

if [ -f ${easystack_file} ]; then
echo_green "Feeding easystack file ${easystack_file} to EasyBuild..."
${EB} --easystack ${TOPDIR}/${easystack_file} --robot
ec=$?
# copy EasyBuild log file if EasyBuild exited with an error
if [ ${ec} -ne 0 ]; then
eb_last_log=$(unset EB_VERBOSE; eb --last-log)
# copy to current working directory
cp -a ${eb_last_log} .
echo "Last EasyBuild log file copied from ${eb_last_log} to ${PWD}"
# copy to build logs dir (with context added)
copy_build_log "${eb_last_log}" "${build_logs_dir}"
fi
$TOPDIR/check_missing_installations.sh ${TOPDIR}/${easystack_file} ${TOPDIR}/${pr_diff}
else
fatal_error "Easystack file ${easystack_file} not found!"
fi

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So add a step before 279 to gather the easyconfig files that will be built, then add an else clause to L283 (i.e., when the build succeeds) which rebuilds the module and checks the module contents against the original

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't catch the existing software that fail --module-only but it will stop other being added in the future. You're existing script could (almost) be used to catch existing failures (but it is currently not checking the contents for the module to make sure they match).

echo "Testing module: $module_name"

# Try loading the module
if module --ignore_cache load $module_name 2>/dev/null; then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is checking if the module can be loaded, but we probably want more than that, we want to check that the resulting environment (apart from Lmod variables) is the same as when the original module was loaded.

Copy link
Member

@ocaisa ocaisa Sep 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can do this easily enough on a module by module basis, just by changing the MODULEPATH:

  • module purge; module unuse $module_install_dir to reset for the next module
  • Load the original module, 'module --ignore_cache load $module_name'
  • store the environment using env, filtering out Lmod related content
  • module use $module_install_dir
  • reload the module (making sure it comes from where you expect)
  • Store the new environment (also filtering)
  • Make sure they are identical

Comment on lines 52 to 54
for module_category in $(ls $module_install_dir/all); do
for module_version in $(ls $module_install_dir/all/$module_category); do
module_name="$module_category/$module_version"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for module_category in $(ls $module_install_dir/all); do
for module_version in $(ls $module_install_dir/all/$module_category); do
module_name="$module_category/$module_version"
for module_software in $(ls $module_install_dir/all); do
for module_version in $(ls $module_install_dir/all/$module_software); do
module_name="$module_software/$module_version"

@@ -0,0 +1,57 @@
name: Check and update licenses
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still mixing up two PRs, the one for licenses and the one for checking --module-only

@@ -0,0 +1,203 @@
#!/bin/bash
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work? Have you got sample output?

adding PR handling, build and comparison of modules and envs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants