Releases: stackhpc/ansible-slurm-appliance
v1.157
What's Changed
- Update ceph to use ark packages and move RL9 to ceph reef by @wtripp180901 in #519
- Add more information re. configuring production sites by @sjpb in #508
- Change defaults so a cookiecutter environment is fully functional by @wtripp180901 in #473
- Fix epel not using Ark repos for RL8 by @wtripp180901 in #526
- Fix volume_backed_instances not working for compute nodes by @sjpb in #527
- Generate and persist hostkeys for ondemand and login nodes by @wtripp180901 in #525
- Support additional volumes on compute nodes by @sjpb in #528
- Support SSSD and optionally LDAP by @sjpb in #438
- Fix nightly cleanup to deal with duplicate server names by @bertiethorpe in #532
- Fix various typos in documentation by @priteau in #530
- Fix environment creation steps by @priteau in #531
- Support and test "re-imageable" compute nodes via compute node metadata by @bertiethorpe in #518
- Document required security groups by @priteau in #534
- Bump Zenith client to latest from azimuth-cloud namespace by @m-bull in #437
- Fix yaml formatting in operations docs by @sjpb in #535
- Enable image builds to install extra packages by default by @sjpb in #536
Image Details
Two new images are available:
- RL8: openhpc-RL8-250114-1627-bccc88b5
- RL9: openhpc-RL9-250114-1626-bccc88b5
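The image names listed in these releases appear to follow a consistent pattern: the prefix openhpc, an optional variant such as ofed or cuda, the Rocky Linux major release, a build timestamp and a short git commit hash (the notes mention nightly images "getting timestamp/git hash"). The sketch below splits a name into those parts. The YYMMDD-HHMM timestamp format, the 8-character hash and the names ImageInfo/parse_image_name are assumptions inferred from the names shown here, not a documented naming contract.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Pattern inferred from image names in these release notes, e.g.
# "openhpc-RL9-250114-1626-bccc88b5" or "openhpc-cuda-RL9-241022-0441-a5affa58".
# ASSUMPTION: this is not a documented naming contract.
IMAGE_NAME_RE = re.compile(
    r"^openhpc-"
    r"(?:(?P<variant>[a-z]+)-)?"   # optional variant, e.g. "ofed" or "cuda"
    r"RL(?P<os_major>\d+)-"        # Rocky Linux major release
    r"(?P<stamp>\d{6}-\d{4})-"     # assumed YYMMDD-HHMM build timestamp
    r"(?P<commit>[0-9a-f]{8})$"    # assumed short git commit hash
)


@dataclass
class ImageInfo:
    """Hypothetical container for the parts of an image name."""
    variant: Optional[str]
    os_major: int
    built: datetime
    commit: str


def parse_image_name(name: str) -> ImageInfo:
    """Split an appliance image name into its (assumed) components."""
    m = IMAGE_NAME_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognised image name: {name!r}")
    return ImageInfo(
        variant=m.group("variant"),
        os_major=int(m.group("os_major")),
        built=datetime.strptime(m.group("stamp"), "%y%m%d-%H%M"),
        commit=m.group("commit"),
    )


if __name__ == "__main__":
    for image in ("openhpc-RL8-250114-1627-bccc88b5",
                  "openhpc-RL9-250114-1626-bccc88b5"):
        print(image, parse_image_name(image))
```

Because the variant group only matches lowercase letters, names without a variant (such as the two above) fall through to the RL part and report variant as None.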
Full Changelog: v1.156...v1.157
v1.156
What's Changed
Due to the size of this release, PRs are grouped below. In brief:
- This release addresses various breakages caused by changes to upstream repos. As a result, as of this release the StackHPC images (see below) ship with all dnf repos disabled, and building images requires either credentials for StackHPC's Ark server or a local Pulp server mirrored from Ark.
- OFED and CUDA are no longer shipped in StackHPC images; adding them requires an image build.
- StackHPC images move to Rocky Linux 9.5 and 8.10.
- Added support for NVIDIA DOCA instead of OFED.
- Added support for Lustre clients.
- The OpenHPC role supports using the same nodes in multiple partitions/groups.
- Additional packages can be added via appliances_default_extra_packages.
Isolation from upstream dnf repos
- Remove CUDA and OFED builds from CI by @bertiethorpe in #479
- Use rocky 9.4 release train snapshots for builds by @wtripp180901 in #486
- Support site Pulp server for image builds by @wtripp180901 in #490
- Pin nvidia-driver and cuda packages to working packages by @sjpb in #496
- Bump RL9.4 repo timestamps to latest snapshots by @wtripp180901 in #497
- Refactor pulp/dnf roles to avoid having to redefine Ark URLs by @wtripp180901 in #507
- Release train support for Rocky 8.10 by @wtripp180901 in #501
- Bump appliance to Rocky 9.5 + release train support by @wtripp180901 in #503
- Fix python/ansible/pulp squeezer versions for RL8 deploy hosts by @sjpb in #516
- Add Release Train OpenHPC repos by @wtripp180901 in #515
New functionality
- Support lustre client by @sjpb in #447
- Install k3s cluster with ansible init by @wtripp180901 in #441
- Make block device detection work on ESXi by @mkjpryor in #481
- Add role to install NVIDIA DOCA on top of an existing "fat" image by @sjpb in #492
- Fix DOCA install cleanup deleting /tmp by @sjpb in #494
- Add list of additional package installs by @wtripp180901 in #499
- EXPERIMENTAL: add machinery to allow compute nodes to rejoin cluster on reimage by @sjpb in #500
- Ansible-init compute node script by @bertiethorpe in #476
Docs
- Add missing bits re. initial setup to refactored README by @sjpb in #464
- Add generic upgrade docs by @sjpb in #462
- Add note about login node reboot when changing OOD servername by @sd109 in #510
Fixes
- Remove local DNS as a dependency for k3s by @sjpb in #442
- Fix adhoc/rebuild wait_for_connection race condition by @bertiethorpe in #483
- Fix Lustre deleting rdma packages and bump to v2.15.6 for RL9.5 support by @wtripp180901 in #502
Upgrades
- Upgrade RL8 ceph to quincy + trivy rate limit and OOD false positives fix by @wtripp180901 in #477
- Bump openhpc role for slurm restart, templating and nodes in multiple groups by @sjpb in #488
Internal CI changes/fixes
- Don't run trivy scan on nightly builds by @sjpb in #467
- Unset signature_verified property from nightly/latest images by @sjpb in #474
- Don't fail cluster cleanup when prefix not found by @bertiethorpe in #480
- Fix nightly images getting timestamp/git hash by @sjpb in #493
- Fix nightly build version (v2) by @sjpb in #495
- Remove use of FIPs for leafcloud packer builds by @sjpb in #498
Image Details
Two new images are available, neither of which contains OFED:
- RL8: openhpc-RL8-250106-0916-f8603056
- RL9: openhpc-RL9-250106-0916-f8603056
Full Changelog: v1.155...v1.156
v1.155
What's Changed
- Prevent ansible-init running during packer build by @wtripp180901 in #439
- Ensure podman copes with a hard reboot by @sjpb in #460
- Add workflow to cleanup CI clusters by @sjpb in #451
Image Details
Three new images are available, all with OFED:
- openhpc-RL8-241022-0441-a5affa58
- openhpc-RL9-241022-0038-a5affa58
- openhpc-cuda-RL9-241022-0441-a5affa58
New Contributors
- @wtripp180901 made their first contribution in #439
Full Changelog: v1.154...v1.155
v1.154
What's Changed
- Add description of image to build by @sjpb in #444
- Nightly Slurm CI Rocky update workflow by @bertiethorpe in #440
- stub s3-image-sync workflow for easier ci by @bertiethorpe in #450
- Upload main images to Arcus S3 and sync clouds by @bertiethorpe in #448
- Update docs to include operations by @sjpb in #422
- Fix error in packer build command for nightly builds by @bertiethorpe in #455
- Bump terraform collection to fix race with waiting for ssh by @sjpb in #457
Image details
Three new images are available, all with OFED:
- openhpc-RL8-241009-1523-354b048a
- openhpc-RL9-241009-1523-354b048a
- openhpc-cuda-RL9-241009-1523-354b048a
These require a 15 GB root disk, except for the CUDA image, which requires 30 GB.
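When choosing a flavor or boot volume for the images listed in v1.149 to v1.154, the root disk sizes stated in these notes can be checked up front. The sketch below assumes, based on the image names shown here, that a CUDA image is identifiable by "-cuda-" in its name; min_root_disk_gb and check_root_disk are hypothetical helpers, not part of the appliance.

```python
# Minimum root disk sizes stated in these release notes (v1.153, v1.154):
# images with CUDA need 30 GB; the other listed images need 15 GB.
# ASSUMPTION: "-cuda-" in the image name marks the CUDA variant.
CUDA_ROOT_DISK_GB = 30
DEFAULT_ROOT_DISK_GB = 15


def min_root_disk_gb(image_name: str) -> int:
    """Return the minimum root disk size in GB for an appliance image."""
    if "-cuda-" in image_name:
        return CUDA_ROOT_DISK_GB
    return DEFAULT_ROOT_DISK_GB


def check_root_disk(image_name: str, disk_gb: int) -> None:
    """Raise ValueError if disk_gb is too small for image_name (hypothetical helper)."""
    needed = min_root_disk_gb(image_name)
    if disk_gb < needed:
        raise ValueError(
            f"{image_name} needs a root disk of at least {needed} GB, got {disk_gb} GB"
        )


if __name__ == "__main__":
    # A 20 GB disk is enough for the non-CUDA images but not the CUDA one.
    check_root_disk("openhpc-RL9-241009-1523-354b048a", 20)
    print(min_root_disk_gb("openhpc-cuda-RL9-241009-1523-354b048a"))
```

Later releases in these notes do not restate disk sizes, so the constants should be revisited before applying this check to newer images.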
Full Changelog: v1.153.1...v1.154
v1.153.1
What's Changed
- Fix up the outputs, after the fip fix by @JohnGarbutt in #446
Full Changelog: v1.153...v1.153.1
No new images are provided in this release.
v1.153
What's Changed
- Add RL9 cuda build variant by @sjpb in #428
- Build RL8+OFED image in CI by @MoteHue in #427
- Dev script - Extract fatimage.yml logs to analyse packer build times by @bertiethorpe in #435
- Enable SMS Labs for CI by @bertiethorpe in #426
- Caas updated to use openstack_networking_floatingip_associate_v2 by @JohnGarbutt in #445
Full Changelog: v1.152...v1.153
Image details
Three new images are available:
- RL8 with OFED: openhpc-ofed-RL8-240906-1042-32568dbb - requires a 15 GB root disk
- RL9 with OFED: openhpc-ofed-RL9-240906-1041-32568dbb - requires a 15 GB root disk
- RL9 with OFED+CUDA: openhpc-cuda-RL9-240906-1041-32568dbb - requires a 30 GB root disk
v1.152
What's Changed
- Update OSes available for deployment by @bertiethorpe in #424
- Correct the -only options in the Packer README by @MoteHue in #423
- Add trivy image scanning by @sjpb in #413
- Enable 'openstack baremetal ...' commands on deploy host by @sjpb in #425
- Automated PRs for version bumps by @bertiethorpe in #429
- Add workflow for fat image uploads to client sites by @bertiethorpe in #430
Full Changelog: v1.151...v1.152
Image details
Two new images are available, both of which require a 15GB root disk:
- RL8 without OFED: openhpc-RL8-240813-1317-1b370a36
- RL9 with OFED: openhpc-ofed-RL9-240813-1317-1b370a36
v1.151
What's Changed
- Add TuneD by @bertiethorpe in #409
- Use shorter names for CI clusters by @sjpb in #415
- Allow items in compute mapping to have different keys by @sjpb in #412
- Move jupyter openondemand installation to fatimage by @bertiethorpe in #414
- Support ansible-init for remote collections by @sjpb in #411
- Avoid python-openstackclient v7 due to rebuild bug by @sjpb in #420
- Update hpctests to obey UCX_NET_DEVICES when RoCE devices present by @bertiethorpe in #421
New Contributors
- @bertiethorpe made their first contribution in #409
Full Changelog: v1.150...v1.151
Image details
Two new images are available, both of which require a 15GB root disk:
- RL8 without OFED: openhpc-RL8-240725-1710-325c7b47
- RL9 with OFED: openhpc-ofed-RL9-240725-1710-325c7b47
v1.150
What's Changed
- Fix squid port default by @sjpb in #405
- Allow extending fat images with site-specific groups by @sjpb in #403
- Remove squid nodes from podman group by @sjpb in #407
- Fix README for RL9 by @sjpb in #408
- Add support for defining groups to basic_users by @sjpb in #406
- Revert to base ssh repos by @sjpb in #410
Full Changelog: v1.149...v1.150
Image details
Two new images are available, both of which require a 15GB root disk:
- RL8 without OFED: openhpc-RL8-240712-1426-6830f97b
- RL9 with OFED: openhpc-ofed-RL9-240712-1425-6830f97b
v1.149
What's Changed
- Add squid role by @sjpb in #401
- Upgrade ssh from SIG/security to fix CVE-2024-6387 by @sjpb in #404
Full Changelog: v1.148...v1.149
Image Details
A new image is available for RL9 only, requiring a 15GB root disk:
- RL9 with OFED: openhpc-ofed-RL9-240621-1308-96959324