
Releases: stackhpc/ansible-slurm-appliance

v1.157

15 Jan 13:29
5f7e48f

What's Changed

  • Update ceph to use ark packages and move RL9 to ceph reef by @wtripp180901 in #519
  • Add more information re. configuring production sites by @sjpb in #508
  • Change defaults so a cookiecutter environment is fully functional by @wtripp180901 in #473
  • Fix epel not using Ark repos for RL8 by @wtripp180901 in #526
  • Fix volume_backed_instances not working for compute nodes by @sjpb in #527
  • Generate and persist hostkeys for ondemand and login nodes by @wtripp180901 in #525
  • Support additional volumes on compute nodes by @sjpb in #528
  • Support SSSD and optionally LDAP by @sjpb in #438
  • Fix nightly cleanup to deal with duplicate server names by @bertiethorpe in #532
  • Fix various typos in documentation by @priteau in #530
  • Fix environment creation steps by @priteau in #531
  • Support and test "re-imageable" compute nodes via compute node metadata by @bertiethorpe in #518
  • Document required security groups by @priteau in #534
  • Bump Zenith client to latest from azimuth-cloud namespace by @m-bull in #437
  • Fix yaml formatting in operations docs by @sjpb in #535
  • Enable image builds to install extra packages by default by @sjpb in #536

Image Details

Two new images are available:

  • RL8: openhpc-RL8-250114-1627-bccc88b5
  • RL9: openhpc-RL9-250114-1626-bccc88b5

New Contributors

Full Changelog: v1.156...v1.157

v1.156

07 Jan 13:20
4def5ba

What's Changed

Due to the size of this release, PRs are grouped below. In brief:

  • This release addresses various breakages caused by changes to upstream repos. As a result, the StackHPC images (see below) now ship with all dnf repos disabled, and building images requires either credentials for StackHPC's ark server or a local Pulp server mirrored from ark.
  • OFED and CUDA are no longer shipped in StackHPC images; adding them requires an image build.
  • StackHPC images move to RockyLinux 9.5 and 8.10.
  • Added support for NVIDIA DOCA instead of OFED.
  • Added support for Lustre clients.
  • OpenHPC role supports using the same nodes in multiple partitions/groups.
  • Additional packages can be added via appliances_default_extra_packages.

Isolation from upstream dnf repos

New functionality

  • Support lustre client by @sjpb in #447
  • Install k3s cluster with ansible init by @wtripp180901 in #441
  • Make block device detection work on ESXi by @mkjpryor in #481
  • Add role to install NVIDIA DOCA on top of an existing "fat" image by @sjpb in #492
  • Fix DOCA install cleanup deleting /tmp by @sjpb in #494
  • Add list of additional package installs by @wtripp180901 in #499
  • EXPERIMENTAL: add machinery to allow compute nodes to rejoin cluster on reimage by @sjpb in #500
  • Ansible-init compute node script by @bertiethorpe in #476

Docs

  • Add missing bits re. initial setup to refactored README by @sjpb in #464
  • Add generic upgrade docs by @sjpb in #462
  • Add note about login node reboot when changing OOD servername by @sd109 in #510

Fixes

  • Remove local DNS as a dependency for k3s by @sjpb in #442
  • Fix adhoc/rebuild wait_for_connection race condition by @bertiethorpe in #483
  • Fix Lustre deleting rdma packages and bump to v2.15.6 for RL9.5 support by @wtripp180901 in #502

Upgrades

  • Upgrade RL8 ceph to quincy + trivy rate limit and OOD false positives fix by @wtripp180901 in #477
  • Bump openhpc role for slurm restart, templating and nodes in multiple groups by @sjpb in #488

Internal CI changes/fixes

  • Don't run trivy scan on nightly builds by @sjpb in #467
  • Unset signature_verified property from nightly/latest images by @sjpb in #474
  • Don't fail cluster cleanup when prefix not found by @bertiethorpe in #480
  • Fix nightly images getting timestamp/git hash by @sjpb in #493
  • Fix nightly build version (v2) by @sjpb in #495
  • Remove use of FIPs for leafcloud packer builds by @sjpb in #498

Image Details

Two new images are available, neither of which now contains OFED:

  • RL8: openhpc-RL8-250106-0916-f8603056
  • RL9: openhpc-RL9-250106-0916-f8603056

New Contributors

Full Changelog: v1.155...v1.156

v1.155

24 Oct 13:18
6f1554c

What's Changed

  • Prevent ansible-init running during packer build by @wtripp180901 in #439
  • Ensure podman copes with a hard reboot by @sjpb in #460
  • Add workflow to cleanup CI clusters by @sjpb in #451

Image Details

Three new images are available, all with OFED:

  • openhpc-RL8-241022-0441-a5affa58
  • openhpc-RL9-241022-0038-a5affa58
  • openhpc-cuda-RL9-241022-0441-a5affa58

New Contributors

Full Changelog: v1.154...v1.155

v1.154

22 Oct 10:47
ae0c067

What's Changed

Image details

Three new images are available, all with OFED:

  • openhpc-RL8-241009-1523-354b048a
  • openhpc-RL9-241009-1523-354b048a
  • openhpc-cuda-RL9-241009-1523-354b048a

These require a 15GB root disk except for the image with CUDA which requires 30GB.
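
The image names in these notes appear to follow a common layout: an optional cuda marker, the Rocky Linux major version, what looks like a build date and time, and a short git hash. The following is a minimal sketch of splitting a name into those parts and looking up the root disk size stated above. The pattern is inferred only from the names listed here and is an assumption, not a documented format; `parse_image_name`, `ImageInfo` and `min_root_disk_gb` are hypothetical helpers, not part of the appliance.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# Assumed layout, inferred from the image names in these release notes:
#   openhpc-[cuda-]RL<major>-<YYMMDD>-<HHMM>-<8 hex chars>
IMAGE_NAME = re.compile(
    r"^openhpc-(?P<cuda>cuda-)?RL(?P<major>\d+)-"
    r"(?P<date>\d{6})-(?P<time>\d{4})-(?P<commit>[0-9a-f]{8})$"
)


class ImageInfo(NamedTuple):
    rocky_major: int   # e.g. 8 or 9
    cuda: bool         # True if the name carries the "cuda-" marker
    built: datetime    # assumed to be the build timestamp
    commit: str        # assumed to be a short git hash


def parse_image_name(name: str) -> Optional[ImageInfo]:
    """Return the parsed fields, or None if the name does not match."""
    m = IMAGE_NAME.match(name)
    if m is None:
        return None
    built = datetime.strptime(m["date"] + m["time"], "%y%m%d%H%M")
    return ImageInfo(int(m["major"]), m["cuda"] is not None, built, m["commit"])


def min_root_disk_gb(info: ImageInfo) -> int:
    """Root disk sizes as stated for the v1.154 images only."""
    return 30 if info.cuda else 15
```

For example, `parse_image_name("openhpc-cuda-RL9-241009-1523-354b048a")` yields an `ImageInfo` with `cuda=True`, for which `min_root_disk_gb` returns 30. Later releases no longer ship a CUDA image, so the size lookup should not be assumed to apply to them.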

Full Changelog: v1.153.1...v1.154

v1.153.1

01 Oct 12:46
8213ddb
Compare
Choose a tag to compare

What's Changed

Full Changelog: v1.153...v1.153.1

No new images provided at this release.

v1.153

01 Oct 10:28
bb8789e

What's Changed

Full Changelog: v1.152...v1.153

Image details

Three new images are available:

v1.152

03 Sep 12:45
513ad1c

What's Changed

New Contributors

Full Changelog: v1.151...v1.152

Image details

Two new images are available, both of which require a 15GB root disk:

v1.151

07 Aug 15:18
bd2de50

What's Changed

  • Add TuneD by @bertiethorpe in #409
  • Use shorter names for CI clusters by @sjpb in #415
  • Allow items in compute mapping to have different keys by @sjpb in #412
  • Move jupyter openondemand installation to fatimage by @bertiethorpe in #414
  • Support ansible-init for remote collections by @sjpb in #411
  • Avoid python-openstackclient v7 due to rebuild bug by @sjpb in #420
  • Update hpctests to obey UCX_NET_DEVICES when RoCE devices present by @bertiethorpe in #421

New Contributors

Full Changelog: v1.150...v1.151

Image details

Two new images are available, both of which require a 15GB root disk:

https://object.arcus.openstack.hpc.cam.ac.uk/swift/v1/AUTH_3a06571936a0424bb40bc5c672c4ccb1/openhpc-images/

v1.150

16 Jul 09:11
c83ea12

What's Changed

  • Fix squid port default by @sjpb in #405
  • Allow extending fat images with site-specific groups by @sjpb in #403
  • Remove squid nodes from podman group by @sjpb in #407
  • Fix README for RL9 by @sjpb in #408
  • Add support for defining groups to basic_users by @sjpb in #406
  • Revert to base ssh repos by @sjpb in #410

Full Changelog: v1.149...v1.150

Image details

Two new images are available, both of which require a 15GB root disk:

v1.149

02 Jul 16:07
9b8ff9f

What's Changed

Full Changelog: v1.148...v1.149

Image Details

A new image is available for RL9 only, requiring a 15GB root disk: