Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bphilip/add mofed and dependencies #11479

Merged
merged 30 commits into from
Jan 17, 2025
Merged

Conversation

binujp
Copy link
Contributor

@binujp binujp commented Dec 17, 2024

Merge Checklist

All boxes should be checked before merging the PR (just tick any boxes which don't apply to this PR)

  • The toolchain has been rebuilt successfully (or no changes were made to it)
  • The toolchain/worker package manifests are up-to-date
  • Any updated packages successfully build (or no packages were changed)
  • Packages depending on static components modified in this PR (Golang, *-static subpackages, etc.) have had their Release tag incremented.
  • Package tests (%check section) have been verified with RUN_CHECK=y for existing SPEC files, or added to new SPEC files
  • All package sources are available
  • cgmanifest files are up-to-date and sorted (./cgmanifest.json, ./toolkit/scripts/toolchain/cgmanifest.json, .github/workflows/cgmanifest.json)
  • LICENSE-MAP files are up-to-date (./LICENSES-AND-NOTICES/SPECS/data/licenses.json, ./LICENSES-AND-NOTICES/SPECS/LICENSES-MAP.md, ./LICENSES-AND-NOTICES/SPECS/LICENSE-EXCEPTIONS.PHOTON)
  • All source files have up-to-date hashes in the *.signatures.json files
  • sudo make go-tidy-all and sudo make go-test-coverage pass
  • Documentation has been updated to match any changes to the build system
  • Ready to merge

Summary

What does the PR accomplish, why was it needed?

This PR brings in MOFED and dependent packages required to build our AI/ML distribution story. Immediately HPC images require infiniband for GPU-clustering. MOFED is thus a core requirement. There are first and third party requirments for MOFED drivers directly and dependent packages.

CUDA and GdrCopy drivers need MOFED as a build-time and runtime requirements. This is not satisfied today and CUDA driver is built without MOFED support. These packages getting integrated into core will enable that work stream to be unblocked as well.

Change Log

There are 18 new packages being added by this change to the SPECS directory. There are 8 new SIGNED-SPECS being added too. They are either directly required to build out MOFED and support infrastructure or are tools required to provide a complete infiniband solution like tools to configure the hw/driver and debug issues.

fwctl
ibarr
ibsim
iser
isert
knem
mft_kernel
mlnx-ethtool
mlnx-iproute2
mlnx-nfsrdma
mlnx-ofa_kernel
mlx-steering-dump
multiperf rshim
sockperf
srp
xpmem-lib
xpmem

Does this affect the toolchain?

No

YES/NO

Associated issues
  • #xxxx
Links to CVEs
Test Methodology

The RPMs from a local build of these specs were installed on A100, V100 and T4 GPU based VMs and functionality verified.

Binu Jose Philip added 2 commits December 15, 2024 20:31
Infiniband is a requirement for AI/ML deployments for inter-GPU communication and scale out GPU clusters. HPC team built HPC azure linux image needs infiniband mofed drivers. Their major customer singularity is on a deadline to move from Ubuntu to azure linux for AI/ML workloads. All of the required sources are open source and have spec's which are already being used by NVIDIA. This PR brings to Azure Linux MOFED driver for infiniband driver and all dependencies to help use, manage and debug the stack. These modules have already been built, integrated into an HPC image and tested on an existing 16 mode cluster owned by HPC team. Performance characteristics are within specified tolerance limits.
@microsoft-github-policy-service microsoft-github-policy-service bot added Tools 3.0-dev PRs Destined for AzureLinux 3.0 labels Dec 30, 2024
@binujp binujp marked this pull request as ready for review January 13, 2025 18:50
@binujp binujp requested a review from a team as a code owner January 13, 2025 18:50
SPECS/ibsim/ibsim.spec Outdated Show resolved Hide resolved
@binujp binujp requested a review from romoh January 15, 2025 20:07
Comment on lines 33 to 35
%{!?_name: %define _name fwctl}
%{!?_version: %define _version 24.10}
%{!?_release: %define _release OFED.24.10.0.6.7.1}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why define these _version separately? Won't the %{version} work? Similar question for _name. And what's the logic behind _release? I see it trickles down to _kmp_rel for instance but I don't see _kmp_rel used anywhere?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This version is defined from MOFED SPECS for alignment. Hence keeping it same for furture updates.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@@ -17,6 +17,7 @@
"Fedora (ISC)": r'\n-\s+Initial (CBL-Mariner|Azure Linux) import from Fedora \d+ \(license: ISC\)(\.|\n|$)',
"Magnus Edenhill Open Source": r'\n-\s+Initial (CBL-Mariner|Azure Linux) import from Magnus Edenhill Open Source \(license: BSD\)(\.|\n|$)',
"NVIDIA": r'\n-\s+Initial (CBL-Mariner|Azure Linux) import from NVIDIA \(license: (ASL 2\.0|GPLv2)\)(\.|\n|$)',
"NVIDIA (BSD)": r'\n-\s+Initial (CBL-Mariner|Azure Linux) import from NVIDIA \(BSD\) \(license: (BSD)\)(\.|\n|$)',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for "CBL-Mariner"` in the new source attributions. The other ones are to support the old specs. And there's no need to repeat "BSD", I think.

Suggested change
"NVIDIA (BSD)": r'\n-\s+Initial (CBL-Mariner|Azure Linux) import from NVIDIA \(BSD\) \(license: (BSD)\)(\.|\n|$)',
"NVIDIA (BSD)": r'\n-\s+Initial Azure Linux import from NVIDIA \(license: (BSD)\)(\.|\n|$)',

A bigger question, perhaps: why add a new entry specially for BSD? I see the older one for Nvidia already covers two licenses. Is there a reason we don't want to just update that one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was following the Fedora entries and tried to keep different licenses separate. If there is no problem in ASL/GPL/BSD going to the same source we can make that change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Added BSD to default Nvidia entry.

Comment on lines 92 to 100
BuildRequires: gcc
BuildRequires: make
BuildRequires: kernel-devel = %{target_kernel_version_full}
BuildRequires: kernel-headers = %{target_kernel_version_full}
BuildRequires: binutils
BuildRequires: systemd
BuildRequires: kmod
BuildRequires: mlnx-ofa_kernel-devel = %{_version}
BuildRequires: mlnx-ofa_kernel-source = %{_version}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General comment for all signed specs: they don't need most BuildRequires, since they only expand already built RPMs and replace a few files. The packages providing rpm2cpio and cpio are part of the default build chroot but to be on the safe side, I'd add them to BuildRequires.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Requires: kmod

%description
fwctl signed kernel modules
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the description, that the published package will have, so I'd suggest something, that gives users more information about the package.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Description is copied from unsigned SPECS. Both unsigned and signed SPECS have same package description.

Comment on lines 70 to 75
# This package's "version" and "release" must reflect the unsigned version that
# was signed.
# An important consequence is that when making a change to this package, the
# unsigned version/release must be increased to keep the two versions consistent.
# Ideally though, this spec will not change much or at all, so the version will
# just track the unsigned package's version/release.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for the comment - the specs entanglement PR check will handle that for you. Just make sure to update the set of spec pairs, which need to be kept in sync.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

SPECS/knem/knem.spec Outdated Show resolved Hide resolved
SPECS/xpmem/xpmem.spec Outdated Show resolved Hide resolved
@binujp binujp merged commit d7bc078 into 3.0-dev Jan 17, 2025
13 checks passed
@binujp binujp deleted the bphilip/add-mofed-and-dependencies branch January 17, 2025 01:12
@binujp binujp mentioned this pull request Jan 18, 2025
12 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.0-dev PRs Destined for AzureLinux 3.0 Packaging Tools
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants