Support Amazon Linux 2023 (AL2023) #282

Closed
bryantbiggs opened this issue Oct 3, 2023 · 5 comments · Fixed by #765

Comments

@bryantbiggs
Member

If this is already a supported OS, please feel free to close.

Ref: https://github.com/aws/aws-ofi-nccl#requirements

@rauteric
Contributor

I'm not aware of any reason the plugin wouldn't work with AL2023. But testing this seems to be contingent on amazonlinux/amazon-linux-2023#12.

aws-nslick added a commit that referenced this issue May 22, 2024
Add an al2023 build (#282), alongside the recently added al2 build. This
also runs on codebuild "self-hosted" runners rather than on the gh
hosted runners. To avoid installing the full toolkit, disable tests in
that build (see related commit b9c46d2).
AmedeoSapio pushed a commit that referenced this issue May 29, 2024
Add an al2023 build (#282), alongside the recently added al2 build. This
also runs on codebuild "self-hosted" runners rather than on the gh
hosted runners. To avoid installing the full toolkit, disable tests in
that build (see related commit b9c46d2).

(cherry picked from commit 309d834)
@stewartsmith

@aws-nslick
Contributor

@stewartsmith can you please review #542 and #526

@aws-nslick
Contributor

This is being handled, to some extent. We're not planning on supporting it through any mechanism other than the EFA installer, though. Builds should work as soon as #592 lands.

@HarryCaveMan

HarryCaveMan commented Dec 3, 2024

As noted in amazonlinux/amazon-linux-2023#12 (comment)

NVIDIA now has documentation on using their drivers on AL2023. See https://docs.nvidia.com/cuda/pdf/CUDA_Quick_Start_Guide.pdf and https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#amazon-linux-2023

Their amzn2023 RPM repo does not include NCCL or cuDNN. It's simple enough to install cuDNN with just curl/tar, but downloading the NCCL tarball requires authenticating with NVIDIA, which makes a programmatic install less fun.
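As a rough illustration of the curl/tar route for cuDNN, here is a minimal Python sketch (standard library only) that downloads a tar archive and copies its `include/` and `lib/` trees into an install prefix. The function name `install_tarball`, the prefix layout, and the assumption that the archive has a single top-level directory are all hypothetical choices for this sketch, not NVIDIA's documented procedure; check NVIDIA's cuDNN install docs for the real archive URL and layout.

```python
import shutil
import tarfile
import tempfile
import urllib.request
from pathlib import Path


def install_tarball(url: str, prefix: Path) -> list:
    """Download a tar archive (any compression tarfile understands) and copy
    its include/ and lib/ directories into `prefix` (hypothetical helper).

    Assumption: the archive contains exactly one top-level directory holding
    include/ and lib/. Verify this against the archive you actually download.
    """
    prefix = Path(prefix)
    copied = []
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp)
        archive = tmp_path / "archive"
        # Roughly `curl -o archive URL`; urllib also accepts file:// URLs.
        urllib.request.urlretrieve(url, archive)

        extract_dir = tmp_path / "extract"
        extract_dir.mkdir()
        # Use the "data" extraction filter where this Python supports it; it
        # rejects absolute paths and members that would escape extract_dir.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        with tarfile.open(archive, mode="r:*") as tar:  # "r:*" auto-detects gz/bz2/xz
            tar.extractall(extract_dir, **kwargs)

        top_dirs = [p for p in extract_dir.iterdir() if p.is_dir()]
        if len(top_dirs) != 1:
            raise ValueError(f"expected one top-level directory, found {len(top_dirs)}")
        root = top_dirs[0]

        for sub in ("include", "lib"):
            src = root / sub
            if src.is_dir():
                dst = prefix / sub
                # symlinks=True keeps versioned .so symlinks as symlinks.
                shutil.copytree(src, dst, symlinks=True, dirs_exist_ok=True)
                copied.append(dst)
    return copied
```

Since this copies `lib/` to `<prefix>/lib` unchanged, it fits a dedicated prefix (for example a hypothetical `/opt/cudnn`) that you then add to the compiler and loader search paths, rather than merging into an existing CUDA tree that uses `lib64`.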

rajachan added a commit to rajachan/aws-ofi-nccl that referenced this issue Jan 15, 2025
We included this in the release notes starting with v1.13.0, but forgot to
update the README.

Fixes: aws#282

Signed-off-by: Raghu Raja <raghunch@amazon.com>