Add support for AMD MI300X GPU #368

Open: wants to merge 5 commits into main

@samos123 (Contributor) commented Jan 19, 2025

  • Add an image for amd-gpu
  • Add resourceProfile for AMD MI300X
  • Add a default values file for AMD GPU operator
  • Add installation guide section for using AMD GPU operator

Benchmark of Llama 3.1 70B on 1 x AMD MI300X: https://substratus.ai/blog/benchmarking-llama-3.1-70b-amd-mi300x

@samos123 requested a review from nstogner on January 19, 2025 at 03:50

## Installation using AMD GPUs

This section assumes you have a Kubernetes cluster with AMD GPU resources available and
installed the NVIDIA device plugin that adds GPU information labels to the nodes.

Contributor review comment:

This should be AMD
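One way to sanity-check the prerequisite above (AMD GPU resources available to Kubernetes) is to look for the AMD extended resource in each node's allocatable capacity. The minimal sketch below assumes node data shaped like `kubectl get nodes -o json` output and that the AMD device plugin advertises GPUs under its default resource name, `amd.com/gpu`. The helper name `amd_gpu_nodes` and the sample node names are hypothetical, for illustration only.

```python
import json
from typing import Dict

# Extended resource name advertised by the AMD GPU device plugin
# (assumption: the plugin runs with its default resource naming).
AMD_GPU_RESOURCE = "amd.com/gpu"


def amd_gpu_nodes(node_list: dict) -> Dict[str, int]:
    """Return {node name: allocatable AMD GPU count} for nodes exposing AMD GPUs.

    `node_list` has the shape of `kubectl get nodes -o json` output:
    {"items": [{"metadata": {...}, "status": {"allocatable": {...}}}, ...]}
    """
    result: Dict[str, int] = {}
    for node in node_list.get("items", []):
        allocatable = node.get("status", {}).get("allocatable", {})
        # Extended resource quantities are whole numbers serialized as strings, e.g. "8".
        count = int(allocatable.get(AMD_GPU_RESOURCE, "0"))
        if count > 0:
            result[node["metadata"]["name"]] = count
    return result


if __name__ == "__main__":
    # Hypothetical sample; in practice: kubectl get nodes -o json > nodes.json
    sample = json.loads("""
    {"items": [
      {"metadata": {"name": "mi300x-node-1", "labels": {}},
       "status": {"allocatable": {"cpu": "96", "amd.com/gpu": "8"}}},
      {"metadata": {"name": "cpu-node-1", "labels": {}},
       "status": {"allocatable": {"cpu": "16"}}}
    ]}
    """)
    print(amd_gpu_nodes(sample))  # {'mi300x-node-1': 8}
```

An empty result suggests the device plugin is not running or not registered on any node, in which case GPU-backed models would stay unschedulable regardless of resource profiles.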

This time we need to use custom resource profiles that define the nodeSelectors
for different GPU types.
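Conceptually, a resource profile pairs a nodeSelector (which GPU type a pod may land on) with the resources the pod requests. The sketch below is a hypothetical model of that idea, not the project's implementation: the profile names, the label key `example.com/gpu-product`, and the `name:count` reference format (per-GPU limits multiplied by the count) are all assumptions for illustration. Resource quantities are plain integers here for simplicity, whereas Kubernetes serializes them as strings.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ResourceProfile:
    # Labels a node must carry for pods using this profile to be scheduled there.
    node_selector: Dict[str, str] = field(default_factory=dict)
    # Per-unit resource limits, scaled by the requested count.
    limits: Dict[str, int] = field(default_factory=dict)


# Hypothetical profiles keyed by GPU type; label keys/values are placeholders.
PROFILES = {
    "amd-gpu-mi300x": ResourceProfile(
        node_selector={"example.com/gpu-product": "MI300X"},
        limits={"amd.com/gpu": 1},
    ),
    "nvidia-gpu-h100": ResourceProfile(
        node_selector={"example.com/gpu-product": "H100"},
        limits={"nvidia.com/gpu": 1},
    ),
}


def resolve(ref: str) -> ResourceProfile:
    """Expand a hypothetical 'name:count' reference into a selector and scaled limits.

    Raises KeyError if the profile name is unknown.
    """
    name, _, count_str = ref.partition(":")
    count = int(count_str) if count_str else 1
    profile = PROFILES[name]
    return ResourceProfile(
        node_selector=dict(profile.node_selector),
        limits={key: value * count for key, value in profile.limits.items()},
    )


def matches(node_labels: Dict[str, str], node_selector: Dict[str, str]) -> bool:
    """A nodeSelector matches only if every key/value pair is present on the node."""
    return all(node_labels.get(k) == v for k, v in node_selector.items())


if __name__ == "__main__":
    profile = resolve("amd-gpu-mi300x:1")
    print(profile.limits)  # {'amd.com/gpu': 1}
    print(matches({"example.com/gpu-product": "MI300X"}, profile.node_selector))  # True
```

Keeping the selector and the resource name together in one profile is what lets an AMD profile and an NVIDIA profile coexist in the same cluster: each model picks a profile, and the scheduler only considers nodes whose labels satisfy that profile's selector.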

Download the values file for the NVIDIA GPU operator:

Contributor review comment:

AMD GPU Operator
