Expose device UUIDs to node label #1116

Open
wants to merge 1 commit into
base: main
from

xiongzubiao
Closes #1015

copy-pr-bot bot commented Jan 9, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Review thread on internal/lm/nvml.go (outdated, resolved)

xiongzubiao force-pushed the uuid branch 2 times, most recently from eb8fdec to 2ad1041 on January 10, 2025 01:49
elezar (Member) commented Jan 10, 2025

@xiongzubiao could you please provide information on how these labels will be used?

xiongzubiao (Author) commented Jan 10, 2025

> @xiongzubiao could you please provide information on how these labels will be used?

@elezar, we want to provide a visualization for users in which they can click each GPU to check its properties, status, and metrics. The device UUID is the natural choice for indexing. There are other ways to get the UUID, but reading it from node labels is the most straightforward, because labels are part of the node's properties.

There is another use case mentioned in #1015: scheduling a pod to a specific GPU using node label matching.
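
To make the two use cases concrete, here is a minimal Go sketch of how a consumer might read GPU UUIDs from a node's label map and check whether a node hosts a given GPU. The label key layout (`example.com/gpu-<index>.uuid`) and the helper names `gpuUUIDs` and `nodeHasGPU` are hypothetical, since this conversation does not state the key format the PR actually uses. Note also that label matching selects a node, so this only shows finding the node that carries the GPU.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// ASSUMPTION: the real label keys added by the PR are not given in this
// conversation. This prefix and suffix are placeholders for illustration.
const (
	uuidLabelPrefix = "example.com/gpu-"
	uuidLabelSuffix = ".uuid"
)

// gpuUUIDs returns the values of all labels whose key matches the assumed
// prefix and suffix, ordered lexicographically by label key.
func gpuUUIDs(labels map[string]string) []string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		if strings.HasPrefix(k, uuidLabelPrefix) && strings.HasSuffix(k, uuidLabelSuffix) {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)

	uuids := make([]string, 0, len(keys))
	for _, k := range keys {
		uuids = append(uuids, labels[k])
	}
	return uuids
}

// nodeHasGPU reports whether any assumed UUID label on the node carries
// the wanted UUID.
func nodeHasGPU(labels map[string]string, want string) bool {
	for _, u := range gpuUUIDs(labels) {
		if u == want {
			return true
		}
	}
	return false
}

func main() {
	// Example label map, as it might be read from a Node object's metadata.
	labels := map[string]string{
		"kubernetes.io/hostname": "node-a",
		"example.com/gpu-0.uuid": "GPU-aaaa",
		"example.com/gpu-1.uuid": "GPU-bbbb",
	}
	fmt.Println(gpuUUIDs(labels))
	fmt.Println(nodeHasGPU(labels, "GPU-bbbb"))
}
```

A visualization could use the returned UUIDs as index keys for per-GPU properties and metrics, while a scheduler-side check like `nodeHasGPU` mirrors what a node selector on the same label would express.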

Signed-off-by: Zubiao Xiong <zubiao.xiong@memverge.com>
Development

Successfully merging this pull request may close these issues.

Add gpu uuids to node labels
3 participants