MPS control daemon wrong pod selector #982

Open
radepajic opened this issue Oct 8, 2024 · 1 comment

@radepajic

I’ve installed the latest version of the nvidia-device-plugin (0.16.2) using Helm. The chart also installs the MPS control daemon alongside the device plugin. The problem is that the pod selector in the MPS control daemon DaemonSet is the same as the one in the device plugin DaemonSet. As a result, the device plugin pod starts first, and both controllers attempt to manage the same pod, preventing MPS from ever starting.
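
To illustrate the collision, here is a minimal Python sketch of how a `matchLabels` selector is evaluated against pod labels. The label keys and values are hypothetical, since the chart's actual selector is not quoted in this issue, and the extra `app.kubernetes.io/component` label is only one possible way to tell the two DaemonSets apart, not necessarily the fix the maintainers would choose.

```python
# Hypothetical labels: the chart's real selector is not shown in this issue.
SHARED = {
    "app.kubernetes.io/name": "nvidia-device-plugin",
    "app.kubernetes.io/instance": "nvdp",
}


def selector_matches(match_labels, pod_labels):
    """True if every key/value pair of the selector is present on the pod."""
    return all(pod_labels.get(k) == v for k, v in match_labels.items())


# Reported situation: both DaemonSets use the same selector, so a pod
# created from the device plugin template is matched by both of them.
device_plugin_pod = dict(SHARED)
both_claim = (
    selector_matches(SHARED, device_plugin_pod)      # device plugin selector
    and selector_matches(SHARED, device_plugin_pod)  # MPS selector (identical)
)

# Possible remedy (assumption): give each DaemonSet its own component label
# in both the selector and the pod template.
plugin_labels = {**SHARED, "app.kubernetes.io/component": "device-plugin"}
mps_labels = {**SHARED, "app.kubernetes.io/component": "mps-control-daemon"}

mps_claims_plugin_pod = selector_matches(mps_labels, plugin_labels)
plugin_claims_mps_pod = selector_matches(plugin_labels, mps_labels)

print(both_claim, mps_claims_plugin_pod, plugin_claims_mps_pod)
```

With the shared selector, both checks succeed for the same pod. Once each DaemonSet carries a distinguishing label, neither selector matches the other's pods.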

@chipzoller
Contributor

Does indeed look like an issue. This isn't apparent when deploying the device plugin with the GPU operator, because the templated labels applied to the device plugin and MPS control DaemonSets (via the app key) are equal to the respective names of those DaemonSets, which naturally differ.
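
A rough sketch of why the operator-style labelling avoids the problem: two `matchLabels` selectors can only select the same pod if they agree on every key they share. The DaemonSet names below are assumed for illustration and are not taken from this thread; the `app_selector` helper is hypothetical.

```python
def selectors_can_overlap(a, b):
    """Two matchLabels selectors can match one pod only if they do not
    disagree on any key they have in common."""
    return all(a[key] == b[key] for key in a.keys() & b.keys())


def app_selector(daemonset_name):
    """Hypothetical helper: derive the selector from the DaemonSet name,
    mirroring the 'app' label scheme described above."""
    return {"app": daemonset_name}


# Assumed DaemonSet names, for illustration only.
plugin = app_selector("nvidia-device-plugin-daemonset")
mps = app_selector("nvidia-device-plugin-mps-control-daemon")

print(selectors_can_overlap(plugin, mps))     # differing names, no overlap
print(selectors_can_overlap(plugin, plugin))  # identical selectors overlap
```

Because the `app` value is derived from each DaemonSet's own name, the two selectors conflict on that key and can never claim the same pod.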
