Failed to send command to MPS daemon #762
Comments
I haven't read your issue in detail, but maybe this will help:
Thanks, I did read this doc before posting the issue. The problem is that this never happens.
I don't know why, but if I reboot the offending machines after enabling MPS via the config map, the MPS control daemon pods start up. It would be good to get to the bottom of this, as it took me hours to figure out and others might be having the same problem. Any ideas on what I can look at?
This issue is stale because it has been open 90 days with no activity. This issue will be closed in 30 days unless new comments are made or the stale label is removed.
@RonanQuigley I'm encountering the same issue, which began with k8s-device-plugin v0.15 and continues in v0.17 based on my testing today.
1. Quick Debug Information
2. Issue or feature description
I'm struggling to understand how to enable MPS with the provided README. I'm using the NVIDIA device plugin Helm chart, version 0.15.0. I'm not using the gpu-operator chart.
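For readers following along, a device plugin config that enables MPS sharing might look roughly like the sketch below. It is an assumption based on the MPS sharing example in the k8s-device-plugin README, not the config actually used in this report; the replica count of 10 is purely illustrative.

```yaml
# Sketch only: assumed shape of an MPS sharing config for the
# NVIDIA device plugin, not the reporter's actual config map contents.
version: v1
sharing:
  mps:
    resources:
      # Resource to share across MPS clients.
      - name: nvidia.com/gpu
        # Illustrative value: how many shared replicas to advertise per GPU.
        replicas: 10
```

A file like this is typically passed to the Helm chart as a named config; the exact Helm values used to reference it depend on the chart version and are not shown here.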
Am I supposed to do something after enabling MPS via the config map? I've also tried going onto the relevant GPU worker node and enabling MPS via
nvidia-cuda-mps-control -d
but that made no difference.
Logs from the nvidia-device-plugin-ctr container in the nvidia-device-plugin pod:
Additional information that might help better understand your environment and reproduce the bug: