NVMLError_Unknown: Unknown Error #761
Comments
I was doing some tests with Running this code:
returns the same error
Whereas if I use
returns
Could this have something to do with this error? |
I think this was resolved in dask/distributed#5343. Would it be possible to update dask/distributed to the latest version? |
I tried updating dask and distributed to the most recent version (2020.10.0) using pip, and tried to run |
Hmm, I wonder if this is because you are using |
Just to clarify, the CUDA version that's listed in the As per #761 (comment), With that all being said, it isn't really clear to me why that error is happening, and we'll need more information. First, could you tell us what is the version of the WSL kernel? It seems you should be able to find that running |
It's also important to note that |
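The kernel version asked about above can also be read from Python. This is only a minimal sketch: it assumes WSL kernels carry "microsoft" in their release string, which is a common heuristic rather than an official API, and `looks_like_wsl` is a hypothetical helper name.

```python
import platform


def looks_like_wsl(release: str) -> bool:
    """Heuristic (assumption): WSL kernel release strings contain 'microsoft'."""
    return "microsoft" in release.lower()


# Kernel release string of the running system, as reported by the OS.
kernel_release = platform.release()
print(kernel_release, "-> WSL?", looks_like_wsl(kernel_release))
```

Pasting the printed release string into the issue gives maintainers the kernel version without any extra tooling.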
I was just going to use a single GPU for now, but I was hoping to try multi-GPU clusters in the future. I'm pretty new to Linux and GPU programming, so it's possible I've made a mistake somewhere, but I can't find the issue. This is what happens when I run
One thing I noticed is that if I run the command
Should I be able to run this command from within WSL/Ubuntu? If I run it from within PowerShell/cmd it works fine.
I had a look at this Stack Exchange question about a similar error - https://askubuntu.com/questions/885610/nvcc-version-command-says-nvcc-is-not-installed but couldn't find a cuda folder in However, as this Stack Overflow question says - https://stackoverflow.com/questions/61122950/where-does-anaconda-install-cudatoolkit-and-cudnn?noredirect=1&lq=1 it looks like installing cudatoolkit from Anaconda doesn't let you do the I also tried running the Docker version, and it came up with the same error. |
Thanks for confirming the kernel version; this should be fine according to CUDA's WSL2 system requirements. NVCC is the CUDA compiler. You need it only if you're building RAPIDS from source, which isn't your case: when installing from conda you get pre-compiled binaries, for which NVCC isn't required. Note that WSL2 is mostly experimental and not tested or officially supported by Dask-CUDA as of now, and it does seem we still have some work to do to get NVML diagnostics working properly. For now, what you can do is disable NVML diagnostics completely; everything from Dask should work, except for the GPU information in Distributed's Diagnostics. To disable it, you can run your Python process with:
Alternatively, you can change the corresponding |
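For anyone landing here, a minimal sketch of the environment-variable approach described above. It assumes the relevant setting is Dask's `distributed.diagnostics.nvml` config key, which Dask reads from a `DASK_`-prefixed environment variable with double underscores marking nested keys. `env_with_nvml_disabled` and `my_script.py` are hypothetical names used only for illustration.

```python
import os
from typing import Mapping, Optional

# Assumption: this variable maps to the Dask config key
# `distributed.diagnostics.nvml` (DASK_ prefix, "__" separates nested keys).
NVML_ENV_VAR = "DASK_DISTRIBUTED__DIAGNOSTICS__NVML"


def env_with_nvml_disabled(base_env: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of the environment with NVML diagnostics switched off."""
    env = dict(os.environ if base_env is None else base_env)
    env[NVML_ENV_VAR] = "False"
    return env
```

The returned dict can be passed as `env=` to `subprocess.run([sys.executable, "my_script.py"], env=env_with_nvml_disabled())`, so the variable is set before Dask loads its configuration in the child process. Copying the environment instead of mutating `os.environ` keeps the change scoped to that one process.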
Thanks for the reply, that seems to have fixed my issue! |
Thanks @pentschev for helping to resolve this. While we don't know the root cause, I'm going to close this for now as we continue working on better WSL2 support. |
@quasiben @pentschev I drilled down a bit, and traced it to this: rapidsai/cudf#9955 |
Issue description
I installed RAPIDS on WSL2 using the installation guide here. cuDF is working fine, but when I try to make a client using dask-cuda, I get the error NVMLError_Unknown: Unknown Error.
Steps to reproduce the issue
What's the expected result?
Start a CUDA cluster client
What's the actual result?
Error:
NVMLError_Unknown: Unknown Error
Additional details / screenshot
My GPU is an NVIDIA GeForce RTX 2060
Full traceback:
One thing that I noticed was that despite installing CUDA 11.2 from here, when I ran
nvidia-smi
it says that I am using CUDA 11.6:
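On the version mismatch: `nvidia-smi` reports the highest CUDA version the installed driver supports, not the toolkit version installed through conda, so seeing 11.6 next to a CUDA 11.2 toolkit is not by itself an error. Below is a small sketch for pulling that driver-side number out of `nvidia-smi` text, assuming the output contains a `CUDA Version: X.Y` field. `driver_cuda_version` is a hypothetical helper name.

```python
import re
from typing import Optional


def driver_cuda_version(smi_output: str) -> Optional[str]:
    """Extract the 'CUDA Version' field from nvidia-smi text, if present."""
    match = re.search(r"CUDA Version:\s*(\d+(?:\.\d+)*)", smi_output)
    return match.group(1) if match else None
```

The text to parse could come from `subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout` on a machine where the command is available. The function returns `None` rather than raising when the field is missing.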