torch.compile issue when computing features on multiple GPUs (nn.DataParallel) #889

Open
GeorgeBatch opened this issue Nov 29, 2024 · 1 comment

@GeorgeBatch
Contributor

GeorgeBatch commented Nov 29, 2024

  • TIA Toolbox version: develop branch
  • Python version: 3.11.8
  • Operating System: linux

Description

I am computing features on multiple GPUs on the same node using DeepFeatureExtractor.
My feature extraction code is essentially the same as in the new notebook that demonstrates the feature extraction process: #887

What I Did

tiatoolbox wraps the model in nn.DataParallel internally to handle multi-GPU computation. I pulled the changes that introduced torch.compile and replaced ON_GPU with device.

I updated my call to DeepFeatureExtractor's predict method to pass device instead of on_gpu.

The error traceback is too long to paste in full, but here are some of the errors (all from a single run).

  File "/tmp/torchinductor_qun786/vv/cvvkeueuq2m4jcjzub4hcfpkhpogtc5b2xddykdgxvsxcvnpfa2w.py", line 173, in call
    buf2 = extern_kernels.convolution(buf0, buf1, stride=(14, 14), padding=(0, 0), dilation=(1, 1), transposed=False, output_padding=(0, 0), groups=1, bias=None)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument weight in method wrapper_CUDA__cudnn_convolution)

...

    raise exception                                                                                                                                            
RuntimeError: Caught RuntimeError in replica 0 on device 0.  

...

RuntimeError: Triton Error [CUDA]: invalid device context

From these errors, it looks like torch.compile does not work well with nn.DataParallel.


Please let me know if you can reproduce the error by simply running the DeepFeatureExtractor feature extraction code with rcParam["torch_compile_mode"] = "default" on a node with at least 2 devices.
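
A reproduction that does not depend on tiatoolbox might look like the sketch below. It assumes the model is compiled first and then wrapped in nn.DataParallel, which may not match the exact order used inside tiatoolbox. The helper names and the small convolution (chosen to loosely mirror the stride-14 convolution in the traceback) are hypothetical. It has not been confirmed to trigger the same error.

```python
import torch
from torch import nn


def build_compiled_data_parallel(model: nn.Module) -> nn.DataParallel:
    """Compile the model, then wrap it in DataParallel (assumed order)."""
    compiled = torch.compile(model, mode="default")
    return nn.DataParallel(compiled)


def try_forward(batch_size: int = 8) -> str:
    """Run one forward pass and report whether it succeeded."""
    if torch.cuda.device_count() < 2:
        return "skipped: needs at least 2 CUDA devices"

    # Stand-in for a ViT-style patch embedding; not the model from the issue.
    model = nn.Conv2d(3, 8, kernel_size=14, stride=14).to("cuda")
    wrapped = build_compiled_data_parallel(model)
    batch = torch.randn(batch_size, 3, 224, 224, device="cuda")

    try:
        with torch.no_grad():
            wrapped(batch)
    except RuntimeError as err:
        return f"failed: {err}"
    return "ok"


if __name__ == "__main__":
    print(try_forward())
```

On a machine with fewer than two GPUs the function returns early without compiling anything, so it is safe to run anywhere PyTorch is installed.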

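Maybe torch.nn.parallel.DistributedDataParallel is a better option, since the PyTorch documentation recommends it over nn.DataParallel: https://pytorch.org/docs/stable/notes/cuda.html#cuda-nn-ddp-instead

For reference, model_to currently wraps the model in nn.DataParallel for any non-CPU device: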

def model_to(model: torch.nn.Module, device: str = "cpu") -> torch.nn.Module:
    """Transfers model to specified device e.g., "cpu" or "cuda".

    Args:
        model (torch.nn.Module):
            PyTorch defined model.
        device (str):
            Transfers model to the specified device. Default is "cpu".

    Returns:
        torch.nn.Module:
            The model after being moved to specified device.

    """
    if device != "cpu":
        # DataParallel work only for cuda
        model = torch.nn.DataParallel(model)

    device = torch.device(device)
    return model.to(device)
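
One possible stopgap, short of moving to DistributedDataParallel, is sketched below. It is a hypothetical variant (model_to_sketch is not a tiatoolbox function) that only uses nn.DataParallel when more than one CUDA device is visible, and in that case falls back to the uncompiled module. It relies on the `_orig_mod` attribute that torch.compile's wrapper module exposes, which is an internal detail and may change. Whether this avoids the errors above has not been verified.

```python
import torch
from torch import nn


def model_to_sketch(model: nn.Module, device: str = "cpu") -> nn.Module:
    """Hypothetical variant of model_to; not the tiatoolbox implementation."""
    target = torch.device(device)
    use_data_parallel = target.type == "cuda" and torch.cuda.device_count() > 1

    if use_data_parallel:
        # torch.compile returns a wrapper that keeps the eager module under
        # `_orig_mod`. Use the eager module so each replica runs uncompiled.
        model = getattr(model, "_orig_mod", model)
        model = nn.DataParallel(model)

    return model.to(target)
```

The trade-off is that the multi-GPU path gives up any speed-up from torch.compile, while single-device runs keep the compiled model unchanged.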

@shaneahmed
Member

Thanks @GeorgeBatch We will investigate this further.
