Running the inference stage from the command line on Ubuntu 22.04, RuntimeError: CUDA error (fixed!) #20


j-yi-11 commented Nov 3, 2024

My Modification for Command Line Usage:

Smooth Diffusion is great work! I tried to run it from the command line to reproduce the results. To achieve this, I mainly removed the Gradio parts of the original app.py and added an argparse interface; the details are in the attached file app.txt.
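
For reference, here is a rough sketch of the kind of argparse setup I mean (this is not the attached app.txt; the flag names simply mirror the command shown under Bug output below):

    # Rough sketch only -- the actual modification is in the attached app.txt.
    # The flag names mirror the command shown in the Bug output section below.
    import argparse

    def parse_args():
        parser = argparse.ArgumentParser(
            description='Run Smooth Diffusion inference from the command line')
        parser.add_argument('--mode', type=str, default='interpolation')
        parser.add_argument('--img0', type=str, required=True)
        parser.add_argument('--img1', type=str, required=True)
        parser.add_argument('--txt0', type=str, default='')
        parser.add_argument('--txt1', type=str, default='')
        return parser.parse_args()

    if __name__ == '__main__':
        args = parse_args()
        # ... build the model wrapper and run the selected mode with args ...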

Bug output

When I run python app.py --mode interpolation --img0 './assets/images/interpolation/cityview1.png' --img1 './assets/images/interpolation/cityview2.png' --txt0 'A city view' --txt1 'Another city view' on Ubuntu 22.04, it fails as follows:

│    596 │   │   else:                                                                             │
│    597 │   │   │   print("self.tag_diffuser = ", self.tag_diffuser)                              │
│    598 │   │   │   print("self.tag_lora = ", self.tag_lora)                                      │
│ ❱  599 │   │   │   data0, data1 = self.nullinvdual_or_loadcachedual(                             │
│    600 │   │   │   │   img0, img1, {'txt0':txt0, 'txt1':txt1, 'step':step,                       │
│    601 │   │   │   │   │   │   │    'cfg_scale':cfg_scale, 'inner_step':inner_step,              │
│    602 │   │   │   │   │   │   │    'diffuser' : self.tag_diffuser, 'lora' : self.tag_lora,}, f  │
│                                                                                                  │
│ /home/oppo2/jy/smooth-Diffusion-main/app.py:400 in nullinvdual_or_loadcachedual                  │
│                                                                                                  │
│    397 │   │   │   │   emb0 = txt_to_emb(self.net, txt0)                                         │
│    398 │   │   │   │   emb1 = txt_to_emb(self.net, txt1)                                         │
│    399 │   │   │                                                                                 │
│ ❱  400 │   │   │   xt0, xt1, nemb = null_inversion_model.null_invert_dual(                       │
│    401 │   │   │   │   img0, img1, txt0, txt1, num_inner_steps=inner_step)                       │
│    402 │   │   │   cache_data = {                                                                │
│    403 │   │   │   │   'step' : step, 'cfg_scale' : cfg_scale,                                   │
│                                                                                                  │
│ /home/oppo2/jy/smooth-Diffusion-main/nulltxtinv_wrapper.py:460 in null_invert_dual               │
│                                                                                                  │
│   457 │   │   nemb = nemb.to(device)                                                             │
│   458 │   │                                                                                      │
│   459 │   │   # nulltext inversion                                                               │
│ ❱ 460 │   │   nembs = self.null_optimization_dual(                                               │
│   461 │   │   │   ddim_latents_0, ddim_latents_1, emb0, emb1, nemb, num_inner_steps, early_sto   │
│   462 │   │                                                                                      │
│   463 │   │   self.model.scheduler = scheduler_save                                              │
│                                                                                                  │
│ /home/oppo2/jy/smooth-Diffusion-main/nulltxtinv_wrapper.py:407 in null_optimization_dual         │
│                                                                                                  │
│   404 │   │   │   │   │      nnf.mse_loss(latents_prev_rec1, latent_prev1)                       │
│   405 │   │   │   │                                                                              │
│   406 │   │   │   │   optimizer.zero_grad()                                                      │
│ ❱ 407 │   │   │   │   loss.backward()                                                            │
│   408 │   │   │   │   optimizer.step()                                                           │
│   409 │   │   │   │   loss_item = loss.item()                                                    │
│   410 │   │   │   │   bar.update()                                                               │
│                                                                                                  │
│ /home/oppo2/anaconda3/envs/smooth-diffusion/lib/python3.9/site-packages/torch/_tensor.py:487 in  │
│ backward                                                                                         │
│                                                                                                  │
│    484 │   │   │   │   create_graph=create_graph,                                                │
│    485 │   │   │   │   inputs=inputs,                                                            │
│    486 │   │   │   )                                                                             │
│ ❱  487 │   │   torch.autograd.backward(                                                          │
│    488 │   │   │   self, gradient, retain_graph, create_graph, inputs=inputs                     │
│    489 │   │   )                                                                                 │
│    490                                                                                           │
│                                                                                                  │
│ /home/oppo2/anaconda3/envs/smooth-diffusion/lib/python3.9/site-packages/torch/autograd/__init__. │
│ py:200 in backward                                                                               │
│                                                                                                  │
│   197 │   # The reason we repeat same the comment below is that                                  │
│   198 │   # some Python versions print out the first line of a multi-line function               │
│   199 │   # calls in the traceback and some print out the last line                              │
│ ❱ 200 │   Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the bac   │
│   201 │   │   tensors, grad_tensors_, retain_graph, create_graph, inputs,                        │
│   202 │   │   allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to ru   │
│   203                                                                                            │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: CUDA error: invalid argument
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

My Solution

In nulltxtinv_wrapper.py, in the function null_optimization, I added:

                latent_prev = latent_prev.detach().clone()
                latent_prev.requires_grad = True
                latents_prev_rec = latents_prev_rec.detach().clone()
                latents_prev_rec.requires_grad = True

and in the function null_optimization_dual, I added:

                latents_prev_rec0 = latents_prev_rec0.detach().clone()
                latents_prev_rec0.requires_grad = True
                latents_prev_rec1 = latents_prev_rec1.detach().clone()
                latents_prev_rec1.requires_grad = True
                latent_prev0 = latent_prev0.detach().clone()
                latent_prev0.requires_grad = True
                latent_prev1 = latent_prev1.detach().clone()
                latent_prev1.requires_grad = True

Both changes are detailed in the attached file nulltxtinv_wrapper.txt. With these modifications, running from the command line succeeds.
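
For context, a rough sketch of where the added lines sit in null_optimization_dual (this is not the actual file; the loop structure is assumed from the traceback above and the usual null-text inversion inner loop):

    # Hypothetical sketch of the inner loop around nulltxtinv_wrapper.py:404-407.
    # Only the detach/requires_grad lines are the modification described above; the rest is assumed.
    for _ in range(num_inner_steps):
        # ... predict noise with the current null embedding and reconstruct the
        #     previous-step latents, giving latents_prev_rec0 / latents_prev_rec1 ...

        # Added: rebuild the tensors as fresh leaf tensors right before the loss.
        latents_prev_rec0 = latents_prev_rec0.detach().clone()
        latents_prev_rec0.requires_grad = True
        latents_prev_rec1 = latents_prev_rec1.detach().clone()
        latents_prev_rec1.requires_grad = True
        latent_prev0 = latent_prev0.detach().clone()
        latent_prev0.requires_grad = True
        latent_prev1 = latent_prev1.detach().clone()
        latent_prev1.requires_grad = True

        loss = nnf.mse_loss(latents_prev_rec0, latent_prev0) + \
               nnf.mse_loss(latents_prev_rec1, latent_prev1)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()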


tsachiblau commented Dec 11, 2024

The computation graph currently looks like this:

uncond_embeddings -> get_noise_pred_single -> noise_pred_uncond0 -> noise_pred0 -> prev_step -> latents_prev_rec0 -> loss

By detaching latents_prev_rec0, the optimization loses its connection to uncond_embeddings. That would make the optimization ineffective, wouldn't it? In addition, the loss does not change much.
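
A tiny standalone snippet (not the repository code; the tensors are placeholders) that illustrates the point: once the tensor feeding the loss is detached and rebuilt as a new leaf, backward() no longer produces a gradient for the embedding being optimized:

    # Standalone illustration of the detach issue; tensors are placeholders.
    import torch
    import torch.nn.functional as F

    uncond_embeddings = torch.randn(4, requires_grad=True)  # the quantity null-text inversion optimizes
    target = torch.zeros(4)

    # With the graph intact, the gradient reaches uncond_embeddings.
    pred = uncond_embeddings * 2.0        # stand-in for get_noise_pred_single -> prev_step
    F.mse_loss(pred, target).backward()
    print(uncond_embeddings.grad)         # non-zero gradient

    # After detach().clone() + requires_grad = True, the graph back to
    # uncond_embeddings is cut, so the same loss gives it no gradient.
    uncond_embeddings.grad = None
    pred = (uncond_embeddings * 2.0).detach().clone()
    pred.requires_grad = True
    F.mse_loss(pred, target).backward()
    print(uncond_embeddings.grad)         # None -- an optimizer step cannot improve it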

I believe the issue lies with the package versions. Here are the versions I am installing now:

conda create --name smooth-diffusion python=3.10.12
conda activate smooth-diffusion
conda install pytorch torchvision torchaudio pytorch-cuda=12.4 -c pytorch -c nvidia
pip install -r requirements.txt

The requirements.txt contains the following:
accelerate==1.1.1
datasets==2.14.4
diffusers==0.31.0
easydict==1.13
gradio==4.19.2
huggingface-hub==0.26.3
moviepy==1.0.3
opencv_python==4.7.0.72
packaging==24.2
pypatchify==0.1.4
safetensors==0.4.5
tqdm==4.65.0
transformers==4.46.3
wandb==0.16.3
peft==0.14.0

This setup works for me.
