RuntimeError: [!] NaN loss with loss. #11

Open
mbarnig opened this issue Jan 11, 2022 · 2 comments
mbarnig commented Jan 11, 2022

During transfer learning on the Marylux-648 dataset, starting from the Glow-TTS LJSpeech model, training aborts at epoch 145 with a NaN (not a number) loss error:

Traceback (most recent call last):
  File "/home/mbarnig/COQUI_TTS_0.5.0/TTS/TTS/trainer.py", line 1007, in fit
    self._fit()
  File "/home/mbarnig/COQUI_TTS_0.5.0/TTS/TTS/trainer.py", line 992, in _fit
    self.train_epoch()
  File "/home/mbarnig/COQUI_TTS_0.5.0/TTS/TTS/trainer.py", line 820, in train_epoch
    _, _ = self.train_step(batch, batch_num_steps, cur_step, loader_start_time)
  File "/home/mbarnig/COQUI_TTS_0.5.0/TTS/TTS/trainer.py", line 690, in train_step
    outputs, loss_dict_new, step_time = self._optimize(
  File "/home/mbarnig/COQUI_TTS_0.5.0/TTS/TTS/trainer.py", line 601, in _optimize
    outputs, loss_dict = self._model_train_step(batch, model, criterion)
  File "/home/mbarnig/COQUI_TTS_0.5.0/TTS/TTS/trainer.py", line 560, in _model_train_step
    return model.train_step(*input_args)
  File "/home/mbarnig/COQUI_TTS_0.5.0/TTS/TTS/tts/models/glow_tts.py", line 381, in train_step
    loss_dict = criterion(
  File "/home/mbarnig/COQUI_TTS_0.5.0/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/mbarnig/COQUI_TTS_0.5.0/TTS/TTS/tts/layers/losses.py", line 437, in forward
    raise RuntimeError(f" [!] NaN loss with {key}.")
RuntimeError:  [!] NaN loss with loss.
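
The last frame of the traceback suggests a guard that walks the entries of the loss dictionary and raises as soon as one of them is NaN. A minimal sketch of such a check in plain Python, assuming a dict of float values; the actual code in losses.py works on tensors and may differ, and `check_loss_dict` is a hypothetical name used only for illustration:

```python
import math


def check_loss_dict(loss_dict):
    """Raise if any entry is NaN, reusing the message seen in the traceback.

    Hypothetical helper: assumes values convertible to float.
    """
    for key, value in loss_dict.items():
        if math.isnan(float(value)):
            raise RuntimeError(f" [!] NaN loss with {key}.")
    return loss_dict
```

Under this reading, "NaN loss with loss" means that the entry stored under the key `loss` was the one that became NaN.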

Training was resumed from the latest checkpoint, but the same error appeared again 16 epochs later.
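
Since the error recurs after a restart, it may help to record the per-step loss values and find the first step at which the loss stops being finite, so that the offending batch can be inspected. A small sketch in plain Python, assuming the losses have been collected into a list of floats; `first_non_finite_step` is a hypothetical helper, not part of the TTS code base:

```python
import math


def first_non_finite_step(losses):
    """Return the index of the first NaN or infinite loss, or None if all are finite."""
    for step, value in enumerate(losses):
        if not math.isfinite(value):
            return step
    return None


# Example with made-up values: the loss becomes NaN at step 2.
print(first_non_finite_step([0.52, 0.47, float("nan"), 0.45]))
```

This only localises the failure; it does not explain why the loss diverged.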

mbarnig self-assigned this Jan 11, 2022

mbarnig commented Jan 22, 2022

When training with the lb-multispeaker dataset, the same error appears in epoch 2:

  File "/home/mbarnig/COQUI_TTS_0.5.0/TTS/TTS/tts/layers/losses.py", line 437, in forward
    raise RuntimeError(f" [!] NaN loss with {key}.")
RuntimeError:  [!] NaN loss with loss.

mbarnig commented Jan 22, 2022

See issues 282 in Mozilla-TTS and Matplotlib.
