reward turns to nan #2

Open
iriscxy opened this issue Mar 22, 2018 · 8 comments

Comments

@iriscxy commented Mar 22, 2018

As training goes on, the reward and loss all become 'nan'. Did this problem also occur with your data?
A -> B
('[s]', 'Old power means the fossil ##AT##-##AT## nuclear energies : oil , natural gas , coal and uranium exploited in centralised , monopolistic energy systems supported by short ##AT##-##AT## term thinking politics .')
('[smid]', ' Interaktion Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Schecks')
r1= nan r2= nan rk= nan fw_loss= nan bw_loss= nan
A loss = nan B loss = nan

@JCly-rikiu (Collaborator)

Do the reward and loss become 'nan' every time? At which step?
The 2 pre-trained NMT models impact the result a lot.
How about pre-training more or trying other data, and seeing what happens?

@wky9710 commented Apr 9, 2018

I met the same problem...
I tried pre-training on 20% of the data for 100 epochs, and tried a better dataset, but it still failed.
My tutor told me to change the learning rate from 1e-3 to 1e-5, and it ran well for longer than before, but still failed after about 200 steps; now 1e-6 is running...
So is a lower learning rate really useful? It seems 1e-3 fails very quickly (about 50 steps), and lowering the lr just makes the failure happen later.
And what does nan mean? The language model fails? The NMT fails?
Thanks for your patience :)
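
To narrow down where training blows up, one option is to guard each update and halt at the first non-finite loss instead of silently continuing. A minimal sketch, assuming a PyTorch-style loop; `guarded_step`, `compute_loss`, and the other names are illustrative, not this repo's actual code:

```python
import torch

def guarded_step(model, optimizer, compute_loss, batch, step):
    """Run one update, but halt at the first non-finite loss."""
    optimizer.zero_grad()
    loss = compute_loss(model, batch)  # hypothetical scalar loss
    if not torch.isfinite(loss):
        # Stop and report instead of training on garbage, so the nan
        # can be traced back to a specific step and batch.
        raise RuntimeError(f"non-finite loss {loss.item()} at step {step}")
    loss.backward()
    optimizer.step()
    return loss.item()
```

Catching the first bad step also helps answer the question above: if the loss is finite right up until one update, the problem is more likely exploding gradients or a degenerate reward than a broken model.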

@yangkexin

I finally found that I met the same problem as you: when it tries to generate words in beam(), the new_hyp_scores turn to nan at about 1000 steps. Then I changed the learning rate from 1e-3 to 1e-5 as suggested above, and it ran well for longer than before. I think this result shows that the NMT models must be trained more. As a next step I want to change the optimizer, e.g. to Adam. If you find any useful methods, please tell me how to do it. Thank you :)
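
To confirm that the beam scores are where the nan first appears, one can assert finiteness on them inside the search loop. A minimal sketch; only the `new_hyp_scores` name comes from the comment above, the rest is hypothetical:

```python
import torch

def check_beam_scores(new_hyp_scores: torch.Tensor, step: int) -> None:
    """Fail fast if any candidate score in the beam is nan or inf."""
    finite = torch.isfinite(new_hyp_scores)
    if not finite.all():
        bad = (~finite).nonzero().flatten().tolist()
        raise RuntimeError(
            f"non-finite beam scores at decode step {step}: indices {bad}")
```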

@JCly-rikiu (Collaborator)

We have tried Adam before, but the result was bad.
We think the reason may be that Adam changes the learning rate constantly, and the translation loss is not smooth. That makes the training process go out of control, so the loss can't decrease.
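
For reference, swapping the optimizer is a small change in PyTorch. A sketch under that assumption; `make_optimizer` and `model` are illustrative names, not from this repo:

```python
import torch.optim as optim
from torch import nn

def make_optimizer(model: nn.Module, use_sgd: bool = True, lr: float = 1e-3):
    """Build the optimizer; SGD keeps a fixed step size, Adam adapts it."""
    if use_sgd:
        # Plain SGD reportedly avoided the nan loss in this project.
        return optim.SGD(model.parameters(), lr=lr)
    # Adam rescales updates per parameter, which the collaborator
    # suspects interacts badly with the noisy translation loss.
    return optim.Adam(model.parameters(), lr=lr)
```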

@wky9710 commented May 5, 2018

After several steps (about 20) with a learning rate of 1e-6 (maybe small enough... 1e-3 was also tried, and the loss turned to nan after 2 steps, even before saving a model...), the loss turns to nan again...
I've tried to retrain the NMT models for about 1M iterations, with BLEU of about 33.7 for model A and 15.5 for model B, but it just won't work...
Does this mean that whether the method works depends heavily on the data? Or on the NMT model?

@JCly-rikiu (Collaborator)

  1. Yes, this method depends heavily on the data. We have read a review that mentions this.
  2. We think the nan loss is probably due to exploding gradients, but we didn't see the nan loss anymore after we changed the optimizer to SGD (see the clipping sketch below).
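
If exploding gradients are indeed the cause, gradient-norm clipping is the standard mitigation and works with either optimizer. A minimal sketch of one update step, assuming a PyTorch model; the function and its names are hypothetical:

```python
import torch
from torch import nn

def clipped_step(model: nn.Module, optimizer: torch.optim.Optimizer,
                 loss: torch.Tensor, max_norm: float = 5.0) -> float:
    """Backprop, clip the global gradient norm, then update."""
    optimizer.zero_grad()
    loss.backward()
    # Rescale gradients so their combined L2 norm is at most max_norm;
    # this keeps one bad batch from blowing up the parameters.
    grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    optimizer.step()
    return float(grad_norm)
```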

@oceanypt commented Jan 8, 2019

> After several steps (about 20) with a learning rate of 1e-6 (maybe small enough... 1e-3 was also tried, and the loss turned to nan after 2 steps, even before saving a model...), the loss turns to nan again...
> I've tried to retrain the NMT models for about 1M iterations, with BLEU of about 33.7 for model A and 15.5 for model B, but it just won't work...
> Does this mean that whether the method works depends heavily on the data? Or on the NMT model?

I think the nan problem comes from the reward calculation: the reward is divided by its standard deviation, but the std can be zero, so changing the reward formula may solve the problem.
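
A minimal sketch of that epsilon fix, assuming the rewards are a PyTorch tensor; `normalize_rewards` and its arguments are illustrative, not this repo's actual code:

```python
import torch

def normalize_rewards(rewards: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Standardize rewards; eps keeps the division finite when std is zero."""
    # The std is zero when every sampled hypothesis gets the same reward,
    # e.g. when they all collapse to the same repeated token as in the
    # log above. unbiased=False also keeps it well-defined for one sample.
    std = rewards.std(unbiased=False)
    return (rewards - rewards.mean()) / (std + eps)
```

This matches the symptom in the original report: a degenerate output like the repeated 'Fachkompetenz' sequence can yield identical rewards across samples, a zero std, and hence nan after the division.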

@fuzihaofzh

I also ran into this problem. Has anyone found a way to solve it?
