
Why does MWER use stop gradient? #2

Open
Mddct opened this issue Nov 15, 2021 · 3 comments

Comments

@Mddct

Mddct commented Nov 15, 2021

Why does MWER use stop gradient? Is it just a regularization?

@Mddct
Author

Mddct commented Nov 15, 2021

> Why does MWER use stop gradient? Is it just a regularization?

Maybe variance reduction.
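A minimal sketch of one way to read the variance-reduction guess, assuming the stop gradient in question is applied to the mean-word-error baseline of the N-best MWER loss (an assumption, not taken from the repository). The hypothetical function `mwer_loss_and_grad` renormalizes the hypothesis log-scores over the N-best list, subtracts the average word errors as a constant, and returns the loss together with its analytic gradient.

```python
import math
from typing import List, Tuple


def mwer_loss_and_grad(
    log_scores: List[float],
    word_errors: List[float],
    subtract_mean: bool = True,
) -> Tuple[float, List[float]]:
    """N-best MWER loss and its gradient w.r.t. the hypothesis log-scores.

    log_scores[i]:  log P(y_i | x) of the i-th N-best hypothesis (differentiable).
    word_errors[i]: word errors of y_i against the reference (a constant).
    """
    # Renormalize over the N-best list: P_hat = softmax(log_scores).
    m = max(log_scores)
    exps = [math.exp(s - m) for s in log_scores]
    z = sum(exps)
    p_hat = [e / z for e in exps]

    # Baseline: mean word errors over the N-best list. It depends only on the
    # error counts, so it behaves like a stop-gradient constant.
    baseline = sum(word_errors) / len(word_errors) if subtract_mean else 0.0
    centered = [w - baseline for w in word_errors]

    # Loss = expected (centered) word errors under P_hat.
    loss = sum(p * c for p, c in zip(p_hat, centered))

    # Softmax chain rule: d loss / d s_j = P_hat_j * (c_j - loss).
    grad = [p * (c - loss) for p, c in zip(p_hat, centered)]
    return loss, grad


scores = [-1.2, -2.0, -3.5, -4.1]
errors = [1.0, 0.0, 2.0, 3.0]
loss_c, grad_c = mwer_loss_and_grad(scores, errors, subtract_mean=True)
loss_r, grad_r = mwer_loss_and_grad(scores, errors, subtract_mean=False)
print(loss_r - loss_c)  # approximately 1.5, the baseline
print(all(abs(a - b) < 1e-12 for a, b in zip(grad_c, grad_r)))  # True
```

In this exact N-best form the baseline shifts the loss value but not the gradient, because the renormalized probabilities sum to one; a baseline matters for variance mainly when the expected word error is estimated from samples.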

@leixiaoning

I find that TF CTC beam search loses the gradients.

@TeaPoly
Owner

TeaPoly commented Dec 9, 2022

> I find that TF CTC beam search loses the gradients.

Beam search only finds candidate paths, so it does not need to propagate gradients. Gradients still flow back to the logit weights because the MWER loss takes as input the probability P of each path, which is computed from the logits. The N-best paths from CTC beam search can in fact be generated offline to speed up training.
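A minimal pure-Python sketch of this point, assuming a CTC model with blank index 0; `ctc_log_prob` and `mwer_loss` are hypothetical helpers, not the repository's TensorFlow code. The N-best list is passed in as plain integer label sequences (as if decoded offline), so nothing flows through the search itself; each hypothesis is scored from the logits with the CTC forward algorithm, and a finite difference shows that the MWER loss still responds to the logits.

```python
import math
from typing import List, Sequence

NEG_INF = float("-inf")


def log_add(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b))."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def log_softmax(row: Sequence[float]) -> List[float]:
    m = max(row)
    lse = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - lse for v in row]


def ctc_log_prob(logits: Sequence[Sequence[float]], labels: Sequence[int], blank: int = 0) -> float:
    """log P(labels | x) via the CTC forward algorithm (sum over all alignments)."""
    log_probs = [log_softmax(row) for row in logits]
    ext = [blank]  # labels interleaved with blanks: [b, l1, b, l2, ..., b]
    for label in labels:
        ext += [label, blank]
    n = len(ext)
    alpha = [NEG_INF] * n
    alpha[0] = log_probs[0][ext[0]]
    if n > 1:
        alpha[1] = log_probs[0][ext[1]]
    for t in range(1, len(log_probs)):
        new = [NEG_INF] * n
        for s in range(n):
            a = alpha[s]  # stay on the same symbol
            if s >= 1:
                a = log_add(a, alpha[s - 1])  # advance by one
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a = log_add(a, alpha[s - 2])  # skip a blank between distinct labels
            new[s] = a + log_probs[t][ext[s]]
        alpha = new
    # Valid paths end on the last label or the trailing blank.
    return log_add(alpha[-1], alpha[-2]) if n > 1 else alpha[-1]


def mwer_loss(logits, nbest: List[List[int]], word_errors: List[float], blank: int = 0) -> float:
    """MWER over a fixed N-best list; only the scoring depends on the logits."""
    log_scores = [ctc_log_prob(logits, hyp, blank) for hyp in nbest]
    m = max(log_scores)
    weights = [math.exp(s - m) for s in log_scores]
    z = sum(weights)
    baseline = sum(word_errors) / len(word_errors)
    return sum(w / z * (e - baseline) for w, e in zip(weights, word_errors))


# T=4 frames, vocabulary {0: blank, 1, 2}.
logits = [[0.5, 1.0, 0.2], [0.1, 0.3, 1.5], [1.2, 0.2, 0.4], [0.3, 1.1, 0.6]]
# Hypotheses are plain label lists, e.g. produced by a beam search run offline.
nbest = [[1, 2], [1, 2, 1], [2, 1]]
word_errors = [0.0, 1.0, 2.0]

loss = mwer_loss(logits, nbest, word_errors)
eps = 1e-5
bumped = [row[:] for row in logits]
bumped[1][2] += eps
print((mwer_loss(bumped, nbest, word_errors) - loss) / eps)  # finite-difference dL/dlogit
```

In a framework such as TensorFlow, the same structure means only the scoring step has to be differentiable, for example a CTC loss evaluated on each fixed hypothesis (its negative is log P); the decoder output can be treated as data.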


3 participants