Replies: 1 comment
-
If the very first gradient evaluation in forward mode hits a near-zero denominator, you get a NaN right away. Can you add a small safe-divisor offset (ϵ) or clamp the denominator? Extremely small values can cause this issue.
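A minimal sketch of the clamping idea, assuming the code in question is JAX (the thread does not show it). `safe_divide`, `ratio`, and the `eps` default are hypothetical names and values made up for illustration, not anything from the original issue:

```python
import jax
import jax.numpy as jnp


def safe_divide(num, den, eps=1e-6):
    """Divide num by den, clamping tiny denominators to +/- eps."""
    # Keep the sign of the denominator when clamping (zero maps to +eps).
    clamped = jnp.where(den >= 0, eps, -eps)
    # Swap in the clamped value BEFORE dividing, so neither the primal
    # nor the derivative ever sees a division by (near) zero.
    safe_den = jnp.where(jnp.abs(den) < eps, clamped, den)
    return num / safe_den


def ratio(x):
    # Hypothetical test function: sin(x) / x is 0/0 at x = 0 without a guard.
    return safe_divide(jnp.sin(x), x)


x0 = jnp.float32(0.0)

# Forward mode: push a unit tangent through the function.
primal, tangent = jax.jvp(ratio, (x0,), (jnp.float32(1.0),))

# Reverse mode: ordinary gradient of the same function.
reverse = jax.grad(ratio)(x0)

print("primal:", primal)
print("forward-mode derivative is finite:", bool(jnp.isfinite(tangent)))
print("reverse-mode derivative is finite:", bool(jnp.isfinite(reverse)))
```

Note that the guard has to be applied to the denominator before the division; masking the result afterwards still lets the NaN into the derivative. Also, clamping only keeps the numbers finite: inside the clamped region the value and derivative are those of `num / eps`, not the true limit, so whether that is acceptable depends on your problem.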
-
Under what circumstances is it possible to get a NaN gradient in forward mode but not in reverse mode? The context for this issue is here.