Approximation of Lipschitz constant for use in step size #7
In the call to […], I think the SGD works by conceiving of the loss as […]. Put another way, […]
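Presumably the reasoning is the finite-sum view of the objective: the stochastic solver updates with per-sample gradients, so the relevant Lipschitz constant is that of each per-sample gradient rather than of the full gradient. A minimal sketch for plain least squares (notation mine, not taken from the thread):

$$
f(\beta) = \frac{1}{n}\sum_{i=1}^{n} f_i(\beta), \qquad
f_i(\beta) = \tfrac{1}{2}\left(x_i^\top \beta - y_i\right)^2,
$$

so $\nabla^2 f_i(\beta) = x_i x_i^\top$ and the Lipschitz constant of $\nabla f_i$ is $\lVert x_i \rVert^2$; taking the worst case over samples gives $L = \max_i \lVert x_i \rVert^2$, i.e. the maximum squared row norm.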
Ah, yes, that makes sense. Thanks! What do you think about using an adaptive step size, as the lightning package optionally does, @tdhock and @michaelweylandt?
Potentially interesting, and it might buy some speed, but I think there's probably more low-hanging fruit performance-wise elsewhere. The paper they reference describes the step-size search in a fully smooth setting (for a slightly different algorithm), so I'd want to check the math before implementing, but it seems like it would work.
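For reference, an adaptive step size in the smooth case usually amounts to some form of backtracking: start from a step-size guess and shrink it until a sufficient-decrease condition holds. A rough sketch of that idea in R (a generic Armijo-style check on the full least-squares objective; `backtrack_step` is a made-up helper here, not lightning's actual implementation):

```r
# Generic Armijo-style backtracking (illustrative sketch only):
# shrink the step until a sufficient-decrease condition holds.
backtrack_step <- function(beta, grad, f, step = 1, shrink = 0.5, max_iter = 50) {
  g2 <- sum(grad^2)
  for (i in seq_len(max_iter)) {
    if (f(beta - step * grad) <= f(beta) - 0.5 * step * g2) break
    step <- step * shrink
  }
  step
}

# Toy least-squares example
set.seed(1)
n <- 100; p <- 10
X <- matrix(rnorm(n * p), n, p)
y <- rnorm(n)
f <- function(b) sum((X %*% b - y)^2) / (2 * n)
beta0 <- rep(0, p)
grad0 <- as.numeric(crossprod(X, X %*% beta0 - y)) / n
backtrack_step(beta0, grad0, f)
```

A SAGA-type solver would presumably apply a similar check per sample, or to a running Lipschitz estimate, which is exactly the part of the math that would need verifying.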
Currently, we approximate the Lipschitz constant as scikit-learn does, using the maximum squared row norm of the design matrix, whereas the theoretically best choice, if I am not mistaken, is the largest eigenvalue of the Hessian. The reason for this, as I understand it, is that the eigendecomposition is computationally demanding, but it does seem suboptimal, particularly since we are using a constant step size.
Here is a short demonstration (in R) of the different methods for least squares:
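A minimal sketch of such a comparison, assuming a simulated standard-normal design and the usual 1/(2n) least-squares scaling (illustrative code, not the snippet originally posted):

```r
set.seed(1)
n <- 1000
p <- 50
X <- matrix(rnorm(n * p), n, p)

# Bound used in the scikit-learn-style approximation:
# the maximum squared row norm of X
L_rownorm <- max(rowSums(X^2))

# Lipschitz constant of the full least-squares gradient:
# largest eigenvalue of the Hessian X'X / n
L_eigen <- max(eigen(crossprod(X) / n, symmetric = TRUE, only.values = TRUE)$values)

c(max_row_norm = L_rownorm, max_eigenvalue = L_eigen)
```

With these dimensions the row-norm bound comes out dozens of times larger than the largest Hessian eigenvalue, and it is always at least as large, since $\lambda_{\max}\big(\tfrac{1}{n}\sum_i x_i x_i^\top\big) \le \tfrac{1}{n}\sum_i \lVert x_i \rVert^2 \le \max_i \lVert x_i \rVert^2$.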
I plan to run proper tests on this to figure out the best solution, but I thought I would put it up here so I remember to revisit it later.
Also, I wanted to check that I am not missing something: the difference seems huge.