
Control over Learning Rate and/or initial step size in linesearch #108

matillda123 opened this issue Jan 8, 2025 · 8 comments

@matillda123

Hi,

Is there a convenient way to influence the step size of the available minimizers/least-squares solvers? (E.g. in optax every optimizer takes a learning_rate argument.)
So far, the only control I have found with the optimistix solvers is mutating solver.search.__dict__["step_init"], which doesn't seem right.

I guess one could just define a custom solver, but it seems odd that optimistix offers so much customizability through its modular approach, while the predefined solvers expose so little user control.

@johannahaffner
Contributor

johannahaffner commented Jan 8, 2025

Hi Matilda,

you could indeed define a custom solver by re-using the searches and descents and tweaking their constants, if you desire smaller steps on average.

A learning rate would not necessarily be a meaningful concept when used with a backtracking line search or a classical trust region. These choose the step length based on optimality criteria, rather than stepping in some direction regardless of whether this actually improves the objective function (by the expected amount).

You could still bias these toward smaller step sizes (e.g. by requiring that BacktrackingArmijo starts with a step size of 0.5, rather than 1.0), but this would not alter their acceptance criteria.
For the trust region radius, you could change the value of the high_constant and low_constant parameters, which are used to multiply the radius depending on whether the search judges the objective to be sufficiently quadratic.

Instead of tweaking solver attributes as above, you can also modify the solvers like this:

```python
import equinox as eqx
import optimistix as optx

solver = optx.SomeSolver(...)
search = optx.BacktrackingArmijo(..., step_init=...)  # use your pick of constants

# Solvers are immutable, so swap the search in out-of-place with tree_at.
get = lambda s: s.search
solver = eqx.tree_at(get, solver, search)
```

I do this when I quickly want to try something out and don't want to bother writing down the attributes and constructor, but if I were actually going to use the solver many times, I'd do it properly :)

@patrick-kidger
Owner

@matillda123 do you have an example of the kind of thing you'd like to do, using a particular solver for clarity? :)

@matillda123
Author

I'm not sure if it's possible to give a short example here. What I'm trying to do is basically fit a high-dimensional function to some data; the function essentially consists of some FFTs.
It's similar to what is described in this paper.

I'm not using one specific solver; rather, I'm trying to get a feeling for which solvers work nicely for this problem.
I have the suspicion that some solvers may sometimes work "significantly" better when the step size is biased towards bigger or smaller values. I noticed something like that in optax when tuning learning_rate with adam (+linesearch) and lbfgs.
So I was looking for a way to test this with the predefined optimistix solvers and didn't find one.

I hope it's clearer now what the issue is.

@johannahaffner
Contributor

I see, tough optimisation problem!

It sounds like you want to do hyperparameter tuning - what do you get if a solver is not working well? Convergence to a local minimum, running out of steps, divergence?

And are you working with complex numbers, or have you split the problem into complex and real parts?

@matillda123
Author

I have real inputs but I use those to make complex numbers.
If a solver doesn't work well, for me that generally means it's stuck in a local minimum or converging really slowly.

@johannahaffner
Contributor

johannahaffner commented Jan 9, 2025

So optimistix only ever sees real numbers?

The functions you are working with can have several local minima, right? In this case, you might want to try multi-starting your optimisation.
If you'd like step sizes larger than one wherever your optimisation landscape looks suitably quadratic, maybe try optx.BFGS with a ClassicalTrustRegion? The trust region radius can get larger than one, and has the widest range of step sizes, since these are updated multiplicatively and can increase as well as decrease. You can also take a look at the combinations here, such as

class BFGSIndirectDampedNewton(optx.AbstractBFGS):

(It sounds like you tried BacktrackingArmijo, which will start with the full Newton step in BFGS and then chop that in half until it's satisfied.)

Those are but two suggestions - I'm afraid I can't say much more with only a basic idea of what your problem is.

@matillda123
Author

Yes, optimistix should only see real numbers.

So far I have only used standard solvers (mainly BFGS, NonlinearCG, GaussNewton and Dogleg). I think Dogleg uses a trust region. But it's tricky, as that is one of the cases which can get stuck in local minima.

Alright, so I guess there is no way around modifying the solvers with different searches and tuning their parameters.

Thanks for the suggestions. :)

@johannahaffner
Contributor

Alright! Happy solving :)
