Polishing.
janosg committed Nov 1, 2024
1 parent e0a7cd4 commit 4046a88
Showing 1 changed file with 84 additions and 18 deletions.
docs/source/how_to/how_to_algorithm_selection.ipynb (102 changes: 84 additions & 18 deletions)
@@ -15,21 +15,22 @@
"\n",
"- There is no optimizer that works well for all problems \n",
"- Making the right choice can lead to enormous speedups\n",
"- Making the wrong choice can mean that you cannot solve your problem at all\n",
"- Making the wrong choice can mean that you don't solve your problem at all; Sometimes, \n",
"optimizers fail silently!\n",
"\n",
"\n",
"## The four steps for selecting algorithms\n",
"\n",
"Algorithm selection is a mix of theory and experimentation. We recommend the following \n",
"for steps:\n",
"\n",
"1. Theory: Select three to 5 candidate algorithms based on the properties \n",
"1. **Theory**: Select three to 5 candidate algorithms based on the properties \n",
"of your problem. Below we provide a simple decision tree for this step.\n",
"2. Experiments: Run the candidate algorithms fo a small number of function \n",
"2. **Experiments**: Run the candidate algorithms for a small number of function \n",
"evaluations. As a rule of thumb, use between `n_params` and `10 * n_params`\n",
"evaluations. \n",
"3. Comparison: Compare the results in a criterion plot.\n",
"4. Optimization: Re-run the optimization algorithm with the best results until \n",
"3. **Comparison**: Compare the results in a *criterion plot*.\n",
"4. **Optimization**: Re-run the algorithm with the best results until \n",
"convergence. Use the best parameter vector from the experiments as starting point.\n",
"\n",
"These steps work well for most problems. Sometimes you need [variations](four-steps-variations).\n",
@@ -43,6 +44,17 @@
"A Python implementation of the function and its gradient looks like this:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import warnings\n",
"\n",
"warnings.filterwarnings(\"ignore\")"
]
},
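{
"cell_type": "markdown",
"metadata": {},
"source": [
"The implementation cell itself is not visible here. As a rough reference, a minimal \n",
"sketch of `trid_scalar` and `trid_gradient`, based on the standard Trid definition \n",
"$f(x) = \\sum_i (x_i - 1)^2 - \\sum_{i>1} x_i x_{i-1}$, could look like the cell below. \n",
"The names match those used later in this notebook, but the exact implementation in \n",
"the documentation may differ.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"\n",
"def trid_scalar(x):\n",
"    \"\"\"Trid function: sum((x_i - 1)^2) - sum(x_i * x_{i-1}).\"\"\"\n",
"    return ((x - 1) ** 2).sum() - (x[1:] * x[:-1]).sum()\n",
"\n",
"\n",
"def trid_gradient(x):\n",
"    \"\"\"Gradient of the Trid function.\"\"\"\n",
"    left = np.insert(x, 0, 0)[:-1]  # x_{i-1}, zero-padded at the left boundary\n",
"    right = np.append(x, 0)[1:]  # x_{i+1}, zero-padded at the right boundary\n",
"    return 2 * (x - 1) - left - right"
]
},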
{
"cell_type": "code",
"execution_count": null,
@@ -80,18 +92,18 @@
"graph LR\n",
" classDef highlight fill:#FF4500;\n",
" A[\"Do you have<br/>nonlinear constraints?\"] -- yes --> B[\"differentiable?\"]\n",
" B[\"differentiable?\"] -- yes --> C[\"'ipopt', 'nlopt_slsqp', 'scipy_trust_constr'\"]\n",
" B[\"differentiable?\"] -- no --> D[\"'scipy_cobyla', 'nlopt_cobyla'\"]\n",
" B[\"differentiable?\"] -- yes --> C[\"'ipopt', 'nlopt_slsqp', 'scipy_trust_constr', ...\"]\n",
" B[\"differentiable?\"] -- no --> D[\"'scipy_cobyla', 'nlopt_cobyla', ...\"]\n",
"\n",
" A[\"Do you have<br/>nonlinear constraints?\"] -- no --> E[\"Can you exploit<br/>a least-squares<br/>structure?\"]\n",
" E[\"Can you exploit<br/>a least-squares<br/>structure?\"] -- yes --> F[\"differentiable?\"]\n",
" E[\"Can you exploit<br/>a least-squares<br/>structure?\"] -- no --> G[\"differentiable?\"]\n",
"\n",
" F[\"differentiable?\"] -- yes --> H[\"'scipy_ls_lm', 'scipy_ls_trf', 'scipy_ls_dogleg'\"]\n",
" F[\"differentiable?\"] -- no --> I[\"'nag_dflos', 'pounders', 'tao_pounders'\"]\n",
" F[\"differentiable?\"] -- yes --> H[\"'scipy_ls_lm', 'scipy_ls_trf', 'scipy_ls_dogleg', ...\"]\n",
" F[\"differentiable?\"] -- no --> I[\"'nag_dflos', 'pounders', 'tao_pounders', ...\"]\n",
"\n",
" G[\"differentiable?\"] -- yes --> J[\"'scipy_lbfgsb', 'nlopt_lbfgsb', 'fides'\"]\n",
" G[\"differentiable?\"] -- no --> K[\"'nlopt_bobyqa', 'nlopt_neldermead', 'neldermead_parallel'\"]\n",
" G[\"differentiable?\"] -- yes --> J[\"'scipy_lbfgsb', 'nlopt_lbfgsb', 'fides', ...\"]\n",
" G[\"differentiable?\"] -- no --> K[\"'nlopt_bobyqa', 'nlopt_neldermead', 'neldermead_parallel', ...\"]\n",
"```\n",
"\n",
"Let's go through the steps for the Trid function:\n",
@@ -107,10 +119,10 @@
"\n",
"## Step 2: Experiments\n",
"\n",
"Below, we simply run optimizations with all algorithms in a loop and store the result \n",
"in a dictionary. We limit the number of function evaluations to 8. Since some algorithms \n",
"only support a maximum number of iterations as stopping criterion we also limit the \n",
"number of iterations to 8."
"To find out which algorithms work well for our problem, we simply run optimizations with\n",
"all algorithms in a loop and store the result in a dictionary. We limit the number of \n",
"function evaluations to 8. Since some algorithms only support a maximum number of iterations \n",
"as stopping criterion we also limit the number of iterations to 8.\n"
]
},
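{
"cell_type": "markdown",
"metadata": {},
"source": [
"The code for this step is not visible here. A sketch of such a loop, using the \n",
"candidates from the decision tree above and assuming that the evaluation budget is set \n",
"via `algo_options` with the keys `stopping_maxfun` and `stopping_maxiter` (the exact \n",
"option names may differ), could look like this:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Assumes `import optimagic as om` and the Trid functions from earlier cells\n",
"candidates = [\"scipy_lbfgsb\", \"nlopt_lbfgsb\", \"fides\"]\n",
"\n",
"experiment_results = {}\n",
"for algo in candidates:\n",
"    experiment_results[algo] = om.minimize(\n",
"        fun=trid_scalar,\n",
"        jac=trid_gradient,\n",
"        params=np.arange(20),\n",
"        algorithm=algo,\n",
"        # Limit both function evaluations and iterations to 8 for the experiments\n",
"        algo_options={\"stopping_maxfun\": 8, \"stopping_maxiter\": 8},\n",
"    )\n",
"\n",
"# Step 3: compare the short runs in a criterion plot\n",
"fig = om.criterion_plot(experiment_results)\n",
"fig.show(renderer=\"png\")"
]
},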
{
@@ -155,7 +167,8 @@
"source": [
"All optimizers work pretty well here and since this is a very simple problem, any of them \n",
"would probably find the optimum in a reasonable time. However, `nlopt_lbfgsb` is a bit \n",
"better than the others, so we will select it for the next step. \n",
"better than the others, so we will select it for the next step. In more difficult\n",
"examples, the difference between optimizers can be much more pronounced.\n",
"\n",
"## Step 4: Optimization \n",
"\n",
@@ -203,7 +216,7 @@
"source": [
"(four-steps-variations)=\n",
"\n",
"## Variations\n",
"## Variations of the four steps\n",
"\n",
"The four steps described above work very well in most situations. However, sometimes \n",
"it makes sense to deviate: \n",
@@ -213,7 +226,60 @@
"- If it is very important to find a precise optimum, run more than 1 algorithm until \n",
"convergence. \n",
"- If you have a very fast objective function, simply run all candidate algorithms until \n",
"convergence. "
"convergence. \n",
"- If you have a differentiable objective function but no closed form derivative, use \n",
"at least one gradient based optimizer and one gradient free optimizer in the \n",
"experiments. See [here](how_to_derivatives.ipynb) to learn more about derivatives.\n",
"\n",
"\n",
"## How important was it?\n",
"\n",
"The Trid function is differentiable and very well behaved in almost every aspect. \n",
"Moreover, it has a very short runtime. One would think that any optimizer can find its \n",
"optimum. So let's compare the selected optimizer with a few others:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"results = {}\n",
"for algo in [\"nlopt_lbfgsb\", \"scipy_neldermead\", \"scipy_cobyla\"]:\n",
" results[algo] = om.minimize(\n",
" fun=trid_scalar,\n",
" jac=trid_gradient,\n",
" params=np.arange(20),\n",
" algorithm=algo,\n",
" )\n",
"\n",
"fig = om.criterion_plot(results)\n",
"fig.show(renderer=\"png\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see that our chosen optimizer solves the problem with less than 35 function \n",
"evaluations. At this time, the two gradient free optimizers have not even started to \n",
"make significant progress. Cobyla gets reasonably close to an optimum after about 4k \n",
"evaluations. Neldermead gets stuck after 8k evaluations and fails to solve the problem. \n",
"\n",
"This example shows not only that the choice of optimizer is important but that the commonly \n",
"held belief that gradient free optimizers are generally more robust than gradient based \n",
"ones is dangerous! The Neldermead algorithm did \"converge\" and reports success, but\n",
"did not find the optimum. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"results[\"scipy_neldermead\"].success"
]
}
],
