From 7cb889d0ec5fd3f2930ede91b2bdf25a4125fd76 Mon Sep 17 00:00:00 2001
From: Janos Gabler
Date: Mon, 4 Nov 2024 09:59:10 +0100
Subject: [PATCH] Apply suggestions from code review by HM

Co-authored-by: Hans-Martin von Gaudecker
---
 .../how_to/how_to_algorithm_selection.ipynb | 28 +++++++++----------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/docs/source/how_to/how_to_algorithm_selection.ipynb b/docs/source/how_to/how_to_algorithm_selection.ipynb
index 2f9bbb66e..d6a04ee54 100644
--- a/docs/source/how_to/how_to_algorithm_selection.ipynb
+++ b/docs/source/how_to/how_to_algorithm_selection.ipynb
@@ -15,7 +15,7 @@
     "\n",
     "- There is no optimizer that works well for all problems \n",
     "- Making the right choice can lead to enormous speedups\n",
-    "- Making the wrong choice can mean that you don't solve your problem at all; Sometimes, \n",
+    "- Making the wrong choice can mean that you don't solve your problem at all. Sometimes, \n",
     "optimizers fail silently!\n",
     "\n",
     "\n",
@@ -24,20 +24,20 @@
     "Algorithm selection is a mix of theory and experimentation. We recommend the following \n",
     "for steps:\n",
     "\n",
-    "1. **Theory**: Select three to 5 candidate algorithms based on the properties \n",
-    "of your problem. Below we provide a simple decision tree for this step.\n",
+    "1. **Theory**: Based on the properties of your problem, start with 3 to 5 candidate algorithms. \n",
+    "You may use the [decision tree below](link).\n",
     "2. **Experiments**: Run the candidate algorithms for a small number of function \n",
     "evaluations. As a rule of thumb, use between `n_params` and `10 * n_params`\n",
     "evaluations. \n",
     "3. **Comparison**: Compare the results in a *criterion plot*.\n",
     "4. **Optimization**: Re-run the algorithm with the best results until \n",
-    "convergence. Use the best parameter vector from the experiments as starting point.\n",
+    "convergence. Use the best parameter vector from the experiments as start parameters.\n",
     "\n",
     "These steps work well for most problems. Sometimes you need [variations](four-steps-variations).\n",
     "\n",
     "## An example problem\n",
     "\n",
-    "As an example we use the Trid function. The Trid function has no local minimum except \n",
+    "As an example we use the [Trid function](https://www.sfu.ca/~ssurjano/trid.html). The Trid function has no local minimum except \n",
     "the global one. It is defined for any number of dimensions, we will pick 20. As starting \n",
     "values we will pick the vector [0, 1, ..., 19]. \n",
     "\n",
@@ -86,13 +86,13 @@
    "source": [
     "## Step 1: Theory\n",
     "\n",
-    "The below decision tree offers a practical guide on how to narrow down the set of algorithms to experiment with, based on the theoretical properties of your problem:\n",
+    "The following decision tree is a practical guide for narrowing down the set of algorithms to experiment with:\n",
     "\n",
     "```{mermaid}\n",
     "graph LR\n",
     "    classDef highlight fill:#FF4500;\n",
     "    A[\"Do you have <br> nonlinear constraints?\"] -- yes --> B[\"differentiable?\"]\n",
-    "    B[\"differentiable?\"] -- yes --> C[\"'ipopt', 'nlopt_slsqp', 'scipy_trust_constr', ...\"]\n",
+    "    B[\"Is your objective function differentiable?\"] -- yes --> C[\"'ipopt', 'nlopt_slsqp', 'scipy_trust_constr', ...\"]\n",
     "    B[\"differentiable?\"] -- no --> D[\"'scipy_cobyla', 'nlopt_cobyla', ...\"]\n",
     "\n",
     "    A[\"Do you have <br> nonlinear constraints?\"] -- no --> E[\"Can you exploit <br> a least-squares <br> structure?\"]\n",
@@ -108,9 +108,9 @@
     "\n",
     "Let's go through the steps for the Trid function:\n",
     "\n",
-    "1. There are no nonlinear constraints our solution needs to satisfy\n",
-    "2. There is no least-squares structure we can exploit \n",
-    "3. The function is differentiable and we have a closed form gradient that we would like \n",
+    "1. **No** nonlinear constraints our solution needs to satisfy\n",
+    "2. **No** least-squares structure we can exploit \n",
+    "3. **Yes**, the function is differentiable and we have a closed form gradient that we would like \n",
     "to use. \n",
     "\n",
     "We therefore end up with the candidate algorithms `scipy_lbfgsb`, `nlopt_lbfgsb`, and \n",
@@ -263,13 +263,13 @@
    "metadata": {},
   "source": [
     "We can see that our chosen optimizer solves the problem with less than 35 function \n",
-    "evaluations. At this time, the two gradient free optimizers have not even started to \n",
-    "make significant progress. Cobyla gets reasonably close to an optimum after about 4k \n",
-    "evaluations. Neldermead gets stuck after 8k evaluations and fails to solve the problem. \n",
+    "evaluations. At this point, the two gradient-free optimizers have not yet made \n",
+    "significant progress. COBYLA gets reasonably close to an optimum after about 4k \n",
+    "evaluations. Nelder-Mead gets stuck after 8k evaluations and fails to solve the problem. \n",
     "\n",
     "This example shows not only that the choice of optimizer is important but that the commonly \n",
     "held belief that gradient free optimizers are generally more robust than gradient based \n",
-    "ones is dangerous! The Neldermead algorithm did \"converge\" and reports success, but\n",
+    "ones is dangerous! The Nelder-Mead algorithm did \"converge\" and reported success, but\n",
     "did not find the optimum. It did not even get stuck in a local optimum because we know \n",
     "that the Trid function does not have local optima except the global one. It just got \n",
     "stuck somewhere. "