Docs (#68)

sustainable-processes · Sep 10, 2020 · e9dc6ca
1 parent 8d64f6f

Showing 41 changed files with 1,874 additions and 1,435 deletions.
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -0,0 +1,67 @@
+# Contributing
+
+Some instructions for people contributing back.
+
+### Downloading the code
+
+1. Clone the repository:
+```git clone https://github.com/sustainable-processes/summit_private.git```
+2. Install Poetry by following the instructions [here](https://python-poetry.org/docs/#installation). We use Poetry for dependency management.
+3. Install all dependencies:
+```poetry install```
+4. To run the tests:
+```poetry run pytest --doctest-modules --ignore=case_studies```
+
+### Commit Workflow
+
+- Use the [project board](https://github.com/orgs/sustainable-processes/projects/1) to keep track of issues. Issues are automatically moved along the board when they are closed on GitHub.
+- Write tests in the `tests/` folder.
+- Documentation follows the [numpy docstring format](https://numpydoc.readthedocs.io/en/latest/format.html#documenting-class-instances)
+    - Please include examples when possible that can be tested using [doctest](https://docs.python.org/3/library/doctest.html)
+    - All publicly available classes and methods should have a docstring
+- Commit to a branch off master and submit pull requests to merge. 
+    - To create a branch locally and push it:
+    ```bash
+    $ git checkout -b BRANCH_NAME
+    # Once you've made some changes
+    $ git commit -am "commit message"
+    $ git push -u origin BRANCH_NAME
+    # Now, if you go back to GitHub, your branch should exist
+    ```
+    - All pull requests need one review.
+    - Tests will be run automatically when a pull request is created, and all tests need to pass before the pull request is merged. 
+
+### Docker
+Sometimes, it is easier to run tests using a Docker container (e.g., on compute clusters). Here are the commands to build and run the Docker containers using the included Dockerfile. The container entrypoint is `python`, so you only need to specify the file name.
+
+To build the container and upload it to Docker Hub:
+```
+docker build . -t marcosfelt/summit:latest
+docker push marcosfelt/summit:latest
+```
+You can change the tag from `latest` to whatever is most appropriate (e.g., the branch name). I have found that this takes up a lot of space on disk, so I have been running the commands on our private servers.
+
+Then, to run a container, here is an example with the SnAr experiment code. The home directory of the container is `summit_user`, so we mount the current working directory into that folder. We remove the container once it finishes using `--rm` and make it interactive using `-it` (drop `-it` if you don't need an interactive terminal; add `-d` to run the container in the background). [Neptune.ai](https://neptune.ai/) is used to track the experiments, so the API token is passed in as an environment variable. Finally, I specify the image name and tag, followed by the Python file I want to run.
+
+```
+export NEPTUNE_API_TOKEN= #place your neptune token here
+sudo docker run -v `pwd`/:/summit_user --rm -it --env NEPTUNE_API_TOKEN=$NEPTUNE_API_TOKEN summit:snar_benchmark snar_experiment_2.py
+```
+
+Singularity (for running Docker containers on the HPC):
+```
+export NEPTUNE_API_TOKEN=
+singularity exec -B `pwd`/:/summit_user docker://marcosfelt/summit:snar_benchmark snar_experiment.py
+```
+
+### Releases
+
+Below is the old process for building a release. In the future, we will automate this using GitHub Actions.
+
+1. Install [s3pypi](https://github.com/novemberfiveco/s3pypi) and [dephell](https://dephell.org/docs/installation.html)
+2. Set up AWS credentials with permission to upload to pypi.rxns.io (Kobi controls access to this).
+3. Bump the version in pyproject.toml and then run:
+    ```dephell deps convert --from=pyproject.toml --to=setup.py```
+4. Go into `setup.py` and delete the lines for `extras_install_requires`.
+5. Upload the package to the private PyPI repository:
+    ```s3pypi --bucket pypi.rxns.io```
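The commit workflow above asks for numpy-format docstrings with examples that doctest can check. As a rough illustration only, here is a hypothetical helper (`within_bounds` is not part of Summit) with `Parameters`, `Returns` and `Examples` sections; if it lived in a package module, `pytest --doctest-modules` should collect and run its `Examples`.

```python
def within_bounds(value, bounds):
    """Check whether a value lies inside inclusive lower/upper bounds.

    Hypothetical example helper, not part of Summit.

    Parameters
    ----------
    value : float
        The value to check.
    bounds : tuple of float
        The ``(lower, upper)`` bounds, both inclusive.

    Returns
    -------
    bool
        True if ``lower <= value <= upper``, otherwise False.

    Examples
    --------
    >>> within_bounds(1.5, (0.5, 2.0))
    True
    >>> within_bounds(2.5, (0.5, 2.0))
    False
    """
    lower, upper = bounds
    return lower <= value <= upper


if __name__ == "__main__":
    # Run the docstring examples directly; silent when they all pass
    import doctest

    doctest.testmod()
```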
diff --git a/README.md b/README.md
@@ -1,86 +1,70 @@
 # Summit
+![summit_banner](docs/source/_static/banner_4.png)
 
 Summit is a set of tools for optimising chemical processes. We’ve started by targeting reactions.
 
-Currently, reaction optimisation in the fine chemicals industry is done by intuition or design of experiments, which both scale poorly with the complexity of the problem. Summit applies recent advances in machine learning to make the process of reaction optimisation faster. Essentially, it applies algorithms that learn which conditions (e.g., temperature, stoichiometry, etc.) are important to maximising one or more objectives (e.g., yield, enantiomeric excess). This is achieved through an iterative cycle.
+## What is Summit?
+Currently, reaction optimisation in the fine chemicals industry is done by intuition or design of experiments. Both scale poorly with the complexity of the problem.
 
-For a more academic treatment of Summit, check out “Benchmarking Machine Learning for Reaction Optimisation.” If you just want to try it, out, check out our [tutorial](https://gosummit.readthedocs.io/en/latest/tutorial.html).
+Summit uses recent advances in machine learning to make the process of reaction optimisation faster. Essentially, it applies algorithms that learn which conditions (e.g., temperature, stoichiometry, etc.) are important to maximising one or more objectives (e.g., yield, enantiomeric excess). This is achieved through an iterative cycle.
+
+Summit has two key features:
+
+- **Strategies**: Optimisation algorithms designed to find the best conditions in as few iterations as possible. Summit has eight strategies implemented.
+- **Benchmarks**: Simulations of chemical reactions that can be used to test strategies. We have both mechanistic and data-driven benchmarks.
+
+To get started, see the Quick Start below or follow our [tutorial](https://gosummit.readthedocs.io/en/latest/tutorial.html). 
+
+Currently, Summit has the following strategies implemented:
+
+- **TSEMO**: Multi-objective Bayesian optimisation strategy by [Bradford et al.]()
+- **Gryffin**: Single-objective Bayesian optimisation strategy designed for categorical variables by [Häse et al.](https://arxiv.org/abs/2003.12127)
+- **SOBO**: Single-objective Bayesian optimisation strategy ([GpyOpt](https://gpyopt.readthedocs.io/))
+- **Nelder-Mead**: Single-objective optimisation strategy for local search
+- **SNOBFIT**: Single-objective optimisation strategy by [Huyer et al.](https://www.mat.univie.ac.at/~neum/ms/snobfit.pdf)
+- **Deep Reaction Optimiser**: Deep reinforcement learning by [Zhou et al.](https://pubs.acs.org/doi/10.1021/acscentsci.7b00492)
+- **Factorial DoE**: Factorial design of experiments
+- **Random**: Random search
 
 ## Installation
 
+To install summit, use the following command:
+
 ```pip install git+https://github.com/sustainable-processes/summit.git@0.5.0#egg=summit```
 
-## Documentation
+## Quick Start
 
-The documentation for summit can be found [here](https://gosummit.readthedocs.io/en/latest/index.html).
-<!-- It would be great to add a "Quick Start" here.-->
-
-## Development
-
-Some instructions for people contributing back.
-
-### Downloading the code
-
-1. Clone the repository:
-```git clone https://github.com/sustainable-processes/summit_private.git```
-2. Intall poetry by following the instructions [here](https://python-poetry.org/docs/#installation). We use poetry for dependency management.
-3. Install all dependencies:
-```poetry install```
-3. To run tests:
-```poetry run pytest --doctest-modules --ignore=case_studies```
-
-### Commit Worfklow
-
-- Use the [project board](https://github.com/orgs/sustainable-processes/projects/1) to keep track of issues. Issues will automatically be moved along in the board when they are closed in Github.
-- Write tests in the tests/ folder
-- Documentation follows the [numpy docstring format](https://numpydoc.readthedocs.io/en/latest/format.html#documenting-class-instances)
-    - Please include examples when possible that can be tested using [doctest](https://docs.python.org/3/library/doctest.html)
-    - All publicly available classes and methods should have a docstring
-- Commit to a branch off master and submit pull requests to merge. 
-    - To create a branch locally and push it:
-    ```bash
-    $ git checkout -b BRANCH_NAME
-    # Once you've made some changes
-    $ git commit -am "commit message"
-    $ git push -u origin BRANCH_NAME
-    #Now if you come back to Github, your branch should exist
-    ```
-    - All pull requests need one review.
-    - Tests will be run automatically when a pull request is created, and all tests need to pass before the pull request is merged. 
-
-### Docker
-Sometimes, it is easier to run tests using a Docker container (e.g., on compute clusters). Here are the commands to build and run the docker containers using the included Dockferfile. The container entrypoint is python, so you just need to specify the file name.
-
-To build the container and upload the container to Docker Hub.:
-```
-docker build . -t marcosfelt/summit:latest
-docker push marcosfelt/summit:latest
-```
-You can change the tag from `latest` to whatever is most appropriate (e.g., the branch name). I have found that this takes up a lot of space on disk, so I have been running the commands on our private servers.
+Below, we show how to use the Nelder-Mead strategy to optimise a benchmark representing a nucleophilic aromatic substitution (SnAr) reaction.
+```python
+# Import summit
+from summit.benchmarks import SnarBenchmark
+from summit.strategies import NelderMead, MultitoSingleObjective
+from summit.run import Runner
 
-Then, to run a container, here is an example with the SnAr experiment code. The home directory of the container is called `summit_user`, hence we mount the current working directory into that folder.  We remove the container upon finishing using `--rm` and make it interactive using `--it` (remove this if you just want the container to run in the background). [Neptune.ai](https://neptune.ai/) is used for the experiments so the API token is passed in. Finally, I specify the image name and the tag and before referencing the python file I want to run. 
+# Instantiate the benchmark
+exp = SnarBenchmark()
 
-```
-export NEPTUNE_API_TOKEN= #place your neptune token here
-sudo docker run -v `pwd`/:/summit_user --rm -it --env NEPTUNE_API_TOKEN=$NEPTUNE_API_TOKEN summit:snar_benchmark snar_experiment_2.py
-```
+# Since the SnAr benchmark has two objectives and Nelder-Mead is single-objective, we need a multi-to-single objective transform
+transform = MultitoSingleObjective(
+    exp.domain, expression="-sty/1e4+e_factor/100", maximize=False
+)
 
-Singularity (for running Docker containers on the HPC):
-```
-export NEPTUNE_API_TOKEN=
-singularity exec -B `pwd`/:/summit_user docker://marcosfelt/summit:snar_benchmark snar_experiment.py
+# Set up the strategy, passing in the optimisation domain and transform
+nm = NelderMead(exp.domain, transform=transform)
+
+# Use the runner to run closed loop experiments
+r = Runner(
+    strategy=nm, experiment=exp, max_iterations=50
+)
+r.run()
 ```
 
-### Releases
+## Documentation
+
+The documentation for summit can be found [here](https://gosummit.readthedocs.io/en/latest/index.html).
 
-Below is the old process for building a release. In the future, we will have this automated using Github actions.
 
-1. Install [s3pypi](https://github.com/novemberfiveco/s3pypi) and [dephell](https://dephell.org/docs/installation.html)
-2. Install AWS credentials to upload pypi.rxns.io (Kobi is the one who controls this).
-3. Bump the version in pyproject.toml and then run:
-    ```dephell deps convert --from=pyproject.toml --to=setup.py```
-4. Go into setup.py and delete the lines for extras_install_requires
-4. Upload the package to the private pypi repository:
-    ```s3pypi --bucket pypi.rxns.io```
+## Issues?
+Submit an [issue](https://github.com/sustainable-processes/summit/issues) or send an email to kcmf2@cam.ac.uk.
 
 
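The README describes optimisation as an iterative cycle, and the Quick Start hands that loop to `Runner`. The plain-Python sketch below shows the general shape of such a closed loop using random search, the simplest strategy in the list. It is not Summit's `Runner` implementation: `suggest`, `toy_experiment` and `run_closed_loop` are hypothetical names, and `toy_experiment` is a made-up function standing in for a real reaction or benchmark. Only the variable bounds are taken from the SnAr domain serialised in snar_sobo_external.json.

```python
import random

# Variable bounds taken from the SnAr domain in snar_sobo_external.json
BOUNDS = {
    "tau": (0.5, 2.0),
    "equiv_pldn": (1.0, 5.0),
    "conc_dfnb": (0.1, 0.5),
    "temperature": (30.0, 120.0),
}


def suggest(rng):
    """Random-search 'strategy': sample each variable uniformly within its bounds."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in BOUNDS.items()}


def toy_experiment(conditions):
    """HYPOTHETICAL stand-in for an experiment (not the SnAr model); lower is better."""
    return (conditions["tau"] - 1.0) ** 2 + ((conditions["temperature"] - 90.0) / 90.0) ** 2


def run_closed_loop(max_iterations=50, seed=0):
    """Propose conditions, 'run' them, record the result, and repeat."""
    rng = random.Random(seed)
    history = []
    for _ in range(max_iterations):
        conditions = suggest(rng)             # 1. strategy proposes conditions
        result = toy_experiment(conditions)   # 2. evaluate them
        history.append((conditions, result))  # 3. record the outcome
    best_conditions, best_result = min(history, key=lambda item: item[1])
    return history, best_conditions, best_result


history, best, best_value = run_closed_loop()
print(f"best value after {len(history)} iterations: {best_value:.4f}")
print({name: round(value, 3) for name, value in best.items()})
```

Random search ignores the recorded history when proposing new conditions. Model-based strategies such as SOBO or TSEMO instead fit a model to past results and use it to choose the next point, which is why they can need fewer iterations.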
diff --git a/docs/source/_static/TSEMO_DTLZ2.png b/docs/source/_static/TSEMO_DTLZ2.png
diff --git a/docs/source/_static/acquistion_function.png b/docs/source/_static/acquistion_function.png
diff --git a/docs/source/_static/banner_4.png b/docs/source/_static/banner_4.png
diff --git a/docs/source/_static/snar_experiments_external_0.csv b/docs/source/_static/snar_experiments_external_0.csv
@@ -1,7 +1,4 @@
 NAME,tau,equiv_pldn,conc_dfnb,temperature,strategy
 TYPE,DATA,DATA,DATA,DATA,METADATA
-0,0.65,2.2,0.4600000000000001,75.0,LHS
-1,0.9500000000000001,3.0,0.38,111.0,LHS
-2,1.25,4.6,0.14,57.0,LHS
-3,1.85,3.8000000000000003,0.30000000000000004,39.0,LHS
-4,1.55,1.4,0.22000000000000003,93.0,LHS
+0,1.697276616717274,2.3766612355128656,0.49423078653350516,81.35696424017263,Single-objective BayOpt
+1,1.9349141004941464,3.7215572584228926,0.2886673868745884,33.0546657397271,Single-objective BayOpt
diff --git a/docs/source/_static/snar_experiments_external_1.csv b/docs/source/_static/snar_experiments_external_1.csv
@@ -1,7 +1,4 @@
 NAME,tau,equiv_pldn,conc_dfnb,temperature,sty,e_factor,computation_t,experiment_t,strategy
 TYPE,DATA,DATA,DATA,DATA,DATA,DATA,METADATA,METADATA,METADATA
-0,0.65,2.2,0.4600000000000001,75.0,7786.655447945639,10.987035663168497,0.0,0.01232290267944336,LHS
-1,0.9500000000000001,3.0,0.38,111.0,2887.7432355153373,20.60376622418164,0.0,0.011729001998901367,LHS
-2,1.25,4.6,0.14,57.0,1249.553463058851,32.831486727082876,0.0,0.009754657745361328,LHS
-3,1.85,3.8000000000000003,0.30000000000000004,39.0,1796.9127067316595,16.483119872945093,0.0,0.011143207550048828,LHS
-4,1.55,1.4,0.22000000000000003,93.0,1595.8336602961792,20.411290438847523,0.0,0.009853839874267578,LHS
+0,1.697276616717274,2.3766612355128656,0.49423078653350516,81.35696424017263,2577.8678212410014,13.11795197598188,0.0,0.01884603500366211,Single-objective BayOpt
+1,1.9349141004941464,3.7215572584228926,0.2886673868745884,33.0546657397271,1694.334796274914,16.58069255511405,0.0,0.015223026275634766,Single-objective BayOpt
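The Quick Start collapses the two SnAr objectives into one using the expression `-sty/1e4+e_factor/100` with `maximize=False`. To see what that expression computes, the sketch below evaluates it by hand for the two rows of snar_experiments_external_1.csv above. `scalar_objective` is a hypothetical plain-Python helper, not the `MultitoSingleObjective` implementation.

```python
def scalar_objective(sty: float, e_factor: float) -> float:
    """Evaluate -sty/1e4 + e_factor/100, the expression used in the Quick Start.

    Hypothetical helper for illustration. With maximize=False lower values are
    better, so a higher space-time yield (sty) and a lower E-factor both help.
    """
    return -sty / 1e4 + e_factor / 100


# (sty, e_factor) copied from the two rows of snar_experiments_external_1.csv
rows = [
    (2577.8678212410014, 13.11795197598188),
    (1694.334796274914, 16.58069255511405),
]

for sty, e_factor in rows:
    print(f"sty={sty:.1f}, e_factor={e_factor:.2f} -> {scalar_objective(sty, e_factor):.4f}")
```

This prints -0.1266 for the first row and -0.0036 for the second, so the first set of conditions scores better under this minimised objective. The same magnitudes, with the opposite sign, appear as the stored outputs in `prev_param` in snar_sobo_external_2.json.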
diff --git a/docs/source/_static/snar_experiments_external_2.csv b/docs/source/_static/snar_experiments_external_2.csv
@@ -1,7 +1,3 @@
-NAME,tau,equiv_pldn,conc_dfnb,temperature,sty,e_factor,strategy
-TYPE,DATA,DATA,DATA,DATA,DATA,DATA,METADATA
-418,0.5030621661413801,2.159430271452188,0.3331364369939005,39.349316576589885,8571.64186963127,20.263331718101174,TSEMO2
-417,1.9427586165924804,2.4385333547452435,0.21844473638973613,40.65833629490961,1900.7421954026868,20.262233951542083,TSEMO2
-340,0.9372710173172387,2.5229832750502386,0.4554495530196181,102.96622111313425,3068.1191873838484,20.300237076480652,TSEMO2
-896,1.2733780600751703,4.840804510690066,0.21208537109030945,68.18683968991941,1260.200774131921,21.023509554987164,TSEMO2
-506,0.9155072752567002,3.835044815518099,0.48601776052851564,109.58468407825623,3397.9204691962477,20.275176634930965,TSEMO2
+NAME,tau,equiv_pldn,conc_dfnb,temperature,strategy
+TYPE,DATA,DATA,DATA,DATA,METADATA
+0,1.122318954003142,3.164344834741235,0.23598966596928997,34.44861796011246,Single-objective BayOpt
diff --git a/docs/source/_static/snar_sobo_external.json b/docs/source/_static/snar_sobo_external.json
@@ -0,0 +1 @@
+{"name": "SOBO", "transform": {"transform_domain": [{"type": "ContinuousVariable", "is_objective": false, "name": "tau", "description": "residence time in minutes", "units": null, "bounds": [0.5, 2.0]}, {"type": "ContinuousVariable", "is_objective": false, "name": "equiv_pldn", "description": "equivalents of pyrrolidine", "units": null, "bounds": [1.0, 5.0]}, {"type": "ContinuousVariable", "is_objective": false, "name": "conc_dfnb", "description": "concentration of 2,4 dinitrofluorobenenze at reactor inlet (after mixing) in M", "units": null, "bounds": [0.1, 0.5]}, {"type": "ContinuousVariable", "is_objective": false, "name": "temperature", "description": "Reactor temperature in degress celsius", "units": null, "bounds": [30.0, 120.0]}, {"type": "ContinuousVariable", "is_objective": true, "name": "scalar_objective", "description": "-sty/1e4+e_factor/100", "units": null, "bounds": [0.0, 1.0]}], "name": "MultitoSingleObjective", "domain": [{"type": "ContinuousVariable", "is_objective": false, "name": "tau", "description": "residence time in minutes", "units": null, "bounds": [0.5, 2.0]}, {"type": "ContinuousVariable", "is_objective": false, "name": "equiv_pldn", "description": "equivalents of pyrrolidine", "units": null, "bounds": [1.0, 5.0]}, {"type": "ContinuousVariable", "is_objective": false, "name": "conc_dfnb", "description": "concentration of 2,4 dinitrofluorobenenze at reactor inlet (after mixing) in M", "units": null, "bounds": [0.1, 0.5]}, {"type": "ContinuousVariable", "is_objective": false, "name": "temperature", "description": "Reactor temperature in degress celsius", "units": null, "bounds": [30.0, 120.0]}, {"type": "ContinuousVariable", "is_objective": true, "name": "sty", "description": "space time yield (kg/m^3/h)", "units": null, "bounds": [0.0, 100.0]}, {"type": "ContinuousVariable", "is_objective": true, "name": "e_factor", "description": "E-factor", "units": null, "bounds": [0.0, 10.0]}], "transform_params": {"expression": 
"-sty/1e4+e_factor/100", "maximize": false}}, "strategy_params": {"prev_param": null, "use_descriptors": false, "gp_model_type": "GP", "acquisition_type": "EI", "optimizer_type": "lbfgs", "evaluator_type": "random", "kernel": {"input_dim": 4, "active_dims": [0, 1, 2, 3], "name": "Mat52", "useGPU": false, "variance": [1.0], "lengthscale": [1.0], "ARD": false, "class": "GPy.kern.Matern52"}, "exact_feval": false, "ARD": true, "standardize_outputs": true}}
diff --git a/docs/source/_static/snar_sobo_external_2.json b/docs/source/_static/snar_sobo_external_2.json
@@ -0,0 +1 @@
+{"name": "SOBO", "transform": {"transform_domain": [{"type": "ContinuousVariable", "is_objective": false, "name": "tau", "description": "residence time in minutes", "units": null, "bounds": [0.5, 2.0]}, {"type": "ContinuousVariable", "is_objective": false, "name": "equiv_pldn", "description": "equivalents of pyrrolidine", "units": null, "bounds": [1.0, 5.0]}, {"type": "ContinuousVariable", "is_objective": false, "name": "conc_dfnb", "description": "concentration of 2,4 dinitrofluorobenenze at reactor inlet (after mixing) in M", "units": null, "bounds": [0.1, 0.5]}, {"type": "ContinuousVariable", "is_objective": false, "name": "temperature", "description": "Reactor temperature in degress celsius", "units": null, "bounds": [30.0, 120.0]}, {"type": "ContinuousVariable", "is_objective": true, "name": "scalar_objective", "description": "-sty/1e4+e_factor/100", "units": null, "bounds": [0.0, 1.0]}], "name": "MultitoSingleObjective", "domain": [{"type": "ContinuousVariable", "is_objective": false, "name": "tau", "description": "residence time in minutes", "units": null, "bounds": [0.5, 2.0]}, {"type": "ContinuousVariable", "is_objective": false, "name": "equiv_pldn", "description": "equivalents of pyrrolidine", "units": null, "bounds": [1.0, 5.0]}, {"type": "ContinuousVariable", "is_objective": false, "name": "conc_dfnb", "description": "concentration of 2,4 dinitrofluorobenenze at reactor inlet (after mixing) in M", "units": null, "bounds": [0.1, 0.5]}, {"type": "ContinuousVariable", "is_objective": false, "name": "temperature", "description": "Reactor temperature in degress celsius", "units": null, "bounds": [30.0, 120.0]}, {"type": "ContinuousVariable", "is_objective": true, "name": "sty", "description": "space time yield (kg/m^3/h)", "units": null, "bounds": [0.0, 100.0]}, {"type": "ContinuousVariable", "is_objective": true, "name": "e_factor", "description": "E-factor", "units": null, "bounds": [0.0, 10.0]}], "transform_params": {"expression": 
"-sty/1e4+e_factor/100", "maximize": false}}, "strategy_params": {"prev_param": [[[1.6972766167172741, 2.3766612355128656, 0.4942307865335052, 81.35696424017263], [1.9349141004941464, 3.7215572584228926, 0.2886673868745884, 33.0546657397271]], [[0.1266072623642813], [0.003626554076350874]]], "use_descriptors": false, "gp_model_type": "GP", "acquisition_type": "EI", "optimizer_type": "lbfgs", "evaluator_type": "random", "kernel": {"input_dim": 4, "active_dims": [0, 1, 2, 3], "name": "Mat52", "useGPU": false, "variance": [0.9996612796837149], "lengthscale": [2.246874970301238], "ARD": false, "class": "GPy.kern.Matern52"}, "exact_feval": false, "ARD": true, "standardize_outputs": true}}