ICLR release.
brendenpetersen committed Jan 14, 2021
0 parents commit 5777861
Showing 40 changed files with 5,955 additions and 0 deletions.
13 changes: 13 additions & 0 deletions .gitignore
@@ -0,0 +1,13 @@
*.DS_Store
*.pyc
*.egg*
venv*
dsr/dsr/summary*
*log_*
.gitignore
.ipynb_checkpoints
~$*
*.vscode/
dsr/build
dsr/dsr/cyfunc*
**/log/
30 changes: 30 additions & 0 deletions LICENSE
@@ -0,0 +1,30 @@
BSD 3-Clause License

Copyright (c) 2018, Lawrence Livermore National Security, LLC
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.

* Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

21 changes: 21 additions & 0 deletions NOTICE
@@ -0,0 +1,21 @@
This work was produced under the auspices of the U.S. Department of
Energy by Lawrence Livermore National Laboratory under Contract
DE-AC52-07NA27344.

This work was prepared as an account of work sponsored by an agency of
the United States Government. Neither the United States Government nor
Lawrence Livermore National Security, LLC, nor any of their employees
makes any warranty, expressed or implied, or assumes any legal liability
or responsibility for the accuracy, completeness, or usefulness of any
information, apparatus, product, or process disclosed, or represents that
its use would not infringe privately owned rights.

Reference herein to any specific commercial product, process, or service
by trade name, trademark, manufacturer, or otherwise does not necessarily
constitute or imply its endorsement, recommendation, or favoring by the
United States Government or Lawrence Livermore National Security, LLC.

The views and opinions of authors expressed herein do not necessarily
state or reflect those of the United States Government or Lawrence
Livermore National Security, LLC, and shall not be used for advertising
or product endorsement purposes.
134 changes: 134 additions & 0 deletions README.md
@@ -0,0 +1,134 @@
# Deep symbolic regression

Deep symbolic regression (DSR) is a deep learning algorithm for symbolic regression--the task of recovering tractable mathematical expressions from an input dataset. The package `dsr` contains the code for DSR, including a single-point, parallelized launch script (`dsr/run.py`), a baseline genetic programming-based symbolic regression algorithm, and an sklearn-like interface for use with your own data.

This code supports the ICLR 2021 paper [Deep symbolic regression: Recovering mathematical expressions from data via risk-seeking policy gradients](https://openreview.net/forum?id=m5Qsh0kBQG).

# Installation

Installation is straightforward in a Python 3 virtual environment using Pip. From the repository root:

```
python3 -m venv venv3 # Create a Python 3 virtual environment
source venv3/bin/activate # Activate the virtual environment
pip install -r requirements.txt # Install Python dependencies
export CFLAGS="-I $(python -c "import numpy; print(numpy.get_include())") $CFLAGS" # Needed on Mac to prevent fatal error: 'numpy/arrayobject.h' file not found
pip install -e ./dsr # Install DSR package
```

To perform experiments involving the GP baseline, you will need the additional package `deap`.

# Example usage

To try out DSR, use the following command from the repository root:

```
python -m dsr.run ./dsr/dsr/config.json --b=Nguyen-6
```

This should solve in around 50 training steps (~30 seconds on a laptop).

# Getting started

## Configuring runs

DSR uses JSON files to configure training.

Top-level key "task" specifies details of the benchmark expression for DSR or GP. See docs in `regression.py` for details.

Top-level key "training" specifies the training hyperparameters for DSR. See docs in `train.py` for details.

Top-level key "controller" specifies the RNN controller hyperparameters for DSR. See docs in `controller.py` for details.

Top-level key "gp" specifies the hyperparameters for GP if using the GP baseline. See docs for `dsr.baselines.gpsr.GP` for details.
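
As a rough illustration of the layout described above, the following sketch builds a config skeleton with one empty section per top-level key and checks a loaded config for missing sections. The helper names `make_config_skeleton` and `missing_sections` are hypothetical and not part of `dsr`; the hyperparameters inside each section are deliberately left empty because they are documented in the source files, not here. Treating all four keys as expected is also an assumption, since "gp" only matters for the GP baseline.

```python
import json

# Top-level keys named in this README. The nested hyperparameters are
# documented in the source files and are intentionally not guessed here.
TOP_LEVEL_KEYS = ("task", "training", "controller", "gp")


def make_config_skeleton():
    """Return a dict with one empty section per top-level key (hypothetical helper)."""
    return {key: {} for key in TOP_LEVEL_KEYS}


def missing_sections(config, expected=TOP_LEVEL_KEYS):
    """List the expected top-level keys that a config dict does not define."""
    return [key for key in expected if key not in config]


config = make_config_skeleton()
text = json.dumps(config, indent=2)      # what would be written to a .json file
print(text)
print(missing_sections(json.loads(text)))
```

Pass a shorter `expected` tuple, for example `("task", "training", "controller")`, when the GP baseline is not in use.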

## Launching runs

After configuring a run, launching it is simple:

```
python -m dsr.run [PATH_TO_CONFIG] [--OPTIONS]
```

## Sklearn interface

DSR also provides an [sklearn-like regressor interface](https://scikit-learn.org/stable/modules/generated/sklearn.base.RegressorMixin.html). Example usage:

```
from dsr import DeepSymbolicRegressor
import numpy as np
# Generate some data
np.random.seed(0)
X = np.random.random((10, 2))
y = np.sin(X[:,0]) + X[:,1] ** 2
# Create the model
model = DeepSymbolicRegressor("config.json")
# Fit the model
model.fit(X, y) # Should solve in ~10 seconds
# View the best expression
print(model.program_.pretty())
# Make predictions
model.predict(2 * X)
```

## Using an external dataset

To use your own dataset, set the `"dataset"` key in the config to the path of your CSV file, and give your task an arbitrary name.

```
"task": {
    "task_type": "regression",
    "name": "my_task",
    "dataset": "./path/to/my_dataset.csv",
    ...
}
```
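
This README does not spell out the expected CSV layout. The sketch below assumes one row per sample, feature columns first, the target in the last column, and no header row; check `regression.py` before relying on that. The function `write_dataset` and the file name are illustrative only.

```python
import csv
import math
import random


def write_dataset(path, n_rows=20, seed=0):
    """Write rows of (x1, x2, y) with y = sin(x1) + x2**2 to a CSV file.

    ASSUMPTION: features first, target last, no header row.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        x1, x2 = rng.random(), rng.random()
        rows.append([x1, x2, math.sin(x1) + x2 ** 2])
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    return rows


rows = write_dataset("my_dataset.csv")
print(len(rows), "rows written")
```

The target function matches the one used in the sklearn example, so the resulting file can serve as a small smoke test.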

Then run DSR:

```
python -m dsr.run path/to/config.json
```

Note that the `--b` flag matches the name of the CSV file, minus the `.csv` extension.

## Command-line examples

Show command-line help and quit

```
python -m dsr.run --help
```

Train 2 independent runs of DSR on the Nguyen-1 benchmark using 2 cores

```
python -m dsr.run config.json --b=Nguyen-1 --mc=2 --num_cores=2
```

Train DSR on all 12 Nguyen benchmarks using 12 cores

```
python -m dsr.run config.json --b=Nguyen --num_cores=12
```

Train 2 independent runs of GP on Nguyen-1

```
python -m dsr.run config.json --method=gp --b=Nguyen-1 --mc=2 --num_cores=2
```

Train DSR on Nguyen-1 and Nguyen-4

```
python -m dsr.run config.json --b=Nguyen-1 --b=Nguyen-4
```
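
When scripting many runs, it can help to assemble these command lines programmatically. The sketch below uses only the flags shown in the examples above (`--b`, `--mc`, `--num_cores`, `--method`); `build_command` is a hypothetical helper, not part of `dsr`, and it only builds the argument list without launching anything.

```python
import shlex


def build_command(config, benchmarks=(), mc=None, num_cores=None,
                  method=None, python="python"):
    """Build an argv list for `python -m dsr.run` (hypothetical helper)."""
    cmd = [python, "-m", "dsr.run", config]
    if method is not None:
        cmd.append(f"--method={method}")
    # One --b flag per benchmark, as in the Nguyen-1 / Nguyen-4 example
    cmd.extend(f"--b={name}" for name in benchmarks)
    if mc is not None:
        cmd.append(f"--mc={mc}")
    if num_cores is not None:
        cmd.append(f"--num_cores={num_cores}")
    return cmd


cmd = build_command("config.json", ["Nguyen-1"], mc=2, num_cores=2)
print(shlex.join(cmd))  # requires Python 3.8+
```

To actually launch a run, the list can be passed to `subprocess.run(cmd, check=True)` from the repository root.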

# Release

LLNL-CODE-647188
3 changes: 3 additions & 0 deletions dsr/dsr/__init__.py
@@ -0,0 +1,3 @@
from dsr.core import DeepSymbolicOptimizer
from dsr.task.regression.sklearn import DeepSymbolicRegressor

Empty file added dsr/dsr/baselines/__init__.py
Empty file.
128 changes: 128 additions & 0 deletions dsr/dsr/baselines/constraints.py
@@ -0,0 +1,128 @@
"""Defines constraints for GP individuals, to be used as decorators for
evolutionary operations."""

from dsr.functions import UNARY_TOKENS, BINARY_TOKENS

TRIG_TOKENS = ["sin", "cos", "tan", "csc", "sec", "cot"]

# Define inverse tokens
INVERSE_TOKENS = {
    "exp" : "log",
    "neg" : "neg",
    "inv" : "inv",
    "sqrt" : "n2"
}

# Add inverse trig functions
INVERSE_TOKENS.update({
    t : "arc" + t for t in TRIG_TOKENS
})

# Add reverse
INVERSE_TOKENS.update({
    v : k for k, v in INVERSE_TOKENS.items()
})

DEBUG = False


def check_inv(ind):
    """Returns True if two sequential tokens are inverse unary operators."""

    names = [node.name for node in ind]
    for i, name in enumerate(names[:-1]):
        if name in INVERSE_TOKENS and names[i+1] == INVERSE_TOKENS[name]:
            if DEBUG:
                print("Constrained inverse:", ind)
            return True
    return False


def check_const(ind):
    """Returns True if children of a parent are all const tokens."""

    names = [node.name for node in ind]
    for i, name in enumerate(names):
        if name in UNARY_TOKENS and names[i+1] == "const":
            if DEBUG:
                print("Constrained const (unary)", ind)
            return True
        # In prefix order, a terminal first child at i+1 puts the second child at i+2
        if name in BINARY_TOKENS and names[i+1] == "const" and names[i+2] == "const":
            if DEBUG:
                print("Constrained const (binary)", ind)
            return True
    return False


def check_trig(ind):
    """Returns True if a descendant of a trig operator is another trig
    operator."""

    names = [node.name for node in ind]
    trig_descendant = False # True when current node is a descendant of a trig operator
    trig_dangling = None # Number of unselected nodes in trig subtree
    for i, name in enumerate(names):
        if name in TRIG_TOKENS:
            if trig_descendant:
                if DEBUG:
                    print("Constrained trig:", ind)
                return True
            trig_descendant = True
            trig_dangling = 1
        elif trig_descendant:
            if name in BINARY_TOKENS:
                trig_dangling += 1
            elif name not in UNARY_TOKENS:
                trig_dangling -= 1
            if trig_dangling == 0:
                trig_descendant = False
    return False


def make_check_min_len(min_length):
    """Creates closure for minimum length constraint"""

    def check_min_len(ind):
        """Returns True if individual is less than minimum length"""

        if len(ind) < min_length:
            if DEBUG:
                print("Constrained min len: {} (length {})".format(ind, len(ind)))
            return True

        return False

    return check_min_len


def make_check_max_len(max_length):
    """Creates closure for maximum length constraint"""

    def check_max_len(ind):
        """Returns True if individual is greater than maximum length"""

        if len(ind) > max_length:
            if DEBUG:
                print("Constrained max len: {} (length {})".format(ind, len(ind)))
            return True

        return False

    return check_max_len


def make_check_num_const(max_const):
    """Creates closure for maximum number of constants constraint"""

    def check_num_const(ind):
        """Returns True if individual has more than max_const const tokens"""

        num_const = len([t for t in ind if t.name == "const"])
        if num_const > max_const:
            if DEBUG:
                print("Constrained max const: {} ({} consts)".format(ind, num_const))
            return True

        return False

    return check_num_const