Fix dataframe merge error in python 3 #54

Open

wants to merge 7 commits into master
53 changes: 43 additions & 10 deletions README.md
@@ -37,6 +37,8 @@ For additional details, please see our
This is not an official Google product.

## What's new
* **30/01/2019:** Dopamine 2.0 now supports general discrete-domain gym
environments.
* **01/11/2018:** Download links for each individual checkpoint, to avoid
having to download all of the checkpoints.
* **29/10/2018:** Graph definitions now show up in Tensorboard.
@@ -47,7 +49,8 @@ This is not an official Google product.
* Can be enabled via the `double_dqn` constructor parameter.
* **18/09/2018:** Added support for reporting in-iteration losses directly from
the agent to Tensorboard.
* Include the flag `--debug_mode` in your command line to enable it.
* Set `run_experiment.create_agent.debug_mode = True` via the
configuration file or the `gin_bindings` flag to enable it (see the example
command after this list).
* Control frequency of writes with the `summary_writing_frequency`
agent constructor parameter (defaults to `500`).
* **27/08/2018:** Dopamine launched!
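
As a sketch of the `gin_bindings` route mentioned in the 18/09/2018 entry above
(the binding targets are the ones quoted there; pairing them with `dqn.gin` and
the `DQNAgent` constructor parameter is an illustrative assumption):

```
python -um dopamine.discrete_domains.train \
  --base_dir=/tmp/dopamine \
  --gin_files='dopamine/agents/dqn/configs/dqn.gin' \
  --gin_bindings='run_experiment.create_agent.debug_mode = True' \
  --gin_bindings='DQNAgent.summary_writing_frequency = 500'
```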
@@ -141,18 +144,16 @@ git clone https://github.com/google/dopamine.git
You can test whether the installation was successful by running the following:

```
cd dopamine
export PYTHONPATH=${PYTHONPATH}:.
python tests/atari_init_test.py
python tests/dopamine/atari_init_test.py
```

The entry point to the standard Atari 2600 experiment is
[`dopamine/atari/train.py`](https://github.com/google/dopamine/blob/master/dopamine/atari/train.py).
[`dopamine/discrete_domains/train.py`](https://github.com/google/dopamine/blob/master/dopamine/discrete_domains/train.py).
To run the basic DQN agent,

```
python -um dopamine.atari.train \
--agent_name=dqn \
python -um dopamine.discrete_domains.train \
--base_dir=/tmp/dopamine \
--gin_files='dopamine/agents/dqn/configs/dqn.gin'
```
@@ -179,6 +180,26 @@ are generated at the end of each iteration.
More generally, the whole of Dopamine is easily configured using the
[gin configuration framework](https://github.com/google/gin-config).
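
For instance, a handful of typical bindings in a `.gin` file look like the
sketch below (parameter names follow the shipped `dqn.gin`; treat the exact
values as illustrative rather than authoritative):

```
# Illustrative gin bindings in the style of dopamine/agents/dqn/configs/dqn.gin.
DQNAgent.gamma = 0.99
DQNAgent.update_period = 4
DQNAgent.target_update_period = 8000
Runner.num_iterations = 200
```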

#### Non-Atari discrete environments

We provide sample configuration files for training an agent on Cartpole and
Acrobot. For example, to train C51 on Cartpole with default settings, run the
following command:

```
python -um dopamine.discrete_domains.train \
--base_dir=/tmp/dopamine \
--gin_files='dopamine/agents/rainbow/configs/c51_cartpole.gin'
```

You can train Rainbow on Acrobot with the following command:

```
python -um dopamine.discrete_domains.train \
--base_dir=/tmp/dopamine \
--gin_files='dopamine/agents/rainbow/configs/rainbow_acrobot.gin'
```


### Install as a library
An easy, alternative way to install Dopamine is as a Python library:
@@ -223,11 +244,22 @@ Conference on Learning Representations, 2016.][prioritized_replay]

### Giving credit

If you use Dopamine in your work, we ask that you cite this repository as a
reference. The preferred format (authors in alphabetical order) is:
If you use Dopamine in your work, we ask that you cite our
[white paper][dopamine_paper]. Here is an example BibTeX entry:

Marc G. Bellemare, Pablo Samuel Castro, Carles Gelada, Saurabh Kumar, Subhodeep Moitra.
Dopamine, https://github.com/google/dopamine, 2018.
```
@article{castro18dopamine,
author = {Pablo Samuel Castro and
Subhodeep Moitra and
Carles Gelada and
Saurabh Kumar and
Marc G. Bellemare},
title = {Dopamine: {A} {R}esearch {F}ramework for {D}eep {R}einforcement {L}earning},
year = {2018},
url = {http://arxiv.org/abs/1812.06110},
archivePrefix = {arXiv}
}
```



@@ -239,3 +271,4 @@
[c51]: http://proceedings.mlr.press/v70/bellemare17a.html
[rainbow]: https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/download/17204/16680
[iqn]: https://arxiv.org/abs/1806.06923
[dopamine_paper]: https://arxiv.org/abs/1812.06110
3 changes: 3 additions & 0 deletions docs/api_docs/python/_redirects.yaml
@@ -0,0 +1,3 @@
redirects:
- from: /dopamine/dqn_agent/nature_dqn_network
to: /dopamine/atari_lib/nature_dqn_network
40 changes: 34 additions & 6 deletions docs/api_docs/python/_toc.yaml
@@ -1,5 +1,19 @@
# Automatically generated file; please do not edit
toc:
- title: atari_lib
section:
- title: Overview
path: /dopamine/api_docs/python/atari_lib
- title: AtariPreprocessing
path: /dopamine/api_docs/python/atari_lib/AtariPreprocessing
- title: create_atari_environment
path: /dopamine/api_docs/python/atari_lib/create_atari_environment
- title: implicit_quantile_network
path: /dopamine/api_docs/python/atari_lib/implicit_quantile_network
- title: nature_dqn_network
path: /dopamine/api_docs/python/atari_lib/nature_dqn_network
- title: rainbow_network
path: /dopamine/api_docs/python/atari_lib/rainbow_network
- title: checkpointer
section:
- title: Overview
@@ -20,6 +34,22 @@ toc:
path: /dopamine/api_docs/python/dqn_agent
- title: DQNAgent
path: /dopamine/api_docs/python/dqn_agent/DQNAgent
- title: gym_lib
section:
- title: Overview
path: /dopamine/api_docs/python/gym_lib
- title: acrobot_dqn_network
path: /dopamine/api_docs/python/gym_lib/acrobot_dqn_network
- title: acrobot_rainbow_network
path: /dopamine/api_docs/python/gym_lib/acrobot_rainbow_network
- title: cartpole_dqn_network
path: /dopamine/api_docs/python/gym_lib/cartpole_dqn_network
- title: cartpole_rainbow_network
path: /dopamine/api_docs/python/gym_lib/cartpole_rainbow_network
- title: create_gym_environment
path: /dopamine/api_docs/python/gym_lib/create_gym_environment
- title: GymPreprocessing
path: /dopamine/api_docs/python/gym_lib/GymPreprocessing
- title: implicit_quantile_agent
section:
- title: Overview
@@ -58,6 +88,10 @@ toc:
section:
- title: Overview
path: /dopamine/api_docs/python/run_experiment
- title: create_agent
path: /dopamine/api_docs/python/run_experiment/create_agent
- title: create_runner
path: /dopamine/api_docs/python/run_experiment/create_runner
- title: Runner
path: /dopamine/api_docs/python/run_experiment/Runner
- title: TrainRunner
@@ -66,12 +100,6 @@
section:
- title: Overview
path: /dopamine/api_docs/python/train
- title: create_agent
path: /dopamine/api_docs/python/train/create_agent
- title: create_runner
path: /dopamine/api_docs/python/train/create_runner
- title: launch_experiment
path: /dopamine/api_docs/python/train/launch_experiment
- title: utils
section:
- title: Overview
32 changes: 32 additions & 0 deletions docs/api_docs/python/atari_lib.md
@@ -0,0 +1,32 @@
<div itemscope itemtype="http://developers.google.com/ReferenceObject">
<meta itemprop="name" content="atari_lib" />
<meta itemprop="path" content="Stable" />
</div>

# Module: atari_lib

Atari-specific utilities including Atari-specific network architectures.

This includes a class implementing minimal Atari 2600 preprocessing, which is
in charge of:

*   Emitting a terminal signal when losing a life (optional).
*   Frame skipping and color pooling.
*   Resizing the image before it is provided to the agent.

## Classes

[`class AtariPreprocessing`](./atari_lib/AtariPreprocessing.md): A class
implementing image preprocessing for Atari 2600 agents.

## Functions

[`create_atari_environment(...)`](./atari_lib/create_atari_environment.md):
Wraps an Atari 2600 Gym environment with some basic preprocessing.

[`implicit_quantile_network(...)`](./atari_lib/implicit_quantile_network.md):
The Implicit Quantile ConvNet.

[`nature_dqn_network(...)`](./atari_lib/nature_dqn_network.md): The
convolutional network used to compute the agent's Q-values.

[`rainbow_network(...)`](./atari_lib/rainbow_network.md): The convolutional
network used to compute the agent's Q-value distributions.
128 changes: 128 additions & 0 deletions docs/api_docs/python/atari_lib/AtariPreprocessing.md
@@ -0,0 +1,128 @@
<div itemscope itemtype="http://developers.google.com/ReferenceObject">
<meta itemprop="name" content="atari_lib.AtariPreprocessing" />
<meta itemprop="path" content="Stable" />
<meta itemprop="property" content="action_space"/>
<meta itemprop="property" content="metadata"/>
<meta itemprop="property" content="observation_space"/>
<meta itemprop="property" content="reward_range"/>
<meta itemprop="property" content="__init__"/>
<meta itemprop="property" content="render"/>
<meta itemprop="property" content="reset"/>
<meta itemprop="property" content="step"/>
</div>

# atari_lib.AtariPreprocessing

## Class `AtariPreprocessing`

A class implementing image preprocessing for Atari 2600 agents.

Specifically, this provides the following subset from the JAIR paper (Bellemare
et al., 2013) and Nature DQN paper (Mnih et al., 2015):

* Frame skipping (defaults to 4).
* Terminal signal when a life is lost (off by default).
* Grayscale and max-pooling of the last two frames.
* Downsample the screen to a square image (defaults to 84x84).

More generally, this class follows the preprocessing guidelines set down in
Machado et al. (2018), "Revisiting the Arcade Learning Environment: Evaluation
Protocols and Open Problems for General Agents".

<h2 id="__init__"><code>__init__</code></h2>

```python
__init__(
*args,
**kwargs
)
```

Constructor for an Atari 2600 preprocessor.

#### Args:

* <b>`environment`</b>: Gym environment whose observations are preprocessed.
* <b>`frame_skip`</b>: int, the frequency at which the agent experiences the
game.
* <b>`terminal_on_life_loss`</b>: bool, if True, the step() method returns
is_terminal=True whenever a life is lost. See Mnih et al. 2015.
* <b>`screen_size`</b>: int, size of a resized Atari 2600 frame.

#### Raises:

* <b>`ValueError`</b>: if frame_skip or screen_size are not strictly positive.

## Properties

<h3 id="action_space"><code>action_space</code></h3>

<h3 id="metadata"><code>metadata</code></h3>

<h3 id="observation_space"><code>observation_space</code></h3>

<h3 id="reward_range"><code>reward_range</code></h3>

## Methods

<h3 id="render"><code>render</code></h3>

```python
render(mode)
```

Renders the current screen, before preprocessing.

This calls the Gym API's render() method.

#### Args:

* <b>`mode`</b>: Mode argument for the environment's render() method. Valid
values (str) are: 'rgb_array': returns the raw ALE image. 'human': renders
to display via the Gym renderer.

#### Returns:

If mode='rgb_array': numpy array, the most recent screen.
If mode='human': bool, whether the rendering was successful.

<h3 id="reset"><code>reset</code></h3>

```python
reset()
```

Resets the environment.

#### Returns:

* <b>`observation`</b>: numpy array, the initial observation emitted by the
environment.

<h3 id="step"><code>step</code></h3>

```python
step(action)
```

Applies the given action in the environment.

Remarks:

* If a terminal state (from life loss or episode end) is reached, this may
execute fewer than self.frame_skip steps in the environment.
* Furthermore, in this case the returned observation may not contain valid
image data and should be ignored.

#### Args:

* <b>`action`</b>: The action to be executed.

#### Returns:

* <b>`observation`</b>: numpy array, the observation following the action.
* <b>`reward`</b>: float, the reward following the action.
* <b>`is_terminal`</b>: bool, whether the environment has reached a terminal
state. This is true when a life is lost and terminal_on_life_loss is enabled,
or when the episode is over.
* <b>`info`</b>: Gym API's info data structure.
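
A minimal usage sketch tying together the constructor and methods documented
above; the import path (`dopamine.discrete_domains.atari_lib`) and the Gym
environment id are assumptions rather than part of this page:

```python
# Hypothetical usage sketch: import path and game id are assumptions; the
# constructor arguments follow the Args list documented above.
import gym
from dopamine.discrete_domains import atari_lib

raw_env = gym.make('PongNoFrameskip-v4')  # ALE environment without Gym's own frame skip
env = atari_lib.AtariPreprocessing(
    raw_env.env,                 # strip the TimeLimit wrapper so the ALE is stepped directly
    frame_skip=4,
    terminal_on_life_loss=False,
    screen_size=84)

observation = env.reset()                             # 84x84 grayscale frame
observation, reward, is_terminal, info = env.step(0)  # apply action 0 (no-op)
```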
38 changes: 38 additions & 0 deletions docs/api_docs/python/atari_lib/create_atari_environment.md
@@ -0,0 +1,38 @@
<div itemscope itemtype="http://developers.google.com/ReferenceObject">
<meta itemprop="name" content="atari_lib.create_atari_environment" />
<meta itemprop="path" content="Stable" />
</div>

# atari_lib.create_atari_environment

```python
atari_lib.create_atari_environment(
*args,
**kwargs
)
```

Wraps an Atari 2600 Gym environment with some basic preprocessing.

This preprocessing matches the guidelines proposed in Machado et al. (2017),
"Revisiting the Arcade Learning Environment: Evaluation Protocols and Open
Problems for General Agents".

The created environment is the Gym wrapper around the Arcade Learning
Environment.

The main choice available to the user is whether to use sticky actions or not.
Sticky actions, as prescribed by Machado et al., cause actions to persist with
some probability (0.25) when a new command is sent to the ALE. This can be
viewed as introducing a mild form of stochasticity in the environment. We use
them by default.

#### Args:

* <b>`game_name`</b>: str, the name of the Atari 2600 domain.
* <b>`sticky_actions`</b>: bool, whether to use sticky_actions as per Machado
et al.

#### Returns:

An Atari 2600 environment with some standard preprocessing.
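
As a short sketch of calling this wrapper (the import path is an assumption,
and 'Pong' is just an example game name):

```python
# Hypothetical call sketch; sticky_actions=True mirrors the default described above.
from dopamine.discrete_domains import atari_lib

env = atari_lib.create_atari_environment(game_name='Pong', sticky_actions=True)
observation = env.reset()
observation, reward, is_terminal, info = env.step(env.action_space.sample())
```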