Fix dataframe merge error in python 3 #54

Open

wants to merge 7 commits into master
53 changes: 43 additions & 10 deletions README.md
@@ -37,6 +37,8 @@ For additional details, please see our
This is not an official Google product.

## What's new
* **30/01/2019:** Dopamine 2.0 now supports general discrete-domain gym
environments.
* **01/11/2018:** Download links for each individual checkpoint, to avoid
having to download all of the checkpoints.
* **29/10/2018:** Graph definitions now show up in Tensorboard.
@@ -47,7 +49,8 @@ This is not an official Google product.
* Can be enabled via the `double_dqn` constructor parameter.
* **18/09/2018:** Added support for reporting in-iteration losses directly from
the agent to Tensorboard.
* Include the flag `--debug_mode` in your command line to enable it.
* Set `run_experiment.create_agent.debug_mode = True` via the
configuration file or the `gin_bindings` flag to enable it (see the example
command after this list).
* Control frequency of writes with the `summary_writing_frequency`
agent constructor parameter (defaults to `500`).
* **27/08/2018:** Dopamine launched!
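
As a sketch of the `gin_bindings` route mentioned in the 18/09/2018 entry above
(the binding targets are the ones quoted there; pairing them with `dqn.gin` and
the `DQNAgent` constructor parameter is an illustrative assumption):

```
python -um dopamine.discrete_domains.train \
  --base_dir=/tmp/dopamine \
  --gin_files='dopamine/agents/dqn/configs/dqn.gin' \
  --gin_bindings='run_experiment.create_agent.debug_mode = True' \
  --gin_bindings='DQNAgent.summary_writing_frequency = 500'
```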
@@ -141,18 +144,16 @@ git clone https://github.com/google/dopamine.git
You can test whether the installation was successful by running the following:

```
cd dopamine
export PYTHONPATH=${PYTHONPATH}:.
python tests/atari_init_test.py
python tests/dopamine/atari_init_test.py
```

The entry point to the standard Atari 2600 experiment is
[`dopamine/atari/train.py`](https://github.com/google/dopamine/blob/master/dopamine/atari/train.py).
[`dopamine/discrete_domains/train.py`](https://github.com/google/dopamine/blob/master/dopamine/discrete_domains/train.py).
To run the basic DQN agent,

```
python -um dopamine.atari.train \
--agent_name=dqn \
python -um dopamine.discrete_domains.train \
--base_dir=/tmp/dopamine \
--gin_files='dopamine/agents/dqn/configs/dqn.gin'
```
@@ -179,6 +180,26 @@ are generated at the end of each iteration.
More generally, the whole of Dopamine is easily configured using the
[gin configuration framework](https://github.com/google/gin-config).
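
For instance, a handful of typical bindings in a `.gin` file look like the
sketch below (parameter names follow the shipped `dqn.gin`; treat the exact
values as illustrative rather than authoritative):

```
# Illustrative gin bindings in the style of dopamine/agents/dqn/configs/dqn.gin.
DQNAgent.gamma = 0.99
DQNAgent.update_period = 4
DQNAgent.target_update_period = 8000
Runner.num_iterations = 200
```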

#### Non-Atari discrete environments

We provide sample configuration files for training an agent on Cartpole and
Acrobot. For example, to train C51 on Cartpole with default settings, run the
following command:

```
python -um dopamine.discrete_domains.train \
--base_dir=/tmp/dopamine \
--gin_files='dopamine/agents/rainbow/configs/c51_cartpole.gin'
```

You can train Rainbow on Acrobot with the following command:

```
python -um dopamine.discrete_domains.train \
--base_dir=/tmp/dopamine \
--gin_files='dopamine/agents/rainbow/configs/rainbow_acrobot.gin'
```


### Install as a library
An easy, alternative way to install Dopamine is as a Python library:
@@ -223,11 +244,22 @@ Conference on Learning Representations, 2016.][prioritized_replay]

### Giving credit

If you use Dopamine in your work, we ask that you cite this repository as a
reference. The preferred format (authors in alphabetical order) is:
If you use Dopamine in your work, we ask that you cite our
[white paper][dopamine_paper]. Here is an example BibTeX entry:

Marc G. Bellemare, Pablo Samuel Castro, Carles Gelada, Saurabh Kumar, Subhodeep Moitra.
Dopamine, https://github.com/google/dopamine, 2018.
```
@article{castro18dopamine,
author = {Pablo Samuel Castro and
Subhodeep Moitra and
Carles Gelada and
Saurabh Kumar and
Marc G. Bellemare},
title = {Dopamine: {A} {R}esearch {F}ramework for {D}eep {R}einforcement {L}earning},
year = {2018},
url = {http://arxiv.org/abs/1812.06110},
archivePrefix = {arXiv}
}
```



@@ -239,3 +271,4 @@
[c51]: http://proceedings.mlr.press/v70/bellemare17a.html
[rainbow]: https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/download/17204/16680
[iqn]: https://arxiv.org/abs/1806.06923
[dopamine_paper]: https://arxiv.org/abs/1812.06110
3 changes: 3 additions & 0 deletions docs/api_docs/python/_redirects.yaml
@@ -0,0 +1,3 @@
redirects:
- from: /dopamine/dqn_agent/nature_dqn_network
to: /dopamine/atari_lib/nature_dqn_network
40 changes: 34 additions & 6 deletions docs/api_docs/python/_toc.yaml
@@ -1,5 +1,19 @@
# Automatically generated file; please do not edit
toc:
- title: atari_lib
section:
- title: Overview
path: /dopamine/api_docs/python/atari_lib
- title: AtariPreprocessing
path: /dopamine/api_docs/python/atari_lib/AtariPreprocessing
- title: create_atari_environment
path: /dopamine/api_docs/python/atari_lib/create_atari_environment
- title: implicit_quantile_network
path: /dopamine/api_docs/python/atari_lib/implicit_quantile_network
- title: nature_dqn_network
path: /dopamine/api_docs/python/atari_lib/nature_dqn_network
- title: rainbow_network
path: /dopamine/api_docs/python/atari_lib/rainbow_network
- title: checkpointer
section:
- title: Overview
@@ -20,6 +34,22 @@ toc:
path: /dopamine/api_docs/python/dqn_agent
- title: DQNAgent
path: /dopamine/api_docs/python/dqn_agent/DQNAgent
- title: gym_lib
section:
- title: Overview
path: /dopamine/api_docs/python/gym_lib
- title: acrobot_dqn_network
path: /dopamine/api_docs/python/gym_lib/acrobot_dqn_network
- title: acrobot_rainbow_network
path: /dopamine/api_docs/python/gym_lib/acrobot_rainbow_network
- title: cartpole_dqn_network
path: /dopamine/api_docs/python/gym_lib/cartpole_dqn_network
- title: cartpole_rainbow_network
path: /dopamine/api_docs/python/gym_lib/cartpole_rainbow_network
- title: create_gym_environment
path: /dopamine/api_docs/python/gym_lib/create_gym_environment
- title: GymPreprocessing
path: /dopamine/api_docs/python/gym_lib/GymPreprocessing
- title: implicit_quantile_agent
section:
- title: Overview
@@ -58,6 +88,10 @@ toc:
section:
- title: Overview
path: /dopamine/api_docs/python/run_experiment
- title: create_agent
path: /dopamine/api_docs/python/run_experiment/create_agent
- title: create_runner
path: /dopamine/api_docs/python/run_experiment/create_runner
- title: Runner
path: /dopamine/api_docs/python/run_experiment/Runner
- title: TrainRunner
@@ -66,12 +100,6 @@
section:
- title: Overview
path: /dopamine/api_docs/python/train
- title: create_agent
path: /dopamine/api_docs/python/train/create_agent
- title: create_runner
path: /dopamine/api_docs/python/train/create_runner
- title: launch_experiment
path: /dopamine/api_docs/python/train/launch_experiment
- title: utils
section:
- title: Overview
32 changes: 32 additions & 0 deletions docs/api_docs/python/atari_lib.md
@@ -0,0 +1,32 @@
<div itemscope itemtype="http://developers.google.com/ReferenceObject">
<meta itemprop="name" content="atari_lib" />
<meta itemprop="path" content="Stable" />
</div>

# Module: atari_lib

Atari-specific utilities including Atari-specific network architectures.

This includes a class implementing minimal Atari 2600 preprocessing, which is
in charge of:

*   Emitting a terminal signal when losing a life (optional).
*   Frame skipping and color pooling.
*   Resizing the image before it is provided to the agent.

## Classes

[`class AtariPreprocessing`](./atari_lib/AtariPreprocessing.md): A class
implementing image preprocessing for Atari 2600 agents.

## Functions

[`create_atari_environment(...)`](./atari_lib/create_atari_environment.md):
Wraps an Atari 2600 Gym environment with some basic preprocessing.

[`implicit_quantile_network(...)`](./atari_lib/implicit_quantile_network.md):
The Implicit Quantile ConvNet.

[`nature_dqn_network(...)`](./atari_lib/nature_dqn_network.md): The
convolutional network used to compute the agent's Q-values.

[`rainbow_network(...)`](./atari_lib/rainbow_network.md): The convolutional
network used to compute the agent's Q-value distributions.
128 changes: 128 additions & 0 deletions docs/api_docs/python/atari_lib/AtariPreprocessing.md
@@ -0,0 +1,128 @@
<div itemscope itemtype="http://developers.google.com/ReferenceObject">
<meta itemprop="name" content="atari_lib.AtariPreprocessing" />
<meta itemprop="path" content="Stable" />
<meta itemprop="property" content="action_space"/>
<meta itemprop="property" content="metadata"/>
<meta itemprop="property" content="observation_space"/>
<meta itemprop="property" content="reward_range"/>
<meta itemprop="property" content="__init__"/>
<meta itemprop="property" content="render"/>
<meta itemprop="property" content="reset"/>
<meta itemprop="property" content="step"/>
</div>

# atari_lib.AtariPreprocessing

## Class `AtariPreprocessing`

A class implementing image preprocessing for Atari 2600 agents.

Specifically, this provides the following subset from the JAIR paper (Bellemare
et al., 2013) and Nature DQN paper (Mnih et al., 2015):

* Frame skipping (defaults to 4).
* Terminal signal when a life is lost (off by default).
* Grayscale and max-pooling of the last two frames.
* Downsample the screen to a square image (defaults to 84x84).

More generally, this class follows the preprocessing guidelines set down in
Machado et al. (2018), "Revisiting the Arcade Learning Environment: Evaluation
Protocols and Open Problems for General Agents".

<h2 id="__init__"><code>__init__</code></h2>

```python
__init__(
*args,
**kwargs
)
```

Constructor for an Atari 2600 preprocessor.

#### Args:

* <b>`environment`</b>: Gym environment whose observations are preprocessed.
* <b>`frame_skip`</b>: int, the frequency at which the agent experiences the
game.
* <b>`terminal_on_life_loss`</b>: bool, if True, the step() method returns
is_terminal=True whenever a life is lost. See Mnih et al. 2015.
* <b>`screen_size`</b>: int, size of a resized Atari 2600 frame.

#### Raises:

* <b>`ValueError`</b>: if frame_skip or screen_size are not strictly positive.

## Properties

<h3 id="action_space"><code>action_space</code></h3>

<h3 id="metadata"><code>metadata</code></h3>

<h3 id="observation_space"><code>observation_space</code></h3>

<h3 id="reward_range"><code>reward_range</code></h3>

## Methods

<h3 id="render"><code>render</code></h3>

```python
render(mode)
```

Renders the current screen, before preprocessing.

This calls the Gym API's render() method.

#### Args:

* <b>`mode`</b>: Mode argument for the environment's render() method. Valid
values (str) are: 'rgb_array': returns the raw ALE image. 'human': renders
to display via the Gym renderer.

#### Returns:

If mode='rgb_array': numpy array, the most recent screen.
If mode='human': bool, whether the rendering was successful.

<h3 id="reset"><code>reset</code></h3>

```python
reset()
```

Resets the environment.

#### Returns:

* <b>`observation`</b>: numpy array, the initial observation emitted by the
environment.

<h3 id="step"><code>step</code></h3>

```python
step(action)
```

Applies the given action in the environment.

Remarks:

* If a terminal state (from life loss or episode end) is reached, this may
execute fewer than self.frame_skip steps in the environment.
* Furthermore, in this case the returned observation may not contain valid
image data and should be ignored.

#### Args:

* <b>`action`</b>: The action to be executed.

#### Returns:

* <b>`observation`</b>: numpy array, the observation following the action.
* <b>`reward`</b>: float, the reward following the action.
* <b>`is_terminal`</b>: bool, whether the environment has reached a terminal
state. This is true when a life is lost and terminal_on_life_loss is enabled,
or when the episode is over.
* <b>`info`</b>: Gym API's info data structure.
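
A minimal usage sketch tying together the constructor and methods documented
above; the import path (`dopamine.discrete_domains.atari_lib`) and the Gym
environment id are assumptions rather than part of this page:

```python
# Hypothetical usage sketch: import path and game id are assumptions; the
# constructor arguments follow the Args list documented above.
import gym
from dopamine.discrete_domains import atari_lib

raw_env = gym.make('PongNoFrameskip-v4')  # ALE environment without Gym's own frame skip
env = atari_lib.AtariPreprocessing(
    raw_env.env,                 # strip the TimeLimit wrapper so the ALE is stepped directly
    frame_skip=4,
    terminal_on_life_loss=False,
    screen_size=84)

observation = env.reset()                             # 84x84 grayscale frame
observation, reward, is_terminal, info = env.step(0)  # apply action 0 (no-op)
```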
38 changes: 38 additions & 0 deletions docs/api_docs/python/atari_lib/create_atari_environment.md
@@ -0,0 +1,38 @@
<div itemscope itemtype="http://developers.google.com/ReferenceObject">
<meta itemprop="name" content="atari_lib.create_atari_environment" />
<meta itemprop="path" content="Stable" />
</div>

# atari_lib.create_atari_environment

```python
atari_lib.create_atari_environment(
*args,
**kwargs
)
```

Wraps an Atari 2600 Gym environment with some basic preprocessing.

This preprocessing matches the guidelines proposed in Machado et al. (2017),
"Revisiting the Arcade Learning Environment: Evaluation Protocols and Open
Problems for General Agents".

The created environment is the Gym wrapper around the Arcade Learning
Environment.

The main choice available to the user is whether to use sticky actions or not.
Sticky actions, as prescribed by Machado et al., cause actions to persist with
some probability (0.25) when a new command is sent to the ALE. This can be
viewed as introducing a mild form of stochasticity in the environment. We use
them by default.

#### Args:

* <b>`game_name`</b>: str, the name of the Atari 2600 domain.
* <b>`sticky_actions`</b>: bool, whether to use sticky_actions as per Machado
et al.

#### Returns:

An Atari 2600 environment with some standard preprocessing.
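
As a short sketch of calling this wrapper (the import path is an assumption,
and 'Pong' is just an example game name):

```python
# Hypothetical call sketch; sticky_actions=True mirrors the default described above.
from dopamine.discrete_domains import atari_lib

env = atari_lib.create_atari_environment(game_name='Pong', sticky_actions=True)
observation = env.reset()
observation, reward, is_terminal, info = env.step(env.action_space.sample())
```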