Project import generated by Copybara.

PiperOrigin-RevId: 689864943
google · Oct 25, 2024 · 9f616b9 · 9f616b9
1 parent 4552f69
commit 9f616b9

Showing 609 changed files with 3,015,898 additions and 783 deletions.
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -10,24 +10,6 @@
 
 # Pull Requests
 
-We'd love to accept your patches and contributions to this project. There are
-just a few small guidelines you need to follow.
-
-## Contributor License Agreement
-
-Contributions to this project must be accompanied by a Contributor License
-Agreement. You (or your employer) retain the copyright to your contribution,
-this simply gives us permission to use and redistribute your contributions as
-part of the project. Head over to <https://cla.developers.google.com/> to see
-your current agreements on file or to sign a new one.
-
-You generally only need to submit a CLA once, so if you've already submitted one
-(even if it was for a different project), you probably don't need to do it
-again.
-
-## Code reviews
-
-All submissions, including submissions by project members, require review. We
-use GitHub pull requests for this purpose. Consult
-[GitHub Help](https://help.github.com/articles/about-pull-requests/) for more
-information on using pull requests.
+Due to lack of bandwidth, we are not accepting pull requests at this time. If
+there is something you feel should be added to the repo, please raise an issue
+or feature request.
diff --git a/README.md b/README.md
@@ -30,6 +30,7 @@ Dopamine supports the following agents, implemented with jax:
 * Rainbow ([Hessel et al., 2018][rainbow])
 * IQN ([Dabney et al., 2018][iqn])
 * SAC ([Haarnoja et al., 2018][sac])
+* PPO ([Schulman et al., 2017][ppo])
 
 For more information on the available agents, see the [docs](https://google.github.io/dopamine/docs).
 
@@ -140,6 +141,8 @@ Conference on Learning Representations, 2016.][prioritized_replay]
 [Haarnoja et al., *Soft Actor-Critic Algorithms and Applications*,
 arXiv preprint arXiv:1812.05905, 2018.][sac]
 
+[Schulman et al., *Proximal Policy Optimization Algorithms*, arXiv preprint arXiv:1707.06347, 2017.][ppo]
+
 ## Giving credit
 
 If you use Dopamine in your work, we ask that you cite our
@@ -160,7 +163,6 @@ If you use Dopamine in your work, we ask that you cite our
 ```
 
 
-
 [docs]: https://google.github.io/dopamine/docs/
 [baselines]: https://google.github.io/dopamine/baselines
 [machado]: https://jair.org/index.php/jair/article/view/11182
@@ -172,5 +174,6 @@ If you use Dopamine in your work, we ask that you cite our
 [rainbow]: https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/download/17204/16680
 [iqn]: https://arxiv.org/abs/1806.06923
 [sac]: https://arxiv.org/abs/1812.05905
+[ppo]: https://arxiv.org/abs/1707.06347
 [dopamine_paper]: https://arxiv.org/abs/1812.06110
 [vitualenv]: https://docs.python.org/3/library/venv.html#creating-virtual-environments
diff --git a/baselines/atari/README.md b/baselines/atari/README.md
@@ -1,18 +1,40 @@
 # Baseline data
 
 This directory provides information about the baseline data provided by
-Dopamine. The default hyperparameter configuration for the 4 agents we are
+Dopamine. The default hyperparameter configuration for the agents we are
 providing yields a standardized "apples to apples" comparison between them.
 
-The default configuration files files for each agent (set up with
-[gin configuration framework](https://github.com/google/gin-config)) are:
+The default configuration files for these agents (set up with the [gin
+configuration framework](https://github.com/google/gin-config)) are:
 
-*   [`dopamine/agents/dqn/configs/dqn.gin`](https://github.com/google/dopamine/blob/master/dopamine/agents/dqn/configs/dqn.gin)
-*   [`dopamine/agents/rainbow/configs/c51.gin`](https://github.com/google/dopamine/blob/master/dopamine/agents/rainbow/configs/c51.gin)
-*   [`dopamine/agents/rainbow/configs/rainbow.gin`](https://github.com/google/dopamine/blob/master/dopamine/agents/rainbow/configs/rainbow.gin)
-*   [`dopamine/agents/implicit_quantile/configs/implicit_quantile.gin`](https://github.com/google/dopamine/blob/master/dopamine/agents/implicit_quantile/configs/implicit_quantile.gin)
+*   [`dopamine/jax/agents/dqn/configs/dqn.gin`](https://github.com/google/dopamine/blob/master/dopamine/jax/agents/dqn/configs/dqn.gin)
+*   [`dopamine/jax/agents/implicit_quantile/configs/implicit_quantile.gin`](https://github.com/google/dopamine/blob/master/dopamine/jax/agents/implicit_quantile/configs/implicit_quantile.gin)
+*   [`dopamine/jax/agents/quantile/configs/quantile.gin`](https://github.com/google/dopamine/blob/master/dopamine/jax/agents/quantile/configs/quantile.gin)
+*   [`dopamine/jax/agents/rainbow/configs/rainbow.gin`](https://github.com/google/dopamine/blob/master/dopamine/jax/agents/rainbow/configs/rainbow.gin)
 
-## Hyperparemeter comparison
+## Visualization
+We provide a [website](https://google.github.io/dopamine/baselines/atari/plots.html)
+where you can quickly visualize the training runs for all our default agents.
+
+The plots are rendered from a set of
+[JSON files](https://github.com/google/dopamine/tree/master/baselines/atari/data)
+which we compiled. These may prove useful in their own right to compare
+against results obtained from other frameworks.
+
+## Legacy TensorFlow models
+
+Dopamine originally used [TensorFlow](https://www.tensorflow.org/) for its
+networks and agents, but has since migrated to
+[Jax](https://jax.readthedocs.io/en/latest/). The default configuration files
+for the legacy TF agents (set up with the [gin configuration
+framework](https://github.com/google/gin-config)) are:
+
+*   [`dopamine/tf/agents/dqn/configs/dqn.gin`](https://github.com/google/dopamine/blob/master/dopamine/tf/agents/dqn/configs/dqn.gin)
+*   [`dopamine/tf/agents/rainbow/configs/c51.gin`](https://github.com/google/dopamine/blob/master/dopamine/tf/agents/rainbow/configs/c51.gin)
+*   [`dopamine/tf/agents/rainbow/configs/rainbow.gin`](https://github.com/google/dopamine/blob/master/dopamine/tf/agents/rainbow/configs/rainbow.gin)
+*   [`dopamine/tf/agents/implicit_quantile/configs/implicit_quantile.gin`](https://github.com/google/dopamine/blob/master/dopamine/tf/agents/implicit_quantile/configs/implicit_quantile.gin)
+
+### Hyperparameter comparison
 Our results compare the agents with the same hyperparameters: target
 network update frequency, frequency at which exploratory actions are selected (ε), the
 length of the schedule over which ε is annealed, and the number of agent steps
@@ -23,6 +45,9 @@ actions instead of 10% (as used in the original Nature paper). Step size and
 optimizer were taken as published. The table below summarizes our choices. All
 numbers are in ALE frames.
 
+Note that these numbers were obtained with the legacy TensorFlow
+implementations.
+
 |                                     | Our baseline results | [DQN][dqn]       | [C51][c51]       | [Rainbow][rainbow] | [IQN][iqn]       |
 | :---------------------------------- | :------------------: | :--------:       | :--------:       | :----------------: | :--------:       |
 | **Training ε**                      | 0.01                 | 0.1              | 0.01             | 0.01               | 0.01             |
@@ -31,17 +56,9 @@ numbers are in ALE frames.
 | **Min. history to start learning**  | 80,000 frames        | 200,000 frames   | 200,000 frames   | 80,000 frames      | 200,000 frames   |
 | **Target network update frequency** | 32,000 frames        | 40,000 frames    | 40,000 frames    | 32,000 frames      | 40,000 frames    |
 
-## Visualization
-We provide a [website](https://google.github.io/dopamine/baselines/atari/plots.html)
-where you can quickly visualize the training runs for all our default agents.
-
-The plots are rendered from a set of
-[JSON files](https://github.com/google/dopamine/tree/master/baselines/atari/data)
-which we compiled. These may prove useful in their own right to compare
-against results obtained from other frameworks.
-
 
 [dqn]: https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf
 [c51]: https://arxiv.org/abs/1707.06887
 [rainbow]: https://arxiv.org/abs/1710.02298
+[qr-dqn]: https://arxiv.org/abs/1710.10044
 [iqn]: https://arxiv.org/abs/1806.06923