My implementations of several reinforcement learning (RL) algorithms.
Papers for some of the algorithms:
- DQN: https://www.nature.com/articles/nature14236
- Double DQN: https://arxiv.org/abs/1509.06461
- Dueling Network Architectures: https://arxiv.org/abs/1511.06581
- Prioritized Experience Replay: https://arxiv.org/abs/1511.05952
- Hindsight Experience Replay: https://arxiv.org/abs/1707.01495
- DDPG: https://arxiv.org/abs/1509.02971
- MADDPG: https://arxiv.org/abs/1706.02275
- TD3: https://arxiv.org/abs/1802.09477
- SAC: https://arxiv.org/abs/1801.01290
- PPO: https://arxiv.org/abs/1707.06347
- RND: https://arxiv.org/abs/1810.12894
- Ape-X: https://arxiv.org/abs/1803.00933 (work in progress, not yet complete)