XuanCe is an open-source ensemble of Deep Reinforcement Learning (DRL) algorithm implementations.
We call it Xuan-Ce (玄策) in Chinese. "Xuan (玄)" means incredible, or a magic box; "Ce (策)" means policy.
DRL algorithms are sensitive to hyper-parameter tuning, vary in performance with different tricks, and suffer from unstable training processes; as a result, they can sometimes seem elusive and "Xuan". This project provides thorough, high-quality, and easy-to-understand implementations of DRL algorithms, in the hope that they shed some light on the magic of reinforcement learning.
We expect it to be compatible with multiple deep learning toolboxes (PyTorch, TensorFlow, and MindSpore), and hope it can become a zoo full of DRL algorithms.
Paper link: https://arxiv.org/pdf/2312.16248.pdf
Full Documentation | Chinese Documentation (中文文档)
- Highly modularized.
- Easy to learn, install, and use.
- Flexible for model combination.
- Abundant algorithms for various tasks.
- Supports both DRL and MARL tasks.
- High compatibility across toolboxes, hardware, and operating systems (PyTorch, TensorFlow2, MindSpore, CPU, GPU, Linux, Windows, macOS, etc.).
- Fast running speed with parallel environments.
- Distributed training with multiple GPUs.
- Visualization of results with TensorBoard or wandb.
Supported DRL algorithms:
- Deep Q Network - DQN [Paper]
- DQN with Double Q-learning - Double DQN [Paper]
- DQN with Dueling Network - Dueling DQN [Paper]
- DQN with Prioritized Experience Replay - PER [Paper]
- DQN with Parameter Space Noise for Exploration - NoisyNet [Paper]
- Deep Recurrent Q-Network - DRQN [Paper]
- DQN with Quantile Regression - QRDQN [Paper]
- Distributional Reinforcement Learning - C51 [Paper]
- Vanilla Policy Gradient - PG [Paper]
- Phasic Policy Gradient - PPG [Paper] [Code]
- Advantage Actor Critic - A2C [Paper] [Code]
- Soft Actor-Critic - SAC [Paper] [Code]
- Soft Actor-Critic for Discrete Actions - SAC-Discrete [Paper] [Code]
- Proximal Policy Optimization with Clipped Objective - PPO-Clip [Paper] [Code]
- Proximal Policy Optimization with KL Divergence - PPO-KL [Paper] [Code]
- Deep Deterministic Policy Gradient - DDPG [Paper] [Code]
- Twin Delayed Deep Deterministic Policy Gradient - TD3 [Paper] [Code]
- Parameterised Deep Q-Network - P-DQN [Paper]
- Multi-pass Parameterised Deep Q-network - MP-DQN [Paper] [Code]
- Split Parameterised Deep Q-Network - SP-DQN [Paper]
Supported MARL algorithms:
- Independent Q-learning - IQL [Paper] [Code]
- Value Decomposition Networks - VDN [Paper] [Code]
- Q-mixing networks - QMIX [Paper] [Code]
- Weighted Q-mixing networks - WQMIX [Paper] [Code]
- Q-transformation - QTRAN [Paper] [Code]
- Deep Coordination Graphs - DCG [Paper] [Code]
- Independent Deep Deterministic Policy Gradient - IDDPG [Paper]
- Multi-agent Deep Deterministic Policy Gradient - MADDPG [Paper] [Code]
- Independent Actor-Critic - IAC [Paper] [Code]
- Counterfactual Multi-agent Policy Gradient - COMA [Paper] [Code]
- Value-Decomposition Actor-Critic - VDAC [Paper] [Code]
- Independent Proximal Policy Optimization - IPPO [Paper] [Code]
- Multi-agent Proximal Policy Optimization - MAPPO [Paper] [Code]
- Mean-Field Q-learning - MFQ [Paper] [Code]
- Mean-Field Actor-Critic - MFAC [Paper] [Code]
- Independent Soft Actor-Critic - ISAC
- Multi-agent Soft Actor-Critic - MASAC [Paper]
- Multi-agent Twin Delayed Deep Deterministic Policy Gradient - MATD3 [Paper]
See XuanCe's documentation for the installation and usage of gym-pybullet-drones.
The library runs on Linux, Windows, macOS, EulerOS, and other platforms.
Before installing XuanCe, install Anaconda to prepare a Python environment. (Note: select a proper version of Anaconda for your system.)
After that, open a terminal and install XuanCe with the following steps.
Step 1: Create a new conda environment (python>=3.7 is suggested):
conda create -n xuance_env python=3.7
Step 2: Activate conda environment:
conda activate xuance_env
Step 3: Install the library:
pip install xuance
This command does not install the dependencies of any deep learning toolbox. To install XuanCe together with a toolbox, use one of the following:
- pip install xuance[torch] for PyTorch
- pip install xuance[tensorflow] for TensorFlow2
- pip install xuance[mindspore] for MindSpore
- pip install xuance[all] for all dependencies
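After installation, it can help to confirm which deep learning toolbox is actually importable in the active conda environment. The following is a minimal sketch that uses only the Python standard library. The helper names (`python_version_ok`, `installed_toolboxes`) and the mapping of toolbox labels to module names are illustrative assumptions, not part of XuanCe's API.

```python
import importlib.util
import sys

# Assumed top-level module names for the three toolboxes named above.
TOOLBOX_MODULES = {
    "PyTorch": "torch",
    "TensorFlow2": "tensorflow",
    "MindSpore": "mindspore",
}


def python_version_ok(minimum=(3, 7)):
    """Return True if the running interpreter meets the suggested minimum."""
    return sys.version_info[:2] >= minimum


def installed_toolboxes(modules=TOOLBOX_MODULES):
    """Return the labels whose module can be located, without importing it."""
    return [label for label, name in modules.items()
            if importlib.util.find_spec(name) is not None]


print("Python >= 3.7:", python_version_ok())
print("XuanCe installed:", importlib.util.find_spec("xuance") is not None)
print("Toolboxes found:", installed_toolboxes() or "none")
```

If no toolbox is reported, re-running the matching `pip install xuance[...]` command inside the activated environment is the likely fix.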
Note: Some extra packages should be installed manually for further usage.
# Train a model (is_test=False)
import xuance
runner = xuance.get_runner(method='dqn',
                           env='classic_control',
                           env_id='CartPole-v1',
                           is_test=False)
runner.run()

# Test the trained model (is_test=True)
import xuance
runner_test = xuance.get_runner(method='dqn',
                                env='classic_control',
                                env_id='CartPole-v1',
                                is_test=True)
runner_test.run()
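The two snippets above differ only in the `is_test` flag, so they can be combined into one script. The sketch below is one possible way to do that; the helper `train_then_test` and the constant `COMMON_ARGS` are hypothetical names introduced here, and the runner factory is passed in as a parameter so the helper does not depend on anything beyond the `get_runner(...)` call and `run()` method shown above.

```python
from typing import Any, Callable

# Keyword arguments copied from the two snippets above.
COMMON_ARGS = dict(method="dqn", env="classic_control", env_id="CartPole-v1")


def train_then_test(get_runner: Callable[..., Any]) -> None:
    """Run training first, then evaluation, with the same method and environment."""
    for is_test in (False, True):
        runner = get_runner(is_test=is_test, **COMMON_ARGS)
        runner.run()
```

With XuanCe and a deep learning toolbox installed, this would be invoked as `train_then_test(xuance.get_runner)` after `import xuance`.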
You can use TensorBoard to visualize the training process. Log files are generated automatically during training; point the command below at the log directory of your run and you should be able to see the training data.
$ tensorboard --logdir ./logs/dqn/torch/CartPole-v0
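If the exact log directory of a run is not known, it can be located by searching for TensorBoard event files, which are named `events.out.tfevents.<...>`. The following is a small sketch under the assumption that logs live somewhere below `./logs`, as in the command above; the helper name `find_run_dirs` is hypothetical and the actual directory layout depends on the XuanCe configuration in use.

```python
from pathlib import Path


def find_run_dirs(log_root="./logs"):
    """Return sorted directories under log_root that contain TensorBoard event files."""
    root = Path(log_root)
    if not root.is_dir():
        return []
    # TensorBoard writes event files named "events.out.tfevents.<...>".
    return sorted({p.parent for p in root.rglob("events.out.tfevents.*")})


# Print one ready-to-run command per run directory that was found.
for run_dir in find_run_dirs():
    print(f"tensorboard --logdir {run_dir}")
```

Each printed line can be pasted into a terminal to open the corresponding run.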
XuanCe also supports Weights & Biases (wandb) for visualizing the results of a run.
How to use wandb online? ➡️ https://github.com/wandb/wandb.git/
How to use wandb offline? ➡️ https://github.com/wandb/server.git/
- GitHub issues: https://github.com/agi-brain/xuance/issues
- Discord invite link: https://discord.gg/HJn2TBQS7y
- Slack invite link: https://join.slack.com/t/xuancerllib/
- QQ App's group number: 552432695
- WeChat account: "玄策 RLlib"
(Note: You can also post your questions on Stack Overflow.)
If you use XuanCe in your research or development, please cite the paper:
@article{liu2023xuance,
title={XuanCe: A Comprehensive and Unified Deep Reinforcement Learning Library},
author={Liu, Wenzhang and Cai, Wenzhe and Jiang, Kun and Cheng, Guangran and Wang, Yuanda and Wang, Jiawei and Cao, Jingyu and Xu, Lele and Mu, Chaoxu and Sun, Changyin},
journal={arXiv preprint arXiv:2312.16248},
year={2023}
}