docs for qrdqn
wenzhangliu committed Dec 29, 2024
1 parent 2572315 commit 7094d05
Showing 4 changed files with 190 additions and 20 deletions.
12 changes: 6 additions & 6 deletions docs/source/documents/api/agents/drl.rst
@@ -9,9 +9,9 @@ DRL Agent
DQN <drl/dqn_agent>
Double DQN <drl/ddqn_agent>
Dueling DQN <drl/dueldqn_agent>
NoisyDQN <drl/noisydqn_agent>
PerDQN <drl/perdqn_agent>
QRDQN <drl/qrdqn_agent>
Noisy DQN <drl/noisydqn_agent>
PER DQN <drl/perdqn_agent>
QR-DQN <drl/qrdqn_agent>
C51 <drl/c51_agent>
DRQN <drl/drqn_agent>
PG <drl/pg_agent>
@@ -29,9 +29,9 @@ DRL Agent
* :doc:`Deep Q-Network (DQN) <drl/dqn_agent>`.
* :doc:`Double Deep Q-Network (Double DQN) <drl/ddqn_agent>`.
* :doc:`Dueling Deep Q-Network (Dueling DQN) <drl/dueldqn_agent>`.
* :doc:`DQN with Noisy Layers (NoisyDQN) <drl/noisydqn_agent>`.
* :doc:`DQN with Prioritized Experience Replay (PerDQN) <drl/perdqn_agent>`.
* :doc:`DQN with Quantile Regression (QRDQN) <drl/qrdqn_agent>`.
* :doc:`DQN with Noisy Layers (Noisy DQN) <drl/noisydqn_agent>`.
* :doc:`DQN with Prioritized Experience Replay (PER DQN) <drl/perdqn_agent>`.
* :doc:`DQN with Quantile Regression (QR-DQN) <drl/qrdqn_agent>`.
* :doc:`Categorical 51 DQN (C51) <drl/c51_agent>`.
* :doc:`Deep Recurrent Q-Network (DRQN) <drl/drqn_agent>`.
* :doc:`Policy Gradient (PG) <drl/pg_agent>`.
18 changes: 9 additions & 9 deletions docs/source/documents/api/agents/drl/perdqn_agent.md
@@ -8,14 +8,14 @@ by prioritizing certain experiences during training.

This table lists some general features of the PER DQN algorithm:

| Features of Noisy DQN | Values | Description                                                 |
|-----------------------|--------|-------------------------------------------------------------|
| On-policy             | ❌      | The behavior policy is the same as the target policy.       |
| Off-policy            | ✅      | The behavior policy is different from the target policy.    |
| Model-free            | ✅      | No need to prepare an environment dynamics model.           |
| Model-based           | ❌      | Need an environment model to train the policy.              |
| Discrete Action       | ✅      | Deals with discrete action spaces.                          |
| Continuous Action     | ❌      | Deals with continuous action spaces.                        |
| Features of PER DQN | Values | Description                                                 |
|---------------------|--------|-------------------------------------------------------------|
| On-policy           | ❌      | The behavior policy is the same as the target policy.       |
| Off-policy          | ✅      | The behavior policy is different from the target policy.    |
| Model-free          | ✅      | No need to prepare an environment dynamics model.           |
| Model-based         | ❌      | Need an environment model to train the policy.              |
| Discrete Action     | ✅      | Deals with discrete action spaces.                          |
| Continuous Action   | ❌      | Deals with continuous action spaces.                        |

## Method

@@ -71,7 +71,7 @@ The full algorithm for training PER DQN is presented in Algorithm 1:
:align: center
```

## Run DQN in XuanCe
## Run PER DQN in XuanCe

Before running PER DQN in XuanCe, you need to prepare a conda environment and install ``xuance`` following
the [**installation steps**](./../../../usage/installation.rst#install-via-pypi).
180 changes: 175 additions & 5 deletions docs/source/documents/api/agents/drl/qrdqn_agent.md
@@ -1,24 +1,194 @@
# DQN with Quantile Regression (QRDQN)
# Quantile Regression Deep Q-Network (QR-DQN)

**Paper Link:** [**https://ojs.aaai.org/index.php/AAAI/article/view/11791**](https://ojs.aaai.org/index.php/AAAI/article/view/11791).

Quantile Regression Deep Q-Network (QR-DQN) is an extension of the traditional DQN
designed to improve the handling of uncertainty and variance in reinforcement learning,
especially in environments where the rewards can be highly variable or noisy.
QR-DQN combines elements of quantile regression with DQN,
allowing it to learn a distribution over Q-values rather than just a single point estimate.
This helps improve the stability and robustness of the learning process.

This table lists some general features of the QR-DQN algorithm:

| Features of QR-DQN | Values | Description                                                 |
|--------------------|--------|-------------------------------------------------------------|
| On-policy          | ❌      | The behavior policy is the same as the target policy.       |
| Off-policy         | ✅      | The behavior policy is different from the target policy.    |
| Model-free         | ✅      | No need to prepare an environment dynamics model.           |
| Model-based        | ❌      | Need an environment model to train the policy.              |
| Discrete Action    | ✅      | Deals with discrete action spaces.                          |
| Continuous Action  | ❌      | Deals with continuous action spaces.                        |

## Method

### Distributional Reinforcement Learning

Traditional Q-learning estimates the expected return (mean) for each state-action pair.
However, in many cases, the returns can be uncertain or variable,
and just focusing on the mean may not capture the full picture of this uncertainty.

Distributional reinforcement learning seeks to model the distribution of possible returns for each state-action pair,
not just the expected value.
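
As a toy illustration (not part of XuanCe or the QR-DQN paper), the two actions below have the same expected return of 1.0 but very different risk; a single scalar Q-value cannot tell them apart, while their return distributions can:

```python3
import numpy as np

rng = np.random.default_rng(0)
# Two actions with the same expected return (1.0) but very different spread.
safe_returns = rng.normal(loc=1.0, scale=0.1, size=10_000)
risky_returns = rng.choice([-9.0, 11.0], size=10_000, p=[0.5, 0.5])

print(safe_returns.mean(), risky_returns.mean())                        # both close to 1.0
print(np.quantile(safe_returns, 0.1), np.quantile(risky_returns, 0.1))  # ~0.87 vs. -9.0
```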

### Quantile Regression

Quantile regression is a technique that estimates specific quantiles
(e.g., the 50th percentile, 90th percentile) of a distribution, rather than the mean.
This allows the model to capture the entire distribution of the possible returns,
providing richer information about the variability in future rewards.

In QR-DQN, instead of learning a single Q-value,
the agent learns multiple quantiles of the distribution over the Q-values.
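
The standalone NumPy sketch below (illustrative only, not taken from XuanCe) shows the key property behind this idea: the value that minimizes the quantile (pinball) loss over sampled returns is approximately the corresponding quantile of the return distribution:

```python3
import numpy as np

rng = np.random.default_rng(0)
returns = rng.normal(loc=1.0, scale=2.0, size=10_000)  # sampled returns for one state-action pair

def pinball_loss(q, samples, tau):
    """Average check-function loss rho_tau(z) = z * (tau - 1[z < 0]), with z = sample - q."""
    z = samples - q
    return np.mean(z * (tau - (z < 0)))

candidates = np.linspace(-6.0, 8.0, 1_000)
for tau in (0.1, 0.5, 0.9):
    best = candidates[np.argmin([pinball_loss(q, returns, tau) for q in candidates])]
    print(f"tau={tau}: loss minimizer = {best:.2f}, empirical quantile = {np.quantile(returns, tau):.2f}")
```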

### Architecture of QR-DQN

In QR-DQN, the Q-value function is represented by a distribution over possible returns.
Specifically, the agent approximates the quantile function of the return distribution using a set of quantile values.

The quantiles $\tau_i$ (where $\tau_i \in [0, 1]$) correspond to different points in the return distribution
(e.g., the 10th, 50th, and 90th percentiles).
The algorithm learns a quantile regression loss to estimate the quantiles of the Q-value distribution,
rather than learning a single expected Q-value.
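
A minimal PyTorch sketch of this output structure is given below; the class name, layer sizes, and number of quantiles are illustrative assumptions, not XuanCe's actual policy module. Each action receives a vector of quantile estimates, and the scalar Q-value used for greedy action selection is the mean over those quantiles:

```python3
import torch
import torch.nn as nn

class QuantileQNetwork(nn.Module):
    """Toy Q-network that outputs N quantile values per action."""

    def __init__(self, obs_dim, n_actions, n_quantiles=200, hidden=256):
        super().__init__()
        self.n_actions, self.n_quantiles = n_actions, n_quantiles
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions * n_quantiles),
        )

    def forward(self, obs):
        # Shape (batch, n_actions, n_quantiles): one quantile estimate per action.
        quantiles = self.net(obs).view(-1, self.n_actions, self.n_quantiles)
        # Greedy action selection uses the mean over quantiles as the Q-value.
        q_values = quantiles.mean(dim=2)
        return quantiles, q_values

# Example: greedy actions for a batch of 8 observations with 4 features and 2 actions.
net = QuantileQNetwork(obs_dim=4, n_actions=2)
_, q = net(torch.randn(8, 4))
greedy_actions = q.argmax(dim=1)
```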

### Loss function

QR-DQN uses the quantile Huber loss,
a combination of the Huber loss (which is less sensitive to outliers than the squared error) and the quantile loss.
The quantile loss penalizes the model according to how far its predictions are from the desired quantiles of the Q-value distribution.

The quantile loss for a given quantile level $\tau$ is defined as:

$$
L_{\tau}(Q, \hat{Q}) = \rho_{\tau}(\hat{Q} - Q),
$$

where $Q$ is the predicted quantile value for a given state-action pair,
$\hat{Q}$ is the corresponding target quantile value obtained from the Bellman backup
(the observed reward plus the discounted quantile estimate of the next state),
and $\rho_{\tau}(z)$ is the check function defined as:

$$
\rho_{\tau}(z) = z(\tau - \mathbb{I}[z<0]),
$$

where $\mathbb{I}[z<0]$ is the indicator function that equals 1 when $z < 0$ and 0 otherwise.

The quantile regression loss encourages the model to learn quantile values
that minimize the discrepancy between the predicted quantiles and the true return distributions.
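
A minimal PyTorch sketch of the quantile Huber loss described above is shown below. The quantile midpoints $\tau_i = (2i-1)/(2N)$ follow the QR-DQN paper, but the function name and tensor layout are illustrative assumptions rather than XuanCe's implementation:

```python3
import torch

def quantile_huber_loss(pred_quantiles, target_quantiles, kappa=1.0):
    """Quantile Huber loss between predicted and target quantile values.

    pred_quantiles:   (batch, N) quantile estimates for the actions actually taken.
    target_quantiles: (batch, N) quantile values of the Bellman target (no gradient).
    """
    n = pred_quantiles.shape[1]
    # Quantile midpoints tau_i = (2i - 1) / (2N), i = 1..N.
    taus = (torch.arange(n, dtype=pred_quantiles.dtype, device=pred_quantiles.device) + 0.5) / n

    # Pairwise TD errors u[b, i, j] = target_j - pred_i, shape (batch, N, N).
    u = target_quantiles.unsqueeze(1) - pred_quantiles.unsqueeze(2)

    # Huber part: quadratic near zero, linear beyond kappa (less sensitive to outliers).
    huber = torch.where(u.abs() <= kappa,
                        0.5 * u.pow(2),
                        kappa * (u.abs() - 0.5 * kappa))

    # Asymmetric quantile weight |tau_i - 1[u < 0]| from the check function.
    weight = (taus.view(1, -1, 1) - (u.detach() < 0).float()).abs()

    # Average over target quantiles, sum over predicted quantiles, mean over the batch.
    return (weight * huber / kappa).mean(dim=2).sum(dim=1).mean()
```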

## Algorithm

The full algorithm for training QR-DQN is presented in Algorithm 1:

```{eval-rst}
.. image:: ./../../../../_static/figures/pseucodes/pseucode-QRDQN.png
:width: 70%
:align: center
```

## Run QR-DQN in XuanCe

Before running QR-DQN in XuanCe, you need to prepare a conda environment and install ``xuance`` following
the [**installation steps**](./../../../usage/installation.rst#install-via-pypi).

### Run Built-in Demos

After completing the installation, you can open a Python console and run QR-DQN directly using the following commands:

```python3
import xuance
runner = xuance.get_runner(method='qrdqn',
                           env='classic_control',  # Choices: classic_control, box2d, atari.
                           env_id='CartPole-v1',   # Choices: CartPole-v1, LunarLander-v2, ALE/Breakout-v5, etc.
                           is_test=False)
runner.run()  # Or runner.benchmark()
```

### Run With Self-defined Configs

If you want to run QR-DQN with different configurations, you can build a new ``.yaml`` file, e.g., ``my_config.yaml``.
Then, run the QR-DQN by the following code block:

```python3
import xuance as xp
runner = xp.get_runner(method='qrdqn',
                       env='classic_control',  # Choices: classic_control, box2d, atari.
                       env_id='CartPole-v1',   # Choices: CartPole-v1, LunarLander-v2, ALE/Breakout-v5, etc.
                       config_path="my_config.yaml",  # The path to the my_config.yaml file must be correct.
                       is_test=False)
runner.run()  # Or runner.benchmark()
```

To learn more about the configurations, please visit the
[**tutorial of configs**](./../../configs/configuration_examples.rst).

### Run With Customized Environment

If you would like to run XuanCe's QR-DQN in your own environment that is not included in XuanCe,
you need to define the new environment following the steps in
[**New Environment Tutorial**](./../../../usage/new_envs.rst).
Then, [**prepare the configuration file**](./../../../usage/new_envs.rst#step-2-create-the-config-file-and-read-the-configurations)
``qrdqn_myenv.yaml``.

After that, you can run QR-DQN in your own environment with the following code:

```python3
import argparse
from xuance.common import get_configs
from xuance.environment import REGISTRY_ENV
from xuance.environment import make_envs
from xuance.torch.agents import QRDQN_Agent

configs_dict = get_configs(file_dir="qrdqn_myenv.yaml")
configs = argparse.Namespace(**configs_dict)
REGISTRY_ENV[configs.env_name] = MyNewEnv  # Register your custom environment class (defined following the tutorial above).

envs = make_envs(configs) # Make parallel environments.
Agent = QRDQN_Agent(config=configs, envs=envs)  # Create a QR-DQN agent from XuanCe.
Agent.train(configs.running_steps // configs.parallels) # Train the model for numerous steps.
Agent.save_model("final_train_model.pth") # Save the model to model_dir.
Agent.finish() # Finish the training.
```
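
For reference, ``MyNewEnv`` in the snippet above is the environment class you define yourself. The skeleton below is a hypothetical sketch assuming a Gymnasium-style ``reset``/``step`` interface; the exact base class and attributes that XuanCe expects are specified in the New Environment Tutorial linked above:

```python3
import numpy as np
from gymnasium.spaces import Box, Discrete

class MyNewEnv:
    """Hypothetical skeleton of a custom environment (see the New Environment
    Tutorial for the exact interface and base class that XuanCe requires)."""

    def __init__(self, env_config):
        self.env_id = env_config.env_id
        self.observation_space = Box(low=-1.0, high=1.0, shape=(4,), dtype=np.float32)
        self.action_space = Discrete(2)
        self.max_episode_steps = 200
        self._current_step = 0

    def reset(self, **kwargs):
        self._current_step = 0
        return self.observation_space.sample(), {}

    def step(self, action):
        self._current_step += 1
        observation = self.observation_space.sample()
        reward = 0.0
        terminated = False
        truncated = self._current_step >= self.max_episode_steps
        return observation, reward, terminated, truncated, {}

    def render(self, *args, **kwargs):
        return None

    def close(self):
        return None
```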

## Citations

```{code-block} bibtex
@inproceedings{dabney2018distributional,
title={Distributional reinforcement learning with quantile regression},
author={Dabney, Will and Rowland, Mark and Bellemare, Marc and Munos, R{\'e}mi},
booktitle={Proceedings of the AAAI conference on artificial intelligence},
volume={32},
number={1},
year={2018}
}
```

## APIs

### PyTorch

```{eval-rst}
.. automodule:: xuance.torch.agents.qlearning_family.qrdqn_agent
:members:
:undoc-members:
:show-inheritance:
```

### TensorFlow2

```{eval-rst}
.. automodule:: xuance.tensorflow.agents.qlearning_family.qrdqn_agent
:members:
:undoc-members:
:show-inheritance:
```

### MindSpore

```{eval-rst}
.. automodule:: xuance.mindspore.agents.qlearning_family.qrdqn_agent
:members:
:undoc-members:
:show-inheritance:
```
