docs for noisy dqn
wenzhangliu committed Dec 29, 2024
1 parent ebb389f commit 20365c8
Showing 5 changed files with 220 additions and 77 deletions.
38 changes: 20 additions & 18 deletions docs/source/documents/api/agents/drl/ddqn_agent.md
@@ -13,20 +13,20 @@ This can lead to suboptimal policies and unstable training.

This table lists some key features of the Double DQN algorithm:

| Features of Double DQN | Results | Description |
|------------------------|---------|-------------------------------------------------------------|
| On-policy              |         | The evaluation policy is the same as the target policy.     |
| Off-policy             |         | The evaluation policy is different from the target policy.  |
| Model-free             |         | No need to prepare an environment dynamics model.           |
| Model-based            |         | Need an environment model to train the policy.              |
| Discrete Action        |         | Deals with discrete action spaces.                          |
| Continuous Action      |         | Deals with continuous action spaces.                        |
| Features of Double DQN | Values | Description |
|------------------------|--------|-------------------------------------------------------------|
| On-policy              | ❌     | The evaluation policy is the same as the target policy.     |
| Off-policy             | ✅     | The evaluation policy is different from the target policy.  |
| Model-free             | ✅     | No need to prepare an environment dynamics model.           |
| Model-based            | ❌     | Need an environment model to train the policy.              |
| Discrete Action        | ✅     | Deals with discrete action spaces.                          |
| Continuous Action      | ❌     | Deals with continuous action spaces.                        |

## The Risk of Overestimating

In standard DQN, overestimation occurs due to the use of a single Q-network for both selecting and evaluating actions.

As introduced before, [**DQN**](dqn_agent.md) updates the Q-value for a state-action pair $Q(s, a)$
As introduced before, [**DQN**](dqn_agent.md#deep-q-netowrk) updates the Q-value for a state-action pair $Q(s, a)$
by using the maximum Q-value of the next state, $\max_{a'}Q(s', a')$, as part of the target.

If the Q-network overestimates one or more state-action values, the overestimation propagates and accumulates over time.
@@ -62,10 +62,7 @@ $$

Finally, don't forget to update the target networks: $\theta^{-} \leftarrow \theta$.
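
For illustration, here is a minimal sketch of how the Double DQN target can be computed, decoupling action selection (online network $\theta$) from action evaluation (target network $\theta^{-}$). The tensor and function names (`q_online`, `q_target`, `rewards`, `next_obs`, `terminals`) and the discount `gamma` are illustrative assumptions, not XuanCe's actual learner code:

```python3
import torch

def double_dqn_target(q_online, q_target, rewards, next_obs, terminals, gamma=0.99):
    """Sketch: y = r + gamma * Q_target(s', argmax_a' Q_online(s', a')) for non-terminal s'."""
    with torch.no_grad():
        next_actions = q_online(next_obs).argmax(dim=-1, keepdim=True)    # select actions with the online network
        next_q = q_target(next_obs).gather(-1, next_actions).squeeze(-1)  # evaluate them with the target network
        return rewards + gamma * (1.0 - terminals) * next_q
```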

## Run Double DQN in XuanCe

Before running Double DQN in XuanCe, you need to prepare a conda environment and install ``xuance`` following
the [**installation steps**](https://xuance.readthedocs.io/en/latest/documents/usage/installation.html).
## Framework

The overall agent-environment interaction of Double DQN, as implemented in XuanCe, is illustrated in the figure below.

Expand All @@ -75,9 +72,14 @@ The overall agent-environment interaction of Double DQN, as implemented in XuanC
:align: center
```

## Run Double DQN in XuanCe

Before running Double DQN in XuanCe, you need to prepare a conda environment and install ``xuance`` following
the [**installation steps**](./../../../usage/installation.rst#install-via-pypi).

### Run Built-in Demos

After completing the installation, you can open a Python console and run DQN directly using the following commands:
After completing the installation, you can open a Python console and run Double DQN directly using the following commands:

```python3
import xuance
Expand All @@ -104,15 +106,15 @@ runner.run() # Or runner.benchmark()
```

To learn more about the configurations, please visit the
[**tutorial of configs**](https://xuance.readthedocs.io/en/latest/documents/api/configs/configuration_examples.html).
[**tutorial of configs**](./../../configs/configuration_examples.rst).

### Run With Customized Environment

If you would like to run XuanCe's Double DQN in your own environment that was not included in XuanCe,
you need to define the new environment following the steps in
[**New Environment Tutorial**](https://xuance.readthedocs.io/en/latest/documents/usage/new_envs.html#step-1-create-a-new-environment).
Then, [**prepare the configuration file**](https://xuance.readthedocs.io/en/latest/documents/usage/new_envs.html#step-2-create-the-config-file-and-read-the-configurations)
``dqn_myenv.yaml``.
[**New Environment Tutorial**](./../../../usage/new_envs.rst).
Then, [**prepare the configuration file**](./../../../usage/new_envs.rst#step-2-create-the-config-file-and-read-the-configurations)
``ddqn_myenv.yaml``.

After that, you can run Double DQN in your own environment with the following code:

34 changes: 18 additions & 16 deletions docs/source/documents/api/agents/drl/dqn_agent.md
@@ -10,14 +10,14 @@ achieving superhuman performance in many cases.

This table lists some key features of the DQN algorithm:

| Features of DQN | Results | Description |
|-------------------|---------|-------------------------------------------------------------|
| On-policy         |         | The evaluation policy is the same as the target policy.     |
| Off-policy        |         | The evaluation policy is different from the target policy.  |
| Model-free        |         | No need to prepare an environment dynamics model.           |
| Model-based       |         | Need an environment model to train the policy.              |
| Discrete Action   |         | Deals with discrete action spaces.                          |
| Continuous Action |         | Deals with continuous action spaces.                        |
| Features of DQN | Values | Description |
|-------------------|--------|-------------------------------------------------------------|
| On-policy         | ❌     | The evaluation policy is the same as the target policy.     |
| Off-policy        | ✅     | The evaluation policy is different from the target policy.  |
| Model-free        | ✅     | No need to prepare an environment dynamics model.           |
| Model-based       | ❌     | Need an environment model to train the policy.              |
| Discrete Action   | ✅     | Deals with discrete action spaces.                          |
| Continuous Action | ❌     | Deals with continuous action spaces.                        |

## Q-Learning

@@ -81,10 +81,7 @@ The full algorithm for training DQN is presented in Algorithm 1:
:align: center
```

## Run DQN in XuanCe

Before running DQN in XuanCe, you need to prepare a conda environment and install ``xuance`` following
the [**installation steps**](https://xuance.readthedocs.io/en/latest/documents/usage/installation.html).
## Framework

The overall agent-environment interaction of DQN, as implemented in XuanCe, is illustrated in the figure below.

Expand All @@ -94,6 +91,11 @@ The overall agent-environment interaction of DQN, as implemented in XuanCe, is i
:align: center
```

## Run DQN in XuanCe

Before running DQN in XuanCe, you need to prepare a conda environment and install ``xuance`` following
the [**installation steps**](./../../../usage/installation.rst#install-via-pypi).

### Run Built-in Demos

After completing the installation, you can open a Python console and run DQN directly using the following commands:
@@ -123,15 +125,15 @@ runner.run() # Or runner.benchmark()
```

To learn more about the configurations, please visit the
[**tutorial of configs**](https://xuance.readthedocs.io/en/latest/documents/api/configs/configuration_examples.html).
[**tutorial of configs**](./../../configs/configuration_examples.rst).

### Run With Customized Environment

If you would like to run XuanCe's DQN in your own environment that was not included in XuanCe,
you need to define the new environment following the steps in
[**New Environment Tutorial**](https://xuance.readthedocs.io/en/latest/documents/usage/new_envs.html#step-1-create-a-new-environment).
Then, [**prepare the configuration file**](https://xuance.readthedocs.io/en/latest/documents/usage/new_envs.html#step-2-create-the-config-file-and-read-the-configurations)
``dqn_myenv.yaml``.
[**New Environment Tutorial**](./../../../usage/new_envs.rst).
Then, [**prepare the configuration file**](./../../../usage/new_envs.rst#step-2-create-the-config-file-and-read-the-configurations)
``dqn_myenv.yaml``.

After that, you can run DQN in your own environment with the following code:

36 changes: 19 additions & 17 deletions docs/source/documents/api/agents/drl/dueldqn_agent.md
@@ -9,16 +9,16 @@ and the action advantage function, addressing key limitations of traditional DQN

This table lists some key features of the Dueling DQN algorithm:

| Features of Dueling DQN | Results | Description |
|-------------------------|---------|-------------------------------------------------------------|
| On-policy               |         | The evaluation policy is the same as the target policy.     |
| Off-policy              |         | The evaluation policy is different from the target policy.  |
| Model-free              |         | No need to prepare an environment dynamics model.           |
| Model-based             |         | Need an environment model to train the policy.              |
| Discrete Action         |         | Deals with discrete action spaces.                          |
| Continuous Action       |         | Deals with continuous action spaces.                        |
| Features of Dueling DQN | Values | Description |
|-------------------------|--------|-------------------------------------------------------------|
| On-policy               | ❌     | The evaluation policy is the same as the target policy.     |
| Off-policy              | ✅     | The evaluation policy is different from the target policy.  |
| Model-free              | ✅     | No need to prepare an environment dynamics model.           |
| Model-based             | ❌     | Need an environment model to train the policy.              |
| Discrete Action         | ✅     | Deals with discrete action spaces.                          |
| Continuous Action       | ❌     | Deals with continuous action spaces.                        |

## Key Idea of Dueling DQN
## Key Ideas of Dueling DQN

Let $V(s)$ represent the overall value of state $s$.
$A(s, a)$ is the advantage function that measures the relative benefit of taking a specific action $a$ given state $s$.
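
To make the decomposition concrete, below is a minimal sketch of a dueling head that combines the two streams with the standard mean-subtracted aggregation, $Q(s, a) = V(s) + A(s, a) - \frac{1}{|\mathcal{A}|}\sum_{a'}A(s, a')$. The class name `DuelingHead` and the layer sizes are illustrative assumptions, not XuanCe's exact network:

```python3
import torch.nn as nn

class DuelingHead(nn.Module):
    """Split shared features into a state-value stream V(s) and an advantage stream A(s, a)."""
    def __init__(self, feature_dim, n_actions, hidden_dim=128):
        super().__init__()
        self.value = nn.Sequential(nn.Linear(feature_dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, 1))
        self.advantage = nn.Sequential(nn.Linear(feature_dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, n_actions))

    def forward(self, features):
        v = self.value(features)                     # shape: [batch, 1]
        a = self.advantage(features)                 # shape: [batch, n_actions]
        return v + a - a.mean(dim=-1, keepdim=True)  # subtract the mean advantage for identifiability
```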
Expand All @@ -43,10 +43,7 @@ The architecture of Dueling DQN can be illustrated as the following figure:
:align: center
```

## Run Dueling DQN in XuanCe

Before running Dueling DQN in XuanCe, you need to prepare a conda environment and install ``xuance`` following
the [**installation steps**](https://xuance.readthedocs.io/en/latest/documents/usage/installation.html).
## Framework

The overall agent-environment interaction of Dueling DQN, as implemented in XuanCe, is illustrated in the figure below.

Expand All @@ -56,9 +53,14 @@ The overall agent-environment interaction of Dueling DQN, as implemented in Xuan
:align: center
```

## Run Dueling DQN in XuanCe

Before running Dueling DQN in XuanCe, you need to prepare a conda environment and install ``xuance`` following
the [**installation steps**](./../../../usage/installation.rst#install-via-pypi).

### Run Built-in Demos

After completing the installation, you can open a Python console and run DQN directly using the following commands:
After completing the installation, you can open a Python console and run Dueling DQN directly using the following commands:

```python3
import xuance
Expand All @@ -85,14 +87,14 @@ runner.run() # Or runner.benchmark()
```

To learn more about the configurations, please visit the
[**tutorial of configs**](https://xuance.readthedocs.io/en/latest/documents/api/configs/configuration_examples.html).
[**tutorial of configs**](./../../configs/configuration_examples.rst).

### Run With Customized Environment

If you would like to run XuanCe's Dueling DQN in your own environment that was not included in XuanCe,
you need to define the new environment following the steps in
[**New Environment Tutorial**](https://xuance.readthedocs.io/en/latest/documents/usage/new_envs.html#step-1-create-a-new-environment).
Then, [**prepare the configuration file**](https://xuance.readthedocs.io/en/latest/documents/usage/new_envs.html#step-2-create-the-config-file-and-read-the-configurations)
[**New Environment Tutorial**](./../../../usage/new_envs.rst).
Then, [**prepare the configuration file**](./../../../usage/new_envs.rst#step-2-create-the-config-file-and-read-the-configurations)
``duelqn_myenv.yaml``.

After that, you can run Dueling DQN in your own environment with the following code:
163 changes: 163 additions & 0 deletions docs/source/documents/api/agents/drl/noisydqn_agent.md
@@ -0,0 +1,163 @@
# DQN with Noisy Layers (Noisy DQN)

**Paper Link:** [**https://arxiv.org/pdf/1706.01905**](https://arxiv.org/pdf/1706.01905).

Noisy DQN is a variant of the traditional Deep Q-Network (DQN)
that introduces noise into the weights of the Q-network to improve exploration during the learning process.
This is aimed at addressing one of the key challenges in reinforcement learning: balancing exploration and exploitation.

This table lists some key features of the Noisy DQN algorithm:

| Features of Noisy DQN | Values | Description |
|-----------------------|--------|-------------------------------------------------------------|
| On-policy             | ❌     | The evaluation policy is the same as the target policy.     |
| Off-policy            | ✅     | The evaluation policy is different from the target policy.  |
| Model-free            | ✅     | No need to prepare an environment dynamics model.           |
| Model-based           | ❌     | Need an environment model to train the policy.              |
| Discrete Action       | ✅     | Deals with discrete action spaces.                          |
| Continuous Action     | ❌     | Deals with continuous action spaces.                        |

## Key Ideas of Noisy DQN

**Exploration vs. Exploitation**: In standard DQN, exploration is often controlled by an $\epsilon$-greedy policy,
where the agent randomly selects actions with a certain probability (epsilon),
and exploits the best-known action the rest of the time. Noisy DQN attempts to address the challenge of exploration by introducing noise directly into the network's parameters,
rather than relying solely on random action selection.

**Noisy Networks**: Instead of using a fixed epsilon for exploration, Noisy DQN introduces noise into the parameters of the Q-network itself.
This is done by adding parameter noise to the Q-network’s weights, which modifies the output Q-values,
encouraging exploration of different actions and states.

**Noisy Linear Layers**: In the Noisy DQN architecture, the traditional fully connected layers of the neural network are replaced with "noisy" layers.
These noisy layers add noise to the weights of the layers during training, making the agent’s decision-making process inherently more exploratory.

**The Noisy Network Formula**: For each layer in the network, the weights are parameterized as:

$$
w = \mu + \sigma \cdot \epsilon,
$$

where:
- $\mu$ is the mean or the base weight;
- $\sigma$ is the standard deviation that controls the level of noise;
- $\epsilon$ is a sample from a noise distribution (usually Gaussian).

The noise $\epsilon$ is sampled at the beginning of each episode or iteration, ensuring the noise is dynamic during training.
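
As a minimal sketch of the formula above (not XuanCe's implementation), the following PyTorch layer perturbs its weights with independent Gaussian noise; the class name `NoisyLinear` and the initial noise scale `sigma_0` are illustrative assumptions:

```python3
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyLinear(nn.Module):
    """Linear layer with learnable noise: w = mu + sigma * eps, eps ~ N(0, 1)."""
    def __init__(self, in_features, out_features, sigma_0=0.017):
        super().__init__()
        bound = 1.0 / math.sqrt(in_features)
        self.weight_mu = nn.Parameter(torch.empty(out_features, in_features).uniform_(-bound, bound))
        self.weight_sigma = nn.Parameter(torch.full((out_features, in_features), sigma_0))
        self.bias_mu = nn.Parameter(torch.empty(out_features).uniform_(-bound, bound))
        self.bias_sigma = nn.Parameter(torch.full((out_features,), sigma_0))

    def forward(self, x):
        if self.training:  # sample fresh noise so exploration comes from the weights themselves
            weight = self.weight_mu + self.weight_sigma * torch.randn_like(self.weight_sigma)
            bias = self.bias_mu + self.bias_sigma * torch.randn_like(self.bias_sigma)
        else:              # act with the mean weights at evaluation time
            weight, bias = self.weight_mu, self.bias_mu
        return F.linear(x, weight, bias)
```

Replacing the output layers of a standard Q-network with such layers lets the agent pick actions greedily ($\arg\max_a Q(s, a)$) while still exploring, because each forward pass uses slightly different weights.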

Noisy DQN has three main benefits:

- **Improved Exploration**: By introducing noise in the Q-values, the agent is encouraged to explore a broader range of actions, rather than exploiting the current best-known action.
- **Adaptive Exploration**: The level of exploration can be adjusted automatically as part of the training, eliminating the need to manually tune exploration parameters like epsilon.
- **Efficient Training**: Noisy DQN can improve sample efficiency because its exploration drives the agent toward less frequently visited states, potentially leading to better performance in complex environments.

## Framework

Noisy DQN retains the same overall structure as
[**DQN**](dqn_agent.md#framework)
(i.e., experience replay, target networks, etc.),
but replaces the exploration mechanism with the noisy layers in the Q-network.

## Run Noisy DQN in XuanCe

Before running Noisy DQN in XuanCe, you need to prepare a conda environment and install ``xuance`` following
the [**installation steps**](./../../../usage/installation.rst#install-via-pypi).

### Run Built-in Demos

After completing the installation, you can open a Python console and run Noisy DQN directly using the following commands:

```python3
import xuance
runner = xuance.get_runner(method='noisydqn',
                           env='classic_control',  # Choices: classic_control, box2d, atari.
                           env_id='CartPole-v1',   # Choices: CartPole-v1, LunarLander-v2, ALE/Breakout-v5, etc.
                           is_test=False)
runner.run()  # Or runner.benchmark()
```

### Run With Self-defined Configs

If you want to run Noisy DQN with different configurations, you can build a new ``.yaml`` file, e.g., ``my_config.yaml``.
Then, run the Noisy DQN by the following code block:

```python3
import xuance as xp
runner = xp.get_runner(method='noisydqn',
                       env='classic_control',  # Choices: classic_control, box2d, atari.
                       env_id='CartPole-v1',   # Choices: CartPole-v1, LunarLander-v2, ALE/Breakout-v5, etc.
                       config_path="my_config.yaml",  # The path of the my_config.yaml file should be correct.
                       is_test=False)
runner.run()  # Or runner.benchmark()
```

To learn more about the configurations, please visit the
[**tutorial of configs**](./../../configs/configuration_examples.rst).

### Run With Customized Environment

If you would like to run XuanCe's Noisy DQN in your own environment that was not included in XuanCe,
you need to define the new environment following the steps in
[**New Environment Tutorial**](./../../../usage/new_envs.rst).
Then, [**prepare the configuration file**](./../../../usage/new_envs.rst#step-2-create-the-config-file-and-read-the-configurations)
``noisydqn_myenv.yaml``.

After that, you can run Noisy DQN in your own environment with the following code:

```python3
import argparse
from xuance.common import get_configs
from xuance.environment import REGISTRY_ENV
from xuance.environment import make_envs
from xuance.torch.agents import NoisyDQN_Agent

configs_dict = get_configs(file_dir="noisydqn_myenv.yaml")
configs = argparse.Namespace(**configs_dict)
REGISTRY_ENV[configs.env_name] = MyNewEnv  # MyNewEnv is the environment class you defined in the tutorial above.

envs = make_envs(configs)  # Make parallel environments.
Agent = NoisyDQN_Agent(config=configs, envs=envs)  # Create a Noisy DQN agent from XuanCe.
Agent.train(configs.running_steps // configs.parallels)  # Train the model for numerous steps.
Agent.save_model("final_train_model.pth")  # Save the model to model_dir.
Agent.finish()  # Finish the training.
```

## Citations

```{code-block} bash
@inproceedings{
plappert2018parameter,
title={Parameter Space Noise for Exploration},
author={Matthias Plappert and Rein Houthooft and Prafulla Dhariwal and Szymon Sidor and Richard Y. Chen and Xi Chen and Tamim Asfour and Pieter Abbeel and Marcin Andrychowicz},
booktitle={International Conference on Learning Representations},
year={2018},
url={https://openreview.net/forum?id=ByBAl2eAZ},
}
```

## APIs

### PyTorch

```{eval-rst}
.. automodule:: xuance.torch.agents.qlearning_family.noisydqn_agent
:members:
:undoc-members:
:show-inheritance:
```

### TensorFlow2

```{eval-rst}
.. automodule:: xuance.tensorflow.agents.qlearning_family.noisydqn_agent
:members:
:undoc-members:
:show-inheritance:
```

### MindSpore

```{eval-rst}
.. automodule:: xuance.mindspore.agents.qlearning_family.noisydqn_agent
:members:
:undoc-members:
:show-inheritance:
```
