From 20365c8c12295021e598d6401cb82b7f1762a8a7 Mon Sep 17 00:00:00 2001 From: wenzhangliu Date: Sun, 29 Dec 2024 22:15:20 +0800 Subject: [PATCH] docs for noisy dqn --- .../documents/api/agents/drl/ddqn_agent.md | 38 ++-- .../documents/api/agents/drl/dqn_agent.md | 34 ++-- .../documents/api/agents/drl/dueldqn_agent.md | 36 ++-- .../api/agents/drl/noisydqn_agent.md | 163 ++++++++++++++++++ .../api/agents/drl/noisydqn_agent.rst | 26 --- 5 files changed, 220 insertions(+), 77 deletions(-) create mode 100644 docs/source/documents/api/agents/drl/noisydqn_agent.md delete mode 100644 docs/source/documents/api/agents/drl/noisydqn_agent.rst diff --git a/docs/source/documents/api/agents/drl/ddqn_agent.md b/docs/source/documents/api/agents/drl/ddqn_agent.md index d9f592fc4..0ef84ceb8 100644 --- a/docs/source/documents/api/agents/drl/ddqn_agent.md +++ b/docs/source/documents/api/agents/drl/ddqn_agent.md @@ -13,20 +13,20 @@ This can lead to suboptimal policies and unstable training. This table lists some key features about Double DQN algorithm: -| Features of Double DQN | Results | Description | -|------------------------|---------|----------------------------------------------------------| -| On-policy | ❌ | The evaluate policy is the same as the target policy. | -| Off-policy | ✅ | The evaluate policy is different from the target policy. | -| Model-free | ✅ | No need to prepare an environment dynamics model. | -| Model-based | ❌ | Need an environment model to train the policy. | -| Discrete Action | ✅ | Deal with discrete action space. | -| Continuous Action | ❌ | Deal with continuous action space. | +| Features of Double DQN | Values | Description | +|------------------------|--------|----------------------------------------------------------| +| On-policy | ❌ | The evaluate policy is the same as the target policy. | +| Off-policy | ✅ | The evaluate policy is different from the target policy. | +| Model-free | ✅ | No need to prepare an environment dynamics model. | +| Model-based | ❌ | Need an environment model to train the policy. | +| Discrete Action | ✅ | Deal with discrete action space. | +| Continuous Action | ❌ | Deal with continuous action space. | ## The Risk of Overestimating In standard DQN, overestimation occurs due to the use of a single Q-network for both selecting and evaluating actions. -As introduced before, [**DQN**](dqn_agent.md) updates the Q-value for a state-action pair $Q(s, a)$ +As introduced before, [**DQN**](dqn_agent.md#deep-q-netowrk) updates the Q-value for a state-action pair $Q(s, a)$ by using the maximum of Q-value of the next state $\max_{a'}Q(s', a')$ as part of the target. If the Q-network overestimates one or more state-action values, the overestimation propagates and accumulates over time. @@ -62,10 +62,7 @@ $$ Finally, don't forget to update the target networks: $\theta^{-} \leftarrow \theta$. -## Run Double DQN in XuanCe - -Before running Double DQN in XuanCe, you need to prepare a conda environment and install ``xuance`` following -the [**installation steps**](https://xuance.readthedocs.io/en/latest/documents/usage/installation.html). +## Framework The overall agent-environment interaction of Double DQN, as implemented in XuanCe, is illustrated in the figure below. 
 
@@ -75,9 +72,14 @@ The overall agent-environment interaction of Double DQN, as implemented in XuanC
     :align: center
 ```
 
+## Run Double DQN in XuanCe
+
+Before running Double DQN in XuanCe, you need to prepare a conda environment and install ``xuance`` following
+the [**installation steps**](./../../../usage/installation.rst#install-via-pypi).
+
 ### Run Build-in Demos
 
-After completing the installation, you can open a Python console and run DQN directly using the following commands:
+After completing the installation, you can open a Python console and run Double DQN directly using the following commands:
 
 ```python3
 import xuance
@@ -104,15 +106,15 @@ runner.run()  # Or runner.benchmark()
 ```
 
 To learn more about the configurations, please visit the
-[**tutorial of configs**](https://xuance.readthedocs.io/en/latest/documents/api/configs/configuration_examples.html).
+[**tutorial of configs**](./../../configs/configuration_examples.rst).
 
 ### Run With Customized Environment
 
 If you would like to run XuanCe's Double DQN in your own environment that was not included in XuanCe,
 you need to define the new environment following the steps in
-[**New Environment Tutorial**](https://xuance.readthedocs.io/en/latest/documents/usage/new_envs.html#step-1-create-a-new-environment).
-Then, [**prepapre the configuration file**](https://xuance.readthedocs.io/en/latest/documents/usage/new_envs.html#step-2-create-the-config-file-and-read-the-configurations)
-``dqn_myenv.yaml``.
+[**New Environment Tutorial**](./../../../usage/new_envs.rst).
+Then, [**prepare the configuration file**](./../../../usage/new_envs.rst#step-2-create-the-config-file-and-read-the-configurations)
+``ddqn_myenv.yaml``.
 
 After that, you can run Double DQN in your own environment with the following code:
 
diff --git a/docs/source/documents/api/agents/drl/dqn_agent.md b/docs/source/documents/api/agents/drl/dqn_agent.md
index 7f1195bf1..4dc2973c6 100644
--- a/docs/source/documents/api/agents/drl/dqn_agent.md
+++ b/docs/source/documents/api/agents/drl/dqn_agent.md
@@ -10,14 +10,14 @@ achieving superhuman performance in many cases.
 
 This table lists some key features about DQN algorithm:
 
-| Features of DQN | Results | Description |
-|-------------------|---------|----------------------------------------------------------|
-| On-policy | ❌ | The evaluate policy is the same as the target policy. |
-| Off-policy | ✅ | The evaluate policy is different from the target policy. |
-| Model-free | ✅ | No need to prepare an environment dynamics model. |
-| Model-based | ❌ | Need an environment model to train the policy. |
-| Discrete Action | ✅ | Deal with discrete action space. |
-| Continuous Action | ❌ | Deal with continuous action space. |
+| Features of DQN | Values | Description |
+|-------------------|--------|----------------------------------------------------------|
+| On-policy | ❌ | The evaluate policy is the same as the target policy. |
+| Off-policy | ✅ | The evaluate policy is different from the target policy. |
+| Model-free | ✅ | No need to prepare an environment dynamics model. |
+| Model-based | ❌ | Need an environment model to train the policy. |
+| Discrete Action | ✅ | Deal with discrete action space. |
+| Continuous Action | ❌ | Deal with continuous action space. |
 
 ## Q-Learning
 
@@ -81,10 +81,7 @@ The full algorithm for training DQN is presented in Algorithm 1:
     :align: center
 ```
 
-## Run DQN in XuanCe
-
-Before running DQN in XuanCe, you need to prepare a conda environment and install ``xuance`` following
-the [**installation steps**](https://xuance.readthedocs.io/en/latest/documents/usage/installation.html).
+## Framework
 
 The overall agent-environment interaction of DQN, as implemented in XuanCe, is illustrated in the figure below.
 
@@ -94,6 +91,11 @@ The overall agent-environment interaction of DQN, as implemented in XuanCe, is i
     :align: center
 ```
 
+## Run DQN in XuanCe
+
+Before running DQN in XuanCe, you need to prepare a conda environment and install ``xuance`` following
+the [**installation steps**](./../../../usage/installation.rst#install-via-pypi).
+
 ### Run Build-in Demos
 
 After completing the installation, you can open a Python console and run DQN directly using the following commands:
@@ -123,15 +125,15 @@ runner.run()  # Or runner.benchmark()
 ```
 
 To learn more about the configurations, please visit the
-[**tutorial of configs**](https://xuance.readthedocs.io/en/latest/documents/api/configs/configuration_examples.html).
+[**tutorial of configs**](./../../configs/configuration_examples.rst).
 
 ### Run With Customized Environment
 
 If you would like to run XuanCe's DQN in your own environment that was not included in XuanCe,
 you need to define the new environment following the steps in
-[**New Environment Tutorial**](https://xuance.readthedocs.io/en/latest/documents/usage/new_envs.html#step-1-create-a-new-environment).
-Then, [**prepapre the configuration file**](https://xuance.readthedocs.io/en/latest/documents/usage/new_envs.html#step-2-create-the-config-file-and-read-the-configurations)
-``dqn_myenv.yaml``.
+[**New Environment Tutorial**](./../../../usage/new_envs.rst).
+Then, [**prepare the configuration file**](./../../../usage/new_envs.rst#step-2-create-the-config-file-and-read-the-configurations)
+``dqn_myenv.yaml``.
 
 After that, you can run DQN in your own environment with the following code:
 
diff --git a/docs/source/documents/api/agents/drl/dueldqn_agent.md b/docs/source/documents/api/agents/drl/dueldqn_agent.md
index 82c21bb50..5ce432c59 100644
--- a/docs/source/documents/api/agents/drl/dueldqn_agent.md
+++ b/docs/source/documents/api/agents/drl/dueldqn_agent.md
@@ -9,16 +9,16 @@ and the action advantage function, addressing key limitations of traditional DQN
 
 This table lists some key features about Dueling DQN algorithm:
 
-| Features of Dueling DQN | Results | Description |
-|-------------------------|---------|----------------------------------------------------------|
-| On-policy | ❌ | The evaluate policy is the same as the target policy. |
-| Off-policy | ✅ | The evaluate policy is different from the target policy. |
-| Model-free | ✅ | No need to prepare an environment dynamics model. |
-| Model-based | ❌ | Need an environment model to train the policy. |
-| Discrete Action | ✅ | Deal with discrete action space. |
-| Continuous Action | ❌ | Deal with continuous action space. |
+| Features of Dueling DQN | Values | Description |
+|-------------------------|--------|----------------------------------------------------------|
+| On-policy | ❌ | The evaluate policy is the same as the target policy. |
+| Off-policy | ✅ | The evaluate policy is different from the target policy. |
+| Model-free | ✅ | No need to prepare an environment dynamics model. |
+| Model-based | ❌ | Need an environment model to train the policy. |
+| Discrete Action | ✅ | Deal with discrete action space. |
+| Continuous Action | ❌ | Deal with continuous action space. |
 
-## Key Idea of Dueling DQN
+## Key Ideas of Dueling DQN
 
 Let $V(s)$ represent the overall value of state $s$.
 $A(s, a)$ is the advantage function that measures the relative benefit of taking a specific action $a$ given state $s$.
@@ -43,10 +43,7 @@ The architecture of Dueling DQN can be illustrated as the following figure:
     :align: center
 ```
 
-## Run Dueling DQN in XuanCe
-
-Before running Dueling DQN in XuanCe, you need to prepare a conda environment and install ``xuance`` following
-the [**installation steps**](https://xuance.readthedocs.io/en/latest/documents/usage/installation.html).
+## Framework
 
 The overall agent-environment interaction of Dueling DQN, as implemented in XuanCe, is illustrated in the figure below.
 
@@ -56,9 +53,14 @@ The overall agent-environment interaction of Dueling DQN, as implemented in Xuan
     :align: center
 ```
 
+## Run Dueling DQN in XuanCe
+
+Before running Dueling DQN in XuanCe, you need to prepare a conda environment and install ``xuance`` following
+the [**installation steps**](./../../../usage/installation.rst#install-via-pypi).
+
 ### Run Build-in Demos
 
-After completing the installation, you can open a Python console and run DQN directly using the following commands:
+After completing the installation, you can open a Python console and run Dueling DQN directly using the following commands:
 
 ```python3
 import xuance
@@ -85,14 +87,14 @@ runner.run()  # Or runner.benchmark()
 ```
 
 To learn more about the configurations, please visit the
-[**tutorial of configs**](https://xuance.readthedocs.io/en/latest/documents/api/configs/configuration_examples.html).
+[**tutorial of configs**](./../../configs/configuration_examples.rst).
 
 ### Run With Customized Environment
 
 If you would like to run XuanCe's Dueling DQN in your own environment that was not included in XuanCe,
 you need to define the new environment following the steps in
-[**New Environment Tutorial**](https://xuance.readthedocs.io/en/latest/documents/usage/new_envs.html#step-1-create-a-new-environment).
-Then, [**prepapre the configuration file**](https://xuance.readthedocs.io/en/latest/documents/usage/new_envs.html#step-2-create-the-config-file-and-read-the-configurations)
+[**New Environment Tutorial**](./../../../usage/new_envs.rst).
+Then, [**prepare the configuration file**](./../../../usage/new_envs.rst#step-2-create-the-config-file-and-read-the-configurations)
 ``duelqn_myenv.yaml``.
 
 After that, you can run Dueling DQN in your own environment with the following code:
diff --git a/docs/source/documents/api/agents/drl/noisydqn_agent.md b/docs/source/documents/api/agents/drl/noisydqn_agent.md
new file mode 100644
index 000000000..81aaf6077
--- /dev/null
+++ b/docs/source/documents/api/agents/drl/noisydqn_agent.md
@@ -0,0 +1,163 @@
+# DQN with Noisy Layers (Noisy DQN)
+
+**Paper Link:** [**https://arxiv.org/pdf/1706.01905**](https://arxiv.org/pdf/1706.01905).
+
+Noisy DQN is a variant of the traditional Deep Q-Network (DQN)
+that introduces noise into the weights of the Q-network to improve exploration during the learning process.
+This is aimed at addressing one of the key challenges in reinforcement learning: balancing exploration and exploitation.
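+
+As a quick illustration of this idea, the sketch below shows a "noisy" linear layer that samples its weights as $w = \mu + \sigma \cdot \epsilon$ (the formula is explained in the Key Ideas section below).
+It is a simplified, independent-Gaussian example written in PyTorch for intuition only: the class name ``NoisyLinear``, the ``sigma_init`` constant, and the resampling schedule are illustrative assumptions,
+not XuanCe's actual API, and practical implementations (e.g., the factorised-noise variant from the NoisyNet literature) may differ in detail.
+
+```python3
+import math
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+
+
+class NoisyLinear(nn.Module):
+    """A linear layer whose weights and biases are perturbed as mu + sigma * eps."""
+
+    def __init__(self, in_features, out_features, sigma_init=0.017):
+        super().__init__()
+        # Learnable mean and noise-scale parameters for weights and biases.
+        self.weight_mu = nn.Parameter(torch.empty(out_features, in_features))
+        self.weight_sigma = nn.Parameter(torch.full((out_features, in_features), sigma_init))
+        self.bias_mu = nn.Parameter(torch.empty(out_features))
+        self.bias_sigma = nn.Parameter(torch.full((out_features,), sigma_init))
+        bound = 1.0 / math.sqrt(in_features)
+        nn.init.uniform_(self.weight_mu, -bound, bound)
+        nn.init.uniform_(self.bias_mu, -bound, bound)
+
+    def forward(self, x):
+        if self.training:
+            # Sample fresh Gaussian noise, so exploration comes from the parameters themselves.
+            weight = self.weight_mu + self.weight_sigma * torch.randn_like(self.weight_sigma)
+            bias = self.bias_mu + self.bias_sigma * torch.randn_like(self.bias_sigma)
+        else:
+            # At evaluation time, use the mean (noise-free) parameters.
+            weight, bias = self.weight_mu, self.bias_mu
+        return F.linear(x, weight, bias)
+```
+
+Replacing the fully connected layers of a Q-network with layers like this removes the need for a hand-tuned $\epsilon$-greedy schedule; the sections below discuss this in more detail.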
+
+This table lists some key features about Noisy DQN algorithm:
+
+| Features of Noisy DQN | Values | Description |
+|-----------------------|--------|----------------------------------------------------------|
+| On-policy | ❌ | The evaluate policy is the same as the target policy. |
+| Off-policy | ✅ | The evaluate policy is different from the target policy. |
+| Model-free | ✅ | No need to prepare an environment dynamics model. |
+| Model-based | ❌ | Need an environment model to train the policy. |
+| Discrete Action | ✅ | Deal with discrete action space. |
+| Continuous Action | ❌ | Deal with continuous action space. |
+
+## Key Ideas of Noisy DQN
+
+**Exploration vs. Exploitation**: In standard DQN, exploration is often controlled by an $\epsilon$-greedy policy,
+where the agent randomly selects actions with a certain probability ($\epsilon$),
+and exploits the best-known action the rest of the time. Noisy DQN attempts to address the challenge of exploration by introducing noise directly into the network's parameters,
+rather than relying solely on random action selection.
+
+**Noisy Networks**: Instead of using a fixed $\epsilon$ for exploration, Noisy DQN introduces noise into the parameters of the Q-network itself.
+This is done by adding parameter noise to the Q-network’s weights, which modifies the output Q-values,
+encouraging exploration of different actions and states.
+
+**Noisy Linear Layers**: In the Noisy DQN architecture, the traditional fully connected layers of the neural network are replaced with "noisy" layers.
+These noisy layers add noise to the weights of the layers during training, making the agent’s decision-making process inherently more exploratory.
+
+**The Noisy Network Formula**: For each layer in the network, the weights are parameterized as:
+
+$$
+w = \mu + \sigma \cdot \epsilon,
+$$
+
+where:
+- $\mu$ is the mean or the base weight;
+- $\sigma$ is the standard deviation that controls the level of noise;
+- $\epsilon$ is a sample from a noise distribution (usually Gaussian).
+
+The noise $\epsilon$ is sampled at the beginning of each episode or iteration, ensuring the noise is dynamic during training.
+
+Noisy DQN has three main benefits:
+
+- **Improved Exploration**: By introducing noise in the Q-values, the agent is encouraged to explore a broader range of actions, rather than exploiting the current best-known action.
+- **Adaptive Exploration**: The level of exploration is adjusted automatically as part of training, eliminating the need to manually tune exploration parameters like $\epsilon$.
+- **Efficient Training**: Noisy DQN can improve sample efficiency because the injected noise drives the agent to visit less frequently encountered states, potentially leading to better performance in complex environments.
+
+## Framework
+
+Noisy DQN retains the same overall structure as
+[**DQN**](dqn_agent.md#framework)
+(i.e., experience replay, target networks, etc.),
+but replaces the exploration mechanism with noisy layers in the Q-network.
+
+## Run Noisy DQN in XuanCe
+
+Before running Noisy DQN in XuanCe, you need to prepare a conda environment and install ``xuance`` following
+the [**installation steps**](./../../../usage/installation.rst#install-via-pypi).
+
+### Run Build-in Demos
+
+After completing the installation, you can open a Python console and run Noisy DQN directly using the following commands:
+
+```python3
+import xuance
+runner = xuance.get_runner(method='noisydqn',
+                           env='classic_control',  # Choices: classic_control, box2d, atari.
+                           env_id='CartPole-v1',  # Choices: CartPole-v1, LunarLander-v2, ALE/Breakout-v5, etc.
+                           is_test=False)
+runner.run()  # Or runner.benchmark()
+```
+
+### Run With Self-defined Configs
+
+If you want to run Noisy DQN with different configurations, you can build a new ``.yaml`` file, e.g., ``my_config.yaml``.
+Then, run Noisy DQN with the following code block:
+
+```python3
+import xuance as xp
+runner = xp.get_runner(method='noisydqn',
+                       env='classic_control',  # Choices: classic_control, box2d, atari.
+                       env_id='CartPole-v1',  # Choices: CartPole-v1, LunarLander-v2, ALE/Breakout-v5, etc.
+                       config_path="my_config.yaml",  # The path of my_config.yaml file should be correct.
+                       is_test=False)
+runner.run()  # Or runner.benchmark()
+```
+
+To learn more about the configurations, please visit the
+[**tutorial of configs**](./../../configs/configuration_examples.rst).
+
+### Run With Customized Environment
+
+If you would like to run XuanCe's Noisy DQN in your own environment that was not included in XuanCe,
+you need to define the new environment following the steps in
+[**New Environment Tutorial**](./../../../usage/new_envs.rst).
+Then, [**prepare the configuration file**](./../../../usage/new_envs.rst#step-2-create-the-config-file-and-read-the-configurations)
+``noisydqn_myenv.yaml``.
+
+After that, you can run Noisy DQN in your own environment with the following code:
+
+```python3
+import argparse
+from xuance.common import get_configs
+from xuance.environment import REGISTRY_ENV
+from xuance.environment import make_envs
+from xuance.torch.agents import NoisyDQN_Agent
+
+configs_dict = get_configs(file_dir="noisydqn_myenv.yaml")
+configs = argparse.Namespace(**configs_dict)
+REGISTRY_ENV[configs.env_name] = MyNewEnv
+
+envs = make_envs(configs)  # Make parallel environments.
+Agent = NoisyDQN_Agent(config=configs, envs=envs)  # Create a Noisy DQN agent from XuanCe.
+Agent.train(configs.running_steps // configs.parallels)  # Train the model for numerous steps.
+Agent.save_model("final_train_model.pth")  # Save the model to model_dir.
+Agent.finish()  # Finish the training.
+```
+
+## Citations
+
+```{code-block} bash
+@inproceedings{
+    plappert2018parameter,
+    title={Parameter Space Noise for Exploration},
+    author={Matthias Plappert and Rein Houthooft and Prafulla Dhariwal and Szymon Sidor and Richard Y. Chen and Xi Chen and Tamim Asfour and Pieter Abbeel and Marcin Andrychowicz},
+    booktitle={International Conference on Learning Representations},
+    year={2018},
+    url={https://openreview.net/forum?id=ByBAl2eAZ},
+}
+```
+
+## APIs
+
+### PyTorch
+
+```{eval-rst}
+.. automodule:: xuance.torch.agents.qlearning_family.noisydqn_agent
+    :members:
+    :undoc-members:
+    :show-inheritance:
+```
+
+### TensorFlow2
+
+```{eval-rst}
+.. automodule:: xuance.tensorflow.agents.qlearning_family.noisydqn_agent
+    :members:
+    :undoc-members:
+    :show-inheritance:
+```
+
+### MindSpore
+
+```{eval-rst}
+.. automodule:: xuance.mindspore.agents.qlearning_family.noisydqn_agent
+    :members:
+    :undoc-members:
+    :show-inheritance:
+```
diff --git a/docs/source/documents/api/agents/drl/noisydqn_agent.rst b/docs/source/documents/api/agents/drl/noisydqn_agent.rst
deleted file mode 100644
index 3625f198b..000000000
--- a/docs/source/documents/api/agents/drl/noisydqn_agent.rst
+++ /dev/null
@@ -1,26 +0,0 @@
-DQN with Noisy Layers (NoisyDQN)
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-PyTorch
-''''''''''''
-
-.. automodule:: xuance.torch.agents.qlearning_family.noisydqn_agent
-    :members:
-    :undoc-members:
-    :show-inheritance:
-
-TensorFlow2
-''''''''''''
-
-.. automodule:: xuance.tensorflow.agents.qlearning_family.noisydqn_agent
-    :members:
-    :undoc-members:
-    :show-inheritance:
-
-MindSpore
-''''''''''''
-
-.. automodule:: xuance.mindspore.agents.qlearning_family.noisydqn_agent
-    :members:
-    :undoc-members:
-    :show-inheritance: