diff --git a/main/.buildinfo b/main/.buildinfo
index 661a6d8b2..ffcf955f1 100644
--- a/main/.buildinfo
+++ b/main/.buildinfo
@@ -1,4 +1,4 @@
 # Sphinx build info version 1
 # This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
-config: 18b35625685a6421654b91ec64e1e718
+config: dc56256d05e0a8b2d12747a78c3752e9
 tags: d77d1c0d9ca2f4c8421862c7c5a0d620
diff --git a/main/_downloads/315c4c52fb68082a731b192d944e2ede/tutorials_python.zip b/main/_downloads/315c4c52fb68082a731b192d944e2ede/tutorials_python.zip
index 75363a775..865bf18a6 100644
Binary files a/main/_downloads/315c4c52fb68082a731b192d944e2ede/tutorials_python.zip and b/main/_downloads/315c4c52fb68082a731b192d944e2ede/tutorials_python.zip differ
diff --git a/main/_downloads/50e7c09c20b787d0a5bd70c4aeb0a515/vector_envs_tutorial.ipynb b/main/_downloads/50e7c09c20b787d0a5bd70c4aeb0a515/vector_envs_tutorial.ipynb
index 0b5ed6fcd..d7a0ca7ac 100644
--- a/main/_downloads/50e7c09c20b787d0a5bd70c4aeb0a515/vector_envs_tutorial.ipynb
+++ b/main/_downloads/50e7c09c20b787d0a5bd70c4aeb0a515/vector_envs_tutorial.ipynb
@@ -154,7 +154,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
- "## Training the A2C Agent\n\nFor our training loop, we are using the `RecordEpisodeStatistics` wrapper to record the episode lengths and returns and we are also saving\nthe losses and entropies to plot them after the agent finished training.\n\nYou may notice that the don't reset the vectorized envs at the start of each episode like we would usually do.\nThis is because each environment resets automatically once the episode finishes (each environment takes a different number of timesteps to finish\nan episode because of the random seeds). As a result, we are also not collecting data in `episodes`, but rather just play a certain number of steps\n(`n_steps_per_update`) in each environment (as an example, this could mean that we play 20 timesteps to finish an episode and then\nuse the rest of the timesteps to begin a new one).\n\n\n"
+ "## Training the A2C Agent\n\nFor our training loop, we are using the `RecordEpisodeStatistics` wrapper to record the episode lengths and returns and we are also saving\nthe losses and entropies to plot them after the agent finished training.\n\nYou may notice that we don't reset the vectorized envs at the start of each episode like we would usually do.\nThis is because each environment resets automatically once the episode finishes (each environment takes a different number of timesteps to finish\nan episode because of the random seeds). As a result, we are also not collecting data in `episodes`, but rather just play a certain number of steps\n(`n_steps_per_update`) in each environment (as an example, this could mean that we play 20 timesteps to finish an episode and then\nuse the rest of the timesteps to begin a new one).\n\n\n"
 ]
 },
 {
diff --git a/main/_downloads/a5659940aa3f8f568547d47752a43172/tutorials_jupyter.zip b/main/_downloads/a5659940aa3f8f568547d47752a43172/tutorials_jupyter.zip
index 80271584d..0635cd27f 100644
Binary files a/main/_downloads/a5659940aa3f8f568547d47752a43172/tutorials_jupyter.zip and b/main/_downloads/a5659940aa3f8f568547d47752a43172/tutorials_jupyter.zip differ
diff --git a/main/_downloads/e688a889564af5a98daa8accfbca806e/vector_envs_tutorial.py b/main/_downloads/e688a889564af5a98daa8accfbca806e/vector_envs_tutorial.py
index 4b978221f..238cb07a0 100644
--- a/main/_downloads/e688a889564af5a98daa8accfbca806e/vector_envs_tutorial.py
+++ b/main/_downloads/e688a889564af5a98daa8accfbca806e/vector_envs_tutorial.py
@@ -417,7 +417,7 @@ def update_parameters(
 # For our training loop, we are using the `RecordEpisodeStatistics` wrapper to record the episode lengths and returns and we are also saving
 # the losses and entropies to plot them after the agent finished training.
 #
-# You may notice that the don't reset the vectorized envs at the start of each episode like we would usually do.
+# You may notice that we don't reset the vectorized envs at the start of each episode like we would usually do.
 # This is because each environment resets automatically once the episode finishes (each environment takes a different number of timesteps to finish
 # an episode because of the random seeds). As a result, we are also not collecting data in `episodes`, but rather just play a certain number of steps
 # (`n_steps_per_update`) in each environment (as an example, this could mean that we play 20 timesteps to finish an episode and then
diff --git a/main/tutorials/gymnasium_basics/vector_envs_tutorial/index.html b/main/tutorials/gymnasium_basics/vector_envs_tutorial/index.html
index 49cb586f1..fc9589456 100644
--- a/main/tutorials/gymnasium_basics/vector_envs_tutorial/index.html
+++ b/main/tutorials/gymnasium_basics/vector_envs_tutorial/index.html
@@ -837,7 +837,7 @@
 Setup
 Training the A2C Agent
 For our training loop, we are using the RecordEpisodeStatistics wrapper to record the episode lengths and returns and we are also saving the losses and entropies to plot them after the agent finished training.
-You may notice that the don’t reset the vectorized envs at the start of each episode like we would usually do.
+You may notice that we don’t reset the vectorized envs at the start of each episode like we would usually do.
 This is because each environment resets automatically once the episode finishes (each environment takes a different number of timesteps to finish
 an episode because of the random seeds). As a result, we are also not collecting data in episodes, but rather just play a certain number of steps
 (n_steps_per_update) in each environment (as an example, this could mean that we play 20 timesteps to finish an episode and then
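The paragraph changed above describes the tutorial's n-step data-collection pattern with vectorized environments: reset once, then play a fixed number of steps per update while finished sub-environments restart automatically. The sketch below illustrates only that pattern, not the tutorial's actual A2C agent; the environment id, seed, step counts, and random action sampling are placeholder assumptions, and the RecordEpisodeStatistics call mirrors the tutorial although the vector-wrapper API can differ between Gymnasium versions.

# Minimal sketch of the n-step collection loop described above. It is not the
# tutorial's A2C implementation: the env id, seed, and step counts are
# illustrative, and actions are sampled randomly instead of from a policy.
import gymnasium as gym

n_envs = 3
n_updates = 10
n_steps_per_update = 128

envs = gym.vector.SyncVectorEnv(
    [lambda: gym.make("CartPole-v1") for _ in range(n_envs)]
)
# Records per-episode returns and lengths into `info`; usage mirrors the
# tutorial, though the vector-wrapper API differs across Gymnasium versions.
envs = gym.wrappers.RecordEpisodeStatistics(envs)

obs, info = envs.reset(seed=42)  # reset only once, before training starts
for update in range(n_updates):
    for _ in range(n_steps_per_update):
        # a real agent would sample actions from its policy network here
        actions = envs.action_space.sample()
        obs, rewards, terminated, truncated, info = envs.step(actions)
        # no manual reset: sub-environments that finish an episode restart
        # automatically, so collection simply continues for n_steps_per_update
    # ...compute losses from the collected n-step batch and update the agent...
envs.close()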