Commit

Deploying to gh-pages from @ 4a85589 🚀
pseudo-rnd-thoughts committed Aug 5, 2024
1 parent 167d036 commit 1ba2089
Showing 6 changed files with 4 additions and 4 deletions.
2 changes: 1 addition & 1 deletion main/.buildinfo
@@ -1,4 +1,4 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
- config: 18b35625685a6421654b91ec64e1e718
+ config: dc56256d05e0a8b2d12747a78c3752e9
tags: d77d1c0d9ca2f4c8421862c7c5a0d620
Binary file not shown.
@@ -154,7 +154,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Training the A2C Agent\n\nFor our training loop, we are using the `RecordEpisodeStatistics` wrapper to record the episode lengths and returns and we are also saving\nthe losses and entropies to plot them after the agent finished training.\n\nYou may notice that the don't reset the vectorized envs at the start of each episode like we would usually do.\nThis is because each environment resets automatically once the episode finishes (each environment takes a different number of timesteps to finish\nan episode because of the random seeds). As a result, we are also not collecting data in `episodes`, but rather just play a certain number of steps\n(`n_steps_per_update`) in each environment (as an example, this could mean that we play 20 timesteps to finish an episode and then\nuse the rest of the timesteps to begin a new one).\n\n\n"
"## Training the A2C Agent\n\nFor our training loop, we are using the `RecordEpisodeStatistics` wrapper to record the episode lengths and returns and we are also saving\nthe losses and entropies to plot them after the agent finished training.\n\nYou may notice that we don't reset the vectorized envs at the start of each episode like we would usually do.\nThis is because each environment resets automatically once the episode finishes (each environment takes a different number of timesteps to finish\nan episode because of the random seeds). As a result, we are also not collecting data in `episodes`, but rather just play a certain number of steps\n(`n_steps_per_update`) in each environment (as an example, this could mean that we play 20 timesteps to finish an episode and then\nuse the rest of the timesteps to begin a new one).\n\n\n"
]
},
{
Binary file not shown.
@@ -417,7 +417,7 @@ def update_parameters(
# For our training loop, we are using the `RecordEpisodeStatistics` wrapper to record the episode lengths and returns and we are also saving
# the losses and entropies to plot them after the agent finished training.
#
- # You may notice that the don't reset the vectorized envs at the start of each episode like we would usually do.
+ # You may notice that we don't reset the vectorized envs at the start of each episode like we would usually do.
# This is because each environment resets automatically once the episode finishes (each environment takes a different number of timesteps to finish
# an episode because of the random seeds). As a result, we are also not collecting data in `episodes`, but rather just play a certain number of steps
# (`n_steps_per_update`) in each environment (as an example, this could mean that we play 20 timesteps to finish an episode and then
@@ -837,7 +837,7 @@ <h2>Setup<a class="headerlink" href="#setup" title="Link to this heading">¶</a>
<h2>Training the A2C Agent<a class="headerlink" href="#training-the-a2c-agent" title="Link to this heading"></a></h2>
<p>For our training loop, we are using the <cite>RecordEpisodeStatistics</cite> wrapper to record the episode lengths and returns and we are also saving
the losses and entropies to plot them after the agent finished training.</p>
- <p>You may notice that the don’t reset the vectorized envs at the start of each episode like we would usually do.
+ <p>You may notice that we don’t reset the vectorized envs at the start of each episode like we would usually do.
This is because each environment resets automatically once the episode finishes (each environment takes a different number of timesteps to finish
an episode because of the random seeds). As a result, we are also not collecting data in <cite>episodes</cite>, but rather just play a certain number of steps
(<cite>n_steps_per_update</cite>) in each environment (as an example, this could mean that we play 20 timesteps to finish an episode and then
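
The paragraph corrected in this commit describes the tutorial's training loop: the vectorized environments are reset once, each sub-environment restarts automatically when its own episode ends, and data is collected for a fixed `n_steps_per_update` rather than per episode. As a rough illustration of that pattern only (not the tutorial's actual code), here is a minimal sketch assuming the Gymnasium 1.0+ vector API, a `CartPole-v1` stand-in environment, and a random placeholder policy:

```python
import gymnasium as gym

# Assumed hyperparameters, for illustration only.
n_envs = 3
n_steps_per_update = 128
n_updates = 10

# Vectorized envs: each sub-env restarts on its own once its episode finishes.
envs = gym.make_vec("CartPole-v1", num_envs=n_envs)

# Records per-episode lengths and returns across all sub-envs.
envs = gym.wrappers.vector.RecordEpisodeStatistics(envs)

# Reset once before training; no manual per-episode resets afterwards.
states, infos = envs.reset(seed=42)

for update in range(n_updates):
    for step in range(n_steps_per_update):
        # Placeholder policy; the tutorial samples actions from its A2C network here.
        actions = envs.action_space.sample()
        states, rewards, terminations, truncations, infos = envs.step(actions)
        # No reset call: any finished sub-env has already begun a new episode.

        # Episode statistics appear in `infos` on steps where a sub-env just finished.
        if "episode" in infos:
            print("finished-episode returns:", infos["episode"]["r"][infos["_episode"]])
    # An A2C update on the collected n_steps_per_update transitions would go here.
```

Because episodes end at different times in different sub-environments, the `RecordEpisodeStatistics` wrapper (rather than a manual reset loop) is what tracks returns and lengths here.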
