book errata p 682 #129

Open
elfelround opened this issue May 19, 2020 · 3 comments

@elfelround
Contributor

[Attached image: book errata p 682]

@vmirly
Collaborator

vmirly commented May 22, 2020

Thanks for pointing out this erratum. You are right; I think I missed counting the final step. At t=7, an action is taken in order to go to the terminal state T.

So for episode 1, t goes from 0 to 7 (the terminal state T is reached at step 8), and recall that the sequence is denoted by

  • t=0: <S0, A0, R1>
  • t=1: <S1, A1, R2>
  • t=2: <S2, A2, R3>
  • ..
  • t=7: <S7, A7, R8>
  • Terminal state T

Also, we know the following immediate rewards: R1=0, R2=0, ..., R6=0, R7=0, R8=1

So now let's calculate the returns for the first episode:

  • At t=0 : S_0=B => G_0 = R1 + gamma * R2 + ... + gamma^6 * R7 + gamma^7 * R8 = gamma^7 * R8
  • At t=1 : S_1=B => G_1 = R2 + gamma * R3 + ... + gamma^5 * R7 + gamma^6 * R8 = gamma^6 * R8
  • At t=2 : S_2 =C => G_2 = R3 + gamma * R4 + ... + gamma^4 * R7 + gamma^5 * R8 = gamma^5 * R8
  • At t=3 : S_3 =C => G_3 = R4 + gamma * R5 + ... + gamma^3 * R7 + gamma^4 * R8 = gamma^4 * R8
  • At t=4 : S_4 =C => G_4 = R5 + gamma * R6 + gamma^2 * R7 + gamma^3 * R8 = gamma^3 * R8
  • At t=5 : S_5 =C => G_5 = R6 + gamma * R7 + gamma^2 * R8 = gamma^2 * R8
  • At t=6 : S_6 = B => G_6 = R7 + gamma * R8 = gamma
  • At t=7 : S_7 = A => G_7 = R8 = 1
  • Terminal state T
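To double-check the episode 1 returns numerically, here is a minimal Python sketch. It relies on the standard recursion G_t = R_{t+1} + gamma * G_{t+1}, working backwards from the last reward. The function name `discounted_returns` and the value gamma = 0.9 are assumptions chosen for illustration; neither comes from the book.

```python
def discounted_returns(rewards, gamma):
    """Compute [G_0, ..., G_{T-1}] from the rewards [R_1, ..., R_T].

    rewards[t] holds R_{t+1}, the reward received after acting at step t.
    """
    returns = [0.0] * len(rewards)
    g = 0.0  # the return after the terminal state is zero
    # Walk backwards so each step reuses the return of the next step:
    # G_t = R_{t+1} + gamma * G_{t+1}
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


gamma = 0.9  # assumed discount factor, for illustration only
episode1_rewards = [0, 0, 0, 0, 0, 0, 0, 1]  # R1..R7 = 0, R8 = 1
returns = discounted_returns(episode1_rewards, gamma)

for t, g in enumerate(returns):
    # Each G_t should match gamma^(7 - t) * R8, as in the list above
    print(f"G_{t} = {g:.6f}   gamma^{7 - t} = {gamma ** (7 - t):.6f}")
```

The two printed columns should agree for every t, which matches the pattern G_t = gamma^(7 - t) * R8 in the list above.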

Similarly, for episode 2, t goes from 0 to 9 (the terminal state T is reached at step 10), and R1=R2=...=R9=0 while R10=-1

  • At t=0 : S_0=A => G_0 = R1 + gamma * R2 + ... + gamma^9 * R10 = gamma^9 * R10
  • At t=1 : S_1=B => G_1 = R2 + gamma * R3 + ... + gamma^8 * R10 = gamma^8 * R10
  • At t=2 : S_2 =B => G_2 = R3 + gamma * R4 + ... + gamma^7 * R10 = gamma^7 * R10
  • At t=3 : S_3 =B => G_3 = R4 + gamma * R5 + ... + gamma^6 * R10 = gamma^6 * R10
  • At t=4 : S_4 =B => G_4 = R5 + gamma * R6 + ... + gamma^5 * R10 = gamma^5 * R10
  • At t=5 : S_5 =B => G_5 = R6 + gamma * R7 + ... + gamma^4 * R10 = gamma^4 * R10
  • At t=6 : S_6 =B => G_6 = R7 + gamma * R8 + ... + gamma^3*R10 = gamma^3 * R10
  • At t=7 : S_7 =B => G_7 = R8 + gamma * R9 + gamma^2 * R10 = gamma^2 * R10
  • At t=8 : S_8 =B => G_8 = R9 + gamma * R10 = gamma * R10
  • At t=9 : S_9 = B => G_9 = R10 = -1
  • Terminal state T
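Both lists follow the same pattern, which can be written compactly. The block below is only a summary of the calculations above, assuming T denotes the step at which the terminal state is reached (T = 8 in episode 1, T = 10 in episode 2) and that every reward before R_T is zero:

```latex
G_t \;=\; \sum_{k=0}^{T-t-1} \gamma^{k} R_{t+k+1}
    \;=\; \gamma^{\,T-t-1} R_T ,
\qquad t = 0, 1, \dots, T-1 .
```

For example, episode 1 at t=0 gives gamma^7 * R8, and episode 2 at t=9 gives gamma^0 * R10 = -1, in agreement with the lists above.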

@elfelround
Contributor Author

elfelround commented May 24, 2020

@vmirly I posted this erratum via Packt, but it was annoying as heck to explain without an image; they just said, "can you further explain?" and I thought, fuck it. Having a corrected answer will facilitate my understanding of this part. I'll read it when I have time with my current book and let you know how this follows :) Also, the RL chapter is still great, but feels rushed in comparison with the whole book, maybe a bit more love to it in the 4th ed? xx

@rasbt
Owner

rasbt commented Jul 28, 2020

> also the RL chapter is still great, but feels rushed in comparison with whole book, maybe a bit more love to it on 4th ed? xx

Oh yeah, for sure. The rewrites for TF 2.0 took much longer than expected, and we were both very busy in the fall (due to my teaching responsibilities and Vahid starting a new position). It definitely could and should be smoothed out in a potential next edition. Thanks for your feedback!
