You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Hello there, Marco!
I liked your idea of using Prioritized Experience Replay to improve the learning process, the concept is quite new to me so I found your work very interesting. There are some things I'd like to point out:
Inside the Memory class you define the capacity attribute, but, for what I can see, it's never used;
The way you defined the number of states overestimates the number of possible board states by a big margin because it considers a significant number of impossible states (for example those where there are more Os than Xs), from what I see this isn't a big issue because your algorithm never visits these states but it's quite expensive in terms of memory usage;
Some comments refer to unused pieces of code and it hurts the overall readability, so I suggest getting rid of them.
I hope this review was useful to you and good luck on the final project. 😊
The text was updated successfully, but these errors were encountered:
Your consideration was about the number of possible states of the tic-tac-toe board, which is a square grid of 3x3 cells. Each cell can be empty, occupied by a cross or by a circle, so there are 3 possibilities for each cell. The total number of possible states is therefore given by 3 raised to the power of 9, which is equal to 19,683.
However, many of these states are equivalent to each other, if you consider the symmetries of the board, such as rotations and reflections. In other words, two states are equivalent if you can obtain one from the other by rotating or flipping the board.
If you want to count only the distinct states, without considering the symmetries, the number
is reduced to 7653.
If instead you want to count only the legal states, that is, those that can be reached by following the rules of the game, the number is reduced further to 255,1684. This is because not all states are possible in a real game, for example those in which there are more crosses than circles, or those in which there are more than one row of three equal symbols.
I didn't think of these forms of optimization at first, so I simply used a matrix considering all the 19683 possible states but yes, you're basically right, thank you for having reported it.
Hello there, Marco!
I liked your idea of using Prioritized Experience Replay to improve the learning process, the concept is quite new to me so I found your work very interesting. There are some things I'd like to point out:
Memory
class you define thecapacity
attribute, but, for what I can see, it's never used;I hope this review was useful to you and good luck on the final project. 😊
The text was updated successfully, but these errors were encountered: