This is a work in progress. Ideas are welcome!
pip install -r requirements.txt
git clone https://github.com/mpSchrader/gym-sokoban.git
cd gym-sokoban
pip install -e .
See main.ipynb for details.
This implementation follows "Learning to search with MCTSnets" (Guez et al.) [1] as closely as possible. All the networks described in the paper are available in MCTSnet/models/. As a first step we focused on a simple random policy; the approximate computation of the policy gradient is not implemented. According to the authors, this simplified setup should still give good results.
MCTSnet/models/MCTSnet.py implements all the search logic.
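As a rough illustration of what such search logic looks like, here is a minimal sketch of the descend / expand / backup / readout loop described in [1], with a uniformly random simulation policy. It is not the code in MCTSnet/models/MCTSnet.py: the names `embed`, `backup`, `readout`, `Node`, `search` and `chain_model` are hypothetical, the "networks" are plain arithmetic stand-ins rather than learned models, and terminal states are ignored.

```python
import random

DIM = 4  # size of the toy statistics vector (arbitrary choice for this sketch)


def embed(state):
    """Stand-in for the embedding network: state -> initial node statistics."""
    return [float(state)] + [0.0] * (DIM - 1)


def backup(h_parent, h_child, reward):
    """Stand-in for the backup network: update parent statistics from a child."""
    return [0.5 * p + 0.5 * c + (reward if i == 1 else 0.0)
            for i, (p, c) in enumerate(zip(h_parent, h_child))]


def readout(h_root, n_actions):
    """Stand-in for the readout network: root statistics -> one score per action."""
    return [h_root[i % DIM] for i in range(n_actions)]


class Node:
    def __init__(self, state):
        self.state = state
        self.h = embed(state)
        self.children = {}  # action -> (reward, Node)


def search(root_state, model, n_actions, n_simulations, rng):
    """Run n_simulations of descend / expand / backup, then read out at the root."""
    root = Node(root_state)
    for _ in range(n_simulations):
        node, path = root, []
        # Descend with a uniformly random simulation policy until a new node is created.
        while True:
            action = rng.randrange(n_actions)
            path.append((node, action))
            if action not in node.children:
                next_state, reward = model(node.state, action)
                node.children[action] = (reward, Node(next_state))
                break
            node = node.children[action][1]
        # Backup: refresh the statistics from the new leaf up to the root.
        for parent, action in reversed(path):
            reward, child = parent.children[action]
            parent.h = backup(parent.h, child.h, reward)
    return root, readout(root.h, n_actions)


def chain_model(state, action):
    """Toy deterministic model: action 1 moves right, action 0 stays; reward at state 3."""
    next_state = state + action
    return next_state, (1.0 if next_state == 3 else 0.0)
```

Calling `search(0, chain_model, n_actions=2, n_simulations=10, rng=random.Random(0))` returns the root node and one score per action. In the real model each stand-in is a trained network and the whole loop is differentiable end to end.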
The code is written to be fairly general and should be easy to adapt to other environments (we tried to keep the interface as close to gym as possible).
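To give an idea of what a gym-like interface means in practice, here is a minimal sketch of an environment exposing `reset()` and `step(action)` in the classic gym style. The names `GymLikeEnv`, `LineWorld`, `n_actions` and `run_episode` are assumptions made for this example and are not taken from the repository; the exact interface expected by the trainer may differ.

```python
import random
from typing import Protocol, Tuple


class GymLikeEnv(Protocol):
    """Hypothetical minimal interface, modelled on the classic gym API."""

    n_actions: int

    def reset(self) -> Tuple[int, ...]: ...

    def step(self, action: int) -> Tuple[Tuple[int, ...], float, bool, dict]: ...


class LineWorld:
    """Toy environment: walk along a line of cells until the last cell is reached."""

    n_actions = 2  # 0 = left, 1 = right

    def __init__(self, length: int = 5, max_steps: int = 20):
        self.length = length
        self.max_steps = max_steps
        self.pos = 0
        self.t = 0

    def reset(self):
        self.pos, self.t = 0, 0
        return (self.pos,)

    def step(self, action):
        move = 1 if action == 1 else -1
        # Clamp the position so the agent cannot leave the line.
        self.pos = min(max(self.pos + move, 0), self.length - 1)
        self.t += 1
        reached = self.pos == self.length - 1
        done = reached or self.t >= self.max_steps
        reward = 1.0 if reached else 0.0
        return (self.pos,), reward, done, {}


def run_episode(env: GymLikeEnv, rng: random.Random) -> float:
    """Play one episode with a uniformly random policy and return the total reward."""
    env.reset()
    total, done = 0.0, False
    while not done:
        _, reward, done, _ = env.step(rng.randrange(env.n_actions))
        total += reward
    return total
```

Any environment with the same two methods could be dropped into `run_episode` unchanged, which is the kind of portability a gym-style interface is meant to provide.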
MCTSnet/trainer.py provides an easy way to train, test and play games. Two games were tested:
- Sokoban, as in the original article
- MouseGame, a simple game where a mouse needs to eat the cheese and avoid the poison
The algorithm has been shown to converge on MouseGame. However, even after 15 hours of training, MCTSnet's performance remains poor. Additional testing is required.
For a detailed description, see https://github.com/faameunier/MCTSnet/blob/master/RL_Manuscript.pdf
[1] Arthur Guez, Theophane Weber, Ioannis Antonoglou, Karen Simonyan, Oriol Vinyals, Daan Wierstra, Remi Munos, and David Silver. Learning to Search with MCTSnets. CoRR, abs/1802.04697, 2018.