Play Super Mario using the Proximal Policy Optimization (PPO) method.
Tested on Windows 8.1, Windows 10, and Ubuntu 16.04.
Python 3.6, PyTorch >= 0.4.0.
Install the other required packages:
pip install -r requirements.txt
Saving video requires ffmpeg to be installed.
Usage
# Train an agent from scratch
python run.py train
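For readers new to PPO, the clipped surrogate objective that the method optimizes can be sketched as below. This is a framework-free illustration in plain Python, not the code in `run.py`; the function name `ppo_clip_loss` and the default `clip_eps=0.2` are assumptions chosen for the example.

```python
import math


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate loss (a value to minimize) over a batch.

    All three inputs are equal-length sequences of floats. The name and the
    default clip_eps are illustrative assumptions, not taken from this project.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the updated policy and the old policy.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Pessimistic bound: take the smaller of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    # Negate because the objective is maximized while a loss is minimized.
    return -total / len(advantages)


if __name__ == "__main__":
    # The new policy doubles the action probability with a positive advantage,
    # so the ratio is clipped at 1 + clip_eps.
    print(ppo_clip_loss([math.log(2.0)], [0.0], [1.0]))
```

The clipping removes the incentive to move the policy far from the one that collected the data, which is what allows several gradient steps on the same batch of rollouts.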
Download a pre-trained model from here.
# Play game with a trained model
python run.py play ./pre_trained_model/mario_10000-best.dat
Training takes about 5 hours on an NVIDIA V100 (1 GPU, 16 parallel game environments); rewards reach about 200.0 and the game length about 275 steps. It looks like below when the model converges.
- Proximal Policy Optimization
- http://blog.varunajayasiri.com/ml/ppo.html This great post helped me a lot. It showed me how to wrap a game the way DeepMind did with Atari games.
- Game environment
- PPO tutorial code This is a very clean project and friendly to newcomers learning RL algorithms. I borrowed part of the code from it.
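The DeepMind-style game wrapping mentioned above usually includes two steps: repeating each action for several frames, and stacking the most recent observations. A minimal sketch of both follows. It assumes the classic Gym-style interface (`reset()` returns an observation, `step(action)` returns `(obs, reward, done, info)`); `CountingEnv`, `SkipFrames`, and `StackFrames` are hypothetical names for this example and are not the wrappers used in this project.

```python
from collections import deque


class CountingEnv:
    """Hypothetical toy environment: the observation is a step counter."""

    def __init__(self, episode_len=10):
        self.episode_len = episode_len
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= self.episode_len
        return self.t, 1.0, done, {}


class SkipFrames:
    """Repeat each action for up to `skip` frames and sum the rewards."""

    def __init__(self, env, skip=4):
        self.env = env
        self.skip = skip

    def reset(self):
        return self.env.reset()

    def step(self, action):
        total_reward = 0.0
        obs, done, info = None, False, {}
        for _ in range(self.skip):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            if done:
                # Stop repeating the action once the episode has ended.
                break
        return obs, total_reward, done, info


class StackFrames:
    """Return the last `k` observations as a tuple, oldest first."""

    def __init__(self, env, k=4):
        self.env = env
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self):
        obs = self.env.reset()
        # Fill the stack with copies of the first observation.
        for _ in range(self.k):
            self.frames.append(obs)
        return tuple(self.frames)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.frames.append(obs)
        return tuple(self.frames), reward, done, info


if __name__ == "__main__":
    env = StackFrames(SkipFrames(CountingEnv(episode_len=10), skip=4), k=2)
    print(env.reset())
    done = False
    while not done:
        obs, reward, done, _ = env.step(0)
        print(obs, reward, done)
```

Skipping frames makes each agent decision cover more game time, and stacking frames lets the policy infer motion from observations that are otherwise static images.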