A number of agents (PPO, MuZero) with a Perceiver-based NN architecture that can be trained to achieve goals in nethack/minihack environments.
reinforcement-learning nethack mcts flax model-based-rl jax model-based-reinforcement-learning muzero minihack rlax
-
Updated
Sep 19, 2022 - Python