Launching Experiments

Experiment details and hyperparameters are organized into uniquely named scenarios. When launching a learning script, you will generally pass a scenario name as a command-line argument. Experiment scenarios are defined in rlapps/apps/scenarios/catalog.

NFSP

Launch a single script that trains a best response and average policy for each player.

# from the repository root
python rlapps/apps/nfsp/general_nfsp.py --scenario <my_scenario_name>

Available NFSP scenario names include:

kuhn_nfsp_dqn
leduc_nfsp_dqn
20_clone_leduc_nfsp_dqn
40_clone_leduc_nfsp_dqn
80_clone_leduc_nfsp_dqn
loss_game_nfsp_10_moves_alpha_2.7

PSRO

PSRO consists of three scripts, each launched in its own terminal:

  • The manager script (to track the population / payoff table and launch empirical payoff evaluations)
  • Two RL best response learner scripts, one for each of the 2 players

The manager acts as a server that the best response learners connect to via gRPC.

(tmux with a nice configuration is useful for managing and organizing many terminal sessions)

# from the repository root
python rlapps/apps/psro/general_psro_manager.py --scenario <my_scenario_name>
# in a 2nd terminal
python rlapps/apps/psro/general_psro_br.py --player 0 --scenario <same_scenario_as_manager>
# in a 3rd terminal
python rlapps/apps/psro/general_psro_br.py --player 1 --scenario <same_scenario_as_manager>

If all of these scripts are launched on the same computer, the best response scripts will automatically connect to a manager running the same scenario/seed, on a randomized port defined by the manager in /tmp/rlapps_ports.json. Otherwise, pass the --help argument to these scripts to see options for specifying hosts and ports.

Multiple experiments with the same scenario can be launched on a single host by setting the GRL_SEED environment variable to a different integer value for each set of corresponding processes. If unset, GRL_SEED defaults to 0.
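As a rough illustration of the seed mechanism described above, the sketch below builds the three PSRO command lines and an environment with GRL_SEED set, without launching anything. The script paths, flags, and the GRL_SEED variable come from this page; the helper names `seeded_env` and `psro_commands` are hypothetical and not part of the repository.

```python
import os
import sys
from typing import Dict, List, Mapping, Optional


def seeded_env(seed: int, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with GRL_SEED set to `seed`."""
    env = dict(os.environ if base is None else base)
    env["GRL_SEED"] = str(seed)  # environment values must be strings
    return env


def psro_commands(scenario: str, python: str = sys.executable) -> List[List[str]]:
    """Manager plus one best response learner per player, all on one scenario."""
    manager = [python, "rlapps/apps/psro/general_psro_manager.py", "--scenario", scenario]
    learners = [
        [python, "rlapps/apps/psro/general_psro_br.py",
         "--player", str(player), "--scenario", scenario]
        for player in (0, 1)
    ]
    return [manager] + learners


if __name__ == "__main__":
    # Two independent runs of the same scenario on one host: seeds 0 and 1.
    # Every process belonging to one run must see the same GRL_SEED value.
    for seed in (0, 1):
        env = seeded_env(seed)
        for cmd in psro_commands("kuhn_psro_dqn"):
            print(f"GRL_SEED={env['GRL_SEED']}", " ".join(cmd))
```

Each printed line corresponds to one terminal. Passing the returned environment as `env=` to `subprocess.Popen` would be one way to start the processes from Python instead.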

Available PSRO scenario names include:

kuhn_psro_dqn
leduc_psro_dqn
20_clone_leduc_psro_dqn
40_clone_leduc_psro_dqn
80_clone_leduc_psro_dqn
loss_game_psro_10_moves_alpha_2.7
loss_game_psro_10_moves_multi_dim_max_move_0.1_16_dim

NXDO

Like PSRO, NXDO consists of three scripts, each launched in its own terminal:

  • The manager script (to track the population and train the extensive form metanash)
  • Two RL best response learner scripts, one for each of the 2 players

The manager acts as a server that the best response learners connect to via gRPC.

# from the repository root
python rlapps/apps/nxdo/general_nxdo_manager.py --scenario <my_scenario_name>
# in a 2nd terminal
python rlapps/apps/nxdo/general_nxdo_br.py --player 0 --scenario <same_scenario_as_manager>
# in a 3rd terminal
python rlapps/apps/nxdo/general_nxdo_br.py --player 1 --scenario <same_scenario_as_manager>

If all of these scripts are launched on the same computer, the best response scripts will automatically connect to a manager running the same scenario/seed, on a randomized port defined by the manager in /tmp/rlapps_ports.json. Otherwise, pass the --help argument to these scripts to see options for specifying hosts and ports.

Multiple experiments with the same scenario can be launched on a single host by setting the GRL_SEED environment variable to a different integer value for each set of corresponding processes. If unset, GRL_SEED defaults to 0.

Available NXDO scenario names include:

kuhn_nxdo_dqn_nfsp
leduc_nxdo_dqn_nfsp
20_clone_leduc_nxdo_dqn_nfsp_dynamic_threshold_1_aggressive
40_clone_leduc_nxdo_dqn_nfsp_dynamic_threshold_1_aggressive
80_clone_leduc_nxdo_dqn_nfsp_dynamic_threshold_1_aggressive
va_20_clone_leduc_nxdo_dqn_nfsp_dynamic_threshold_1_aggressive
va_40_clone_leduc_nxdo_dqn_nfsp_dynamic_threshold_1_aggressive
va_80_clone_leduc_nxdo_dqn_nfsp_dynamic_threshold_1_aggressive
loss_game_nxdo_10_moves_alpha_2.7
loss_game_nxdo_10_moves_multi_dim_max_move_0.1_16_dim

Graphing Results

While each algorithm runs, it prints the path of the log file containing timestep, episode, and exploitability data. The path is highlighted in green and accompanied by the note (Graph this in a notebook).

Ray RLlib logs with the learning stats reported throughout training are also written in csv, json, and tensorboard formats to the same directories, under <repo_root>/rlapps/data.

Check graph_poker_results.ipynb for an example notebook graphing m-clone poker results for NXDO, PSRO, and NFSP.
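Outside the notebook, the csv logs mentioned above can be read with the standard library alone. The following is a minimal sketch, not the notebook's method: `load_curve` is a hypothetical helper, and the file name `progress.csv` and the column names `timesteps_total` and `episode_reward_mean` are assumptions about the RLlib output that should be checked against the header of the actual file.

```python
import csv
from typing import List, Tuple


def load_curve(csv_path: str, x_key: str, y_key: str) -> List[Tuple[float, float]]:
    """Read (x, y) pairs from a CSV log.

    Rows where either column is absent, empty, or not parseable as a
    float are skipped rather than raising an error.
    """
    points: List[Tuple[float, float]] = []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            try:
                points.append((float(row[x_key]), float(row[y_key])))
            except (KeyError, TypeError, ValueError):
                continue
    return points


# Hypothetical usage; the path and column names are assumptions:
# curve = load_curve("<repo_root>/rlapps/data/<run_dir>/progress.csv",
#                    "timesteps_total", "episode_reward_mean")
```

The returned list of pairs can be passed to any plotting library, for example by unzipping it into separate x and y sequences.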

Running PSRO/NXDO Experiments in a Single Shell

Example scripts for running PSRO and NXDO experiments in a single shell can be found in the examples directory.

PSRO example:

# in <repo_root>/examples
python launch_psro_as_single_script.py --scenario kuhn_psro_dqn

NXDO example:

# in <repo_root>/examples
python launch_nxdo_as_single_script.py --scenario 1_step_kuhn_nxdo_dqn_nfsp
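How the example scripts work internally is not described here. As a generic sketch of the underlying idea (one shell supervising a manager and its best response learners), the hypothetical helper below starts several commands with `subprocess.Popen`, waits for them, and stops the rest if any one fails. It is an assumption-laden illustration, not the code used by the scripts in the examples directory.

```python
import subprocess
import time
from typing import List, Sequence


def run_together(commands: Sequence[Sequence[str]], poll_seconds: float = 0.5) -> List[int]:
    """Start every command, wait for them, and return their exit codes.

    If any process exits with a non-zero code, the processes that are
    still running are terminated.
    """
    procs = [subprocess.Popen(list(cmd)) for cmd in commands]
    try:
        while True:
            codes = [p.poll() for p in procs]
            if all(c is not None for c in codes):
                break  # everything finished on its own
            if any(c not in (None, 0) for c in codes):
                break  # one process failed; the finally block stops the rest
            time.sleep(poll_seconds)
    finally:
        # Also reached on Ctrl-C, so no child process is left behind.
        for p in procs:
            if p.poll() is None:
                p.terminate()
        for p in procs:
            p.wait()
    return [p.returncode for p in procs]
```

Each command is an argument list such as `["python", "rlapps/apps/psro/general_psro_manager.py", "--scenario", "kuhn_psro_dqn"]`, run from the repository root. Passing the manager command together with the two best response commands mirrors the three-terminal setup in a single shell.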