This repository implements an Actor-Critic Model Predictive Control (AC-MPC) framework that combines the strengths of Model Predictive Control (MPC) with Reinforcement Learning (RL). The implementation includes various classic control environments and a novel dynamical system environment.
The project implements a hybrid control approach (sketched in the example below) that integrates:
- Model Predictive Control (MPC) for optimal trajectory planning
- Reinforcement Learning (specifically PPO - Proximal Policy Optimization) for policy learning
- Actor-Critic architecture for improved policy optimization
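To make the combination concrete, here is a minimal, hypothetical sketch of how an actor, a critic, and an MPC refinement step could be wired together in PyTorch. The class and method names (`ActorCriticMPC`, `mpc_refine`) are illustrative only and do not correspond to the actual implementation in `policy.py` or `mpc.py`.

```python
# Conceptual sketch only (NOT the repository's actual classes): an actor
# proposes a plan over the prediction horizon, an MPC step turns the plan
# into an action, and a critic estimates the state value for PPO-style
# updates. All names here are hypothetical.
import torch
import torch.nn as nn


class ActorCriticMPC(nn.Module):
    def __init__(self, obs_dim: int, action_dim: int, horizon: int = 10):
        super().__init__()
        self.horizon = horizon
        self.action_dim = action_dim
        # Actor: maps an observation to a planned action sequence.
        self.actor = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.Tanh(),
            nn.Linear(64, horizon * action_dim),
        )
        # Critic: standard state-value estimate used by PPO.
        self.critic = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.Tanh(),
            nn.Linear(64, 1),
        )

    def mpc_refine(self, obs: torch.Tensor, plan: torch.Tensor) -> torch.Tensor:
        # Placeholder for the MPC optimization: in the real framework this would
        # run num_optimization_step iterations of a trajectory optimizer over the
        # prediction horizon; here we simply return the first planned action so
        # the sketch stays self-contained.
        return plan[:, : self.action_dim]

    def forward(self, obs: torch.Tensor):
        plan = self.actor(obs)               # proposed plan over the horizon
        action = self.mpc_refine(obs, plan)  # MPC turns the plan into an action
        value = self.critic(obs)             # value estimate for the critic loss
        return action, value


if __name__ == "__main__":
    model = ActorCriticMPC(obs_dim=3, action_dim=1)  # Pendulum-sized example
    action, value = model(torch.randn(2, 3))
    print(action.shape, value.shape)  # torch.Size([2, 1]) torch.Size([2, 1])
```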
The following environments are supported:
- Pendulum
- Cart Pole (both discrete and continuous reward versions)
- Mountain Car
- Acrobot
- Custom Dynamical System (with configurable parameters such as wind gusts and friction; see the example below)
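The classic control tasks correspond to standard Gymnasium environments, which can be created as shown below. The constructor for the custom dynamical system is purely illustrative, since its actual interface is defined in `env.py`.

```python
# The classic control tasks map to standard Gymnasium environment IDs.
import gymnasium as gym

pendulum = gym.make("Pendulum-v1")
cart_pole = gym.make("CartPole-v1")
mountain_car = gym.make("MountainCarContinuous-v0")  # or MountainCar-v0 (discrete)
acrobot = gym.make("Acrobot-v1")

# Hypothetical constructor for the custom dynamical system; the real class
# name and parameters live in env.py and may differ.
# dynamical_system = DynamicalSystemEnv(wind_gust_scale=0.1, friction=0.05)
```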
- `mpc.py`: Implementation of Model Predictive Control algorithms
- `policy.py`: Policy networks and Actor-Critic implementations
- `system.py`: System dynamics and models
- `env.py`: Environment implementations and wrappers
- `costs.py`: Cost functions for different environments
- `linearizer.py`: System linearization utilities
- `utils.py`: Utility functions and helpers
- Training scripts for each environment (e.g., `train_acmpc_multienv_pendulum_args.py`)
- PPO baseline training scripts (e.g., `train_ppo_pendulum.py`)
- Support for multi-environment training
- Test scripts for each environment (e.g., `test_mpc_pendulum.py`)
- Systematic evaluation scripts (e.g., `test_acmpc_systematic_dynamical_system_args.py`)
- Hybrid control combining MPC and RL
- Support for multiple classic control environments
- Configurable system parameters (friction, wind gusts, etc.)
- Gaussian noise wrappers for robustness (a minimal wrapper sketch appears after this list)
- Integration with Weights & Biases for experiment tracking
- TensorBoard support for visualization
- Systematic evaluation tools
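As an illustration of the noise-injection idea, below is a minimal Gaussian observation-noise wrapper in the Gymnasium style. It assumes noise is added to observations; the repository's own wrapper (and the exact role of `gaussian_noise_scale`) may be implemented differently.

```python
# Minimal sketch of a Gaussian observation-noise wrapper; the repository's
# actual wrapper may differ in detail.
import gymnasium as gym
import numpy as np


class GaussianObservationNoise(gym.ObservationWrapper):
    def __init__(self, env: gym.Env, noise_scale: float = 0.1):
        super().__init__(env)
        self.noise_scale = noise_scale

    def observation(self, observation):
        # Perturb each observation with zero-mean Gaussian noise.
        noise = np.random.normal(0.0, self.noise_scale, size=observation.shape)
        return (observation + noise).astype(observation.dtype)


if __name__ == "__main__":
    env = GaussianObservationNoise(gym.make("Pendulum-v1"), noise_scale=0.05)
    obs, info = env.reset(seed=0)
    print(obs)
```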
- Create a conda environment using the provided environment files:

  ```bash
  # For macOS/Linux
  conda env create -f environment.yml

  # For Linux-specific dependencies
  conda env create -f environment_linux.yml
  ```

- Activate the environment:

  ```bash
  conda activate acmpc
  ```
Train an AC-MPC agent on different environments:

```bash
# Train on Pendulum
python train_acmpc_multienv_pendulum_args.py

# Train on Cart Pole
python train_acmpc_multienv_cart_pole_args.py

# Train on Mountain Car
python train_acmpc_multienv_mountain_car_args.py
```

Test trained models:

```bash
# Test on Pendulum
python test_acmpc_pendulum_args.py --model_name path/to/model

# Test on Dynamical System
python test_acmpc_dynamical_system_args.py --model_name path/to/model
```
Key configurable parameters (illustrated in the sketch below) include:

- `n_envs`: Number of parallel environments
- `prediction_horizon`: MPC prediction horizon
- `num_optimization_step`: Number of optimization steps in MPC
- `gaussian_noise_scale`: Scale of Gaussian noise for robustness
- Environment-specific parameters (e.g., wind gusts, friction coefficients)
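The `*_args.py` scripts appear to expose these settings as command-line options. The following argparse sketch shows how such a configuration could be parsed; the flag names and defaults are assumptions based on the parameter names above and are not verified against the actual scripts.

```python
# Hypothetical argparse setup mirroring the parameters listed above; the
# actual flag names and defaults in the *_args.py scripts may differ.
import argparse

parser = argparse.ArgumentParser(description="AC-MPC training configuration")
parser.add_argument("--n_envs", type=int, default=4,
                    help="Number of parallel environments")
parser.add_argument("--prediction_horizon", type=int, default=10,
                    help="MPC prediction horizon")
parser.add_argument("--num_optimization_step", type=int, default=5,
                    help="Number of optimization steps in MPC")
parser.add_argument("--gaussian_noise_scale", type=float, default=0.0,
                    help="Scale of Gaussian noise added for robustness")
args = parser.parse_args()
print(args)
```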
The project uses Weights & Biases for experiment tracking (see the example setup after this list). Key metrics logged include:
- Training rewards
- Episode lengths
- Policy gradients
- Value function losses
- Model checkpoints
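As an example of how Weights & Biases and TensorBoard tracking are commonly wired into a Stable Baselines3 training loop, a minimal setup might look like the sketch below; the project name, environment, and callback settings are placeholders, and the repository's own scripts may differ.

```python
# Minimal example of W&B + TensorBoard tracking with Stable Baselines3;
# project name, config values, and environment are placeholders.
import gymnasium as gym
import wandb
from stable_baselines3 import PPO
from wandb.integration.sb3 import WandbCallback

run = wandb.init(project="acmpc", sync_tensorboard=True)  # mirror TB logs to W&B

env = gym.make("Pendulum-v1")
model = PPO("MlpPolicy", env, verbose=1, tensorboard_log=f"runs/{run.id}")
model.learn(
    total_timesteps=10_000,
    callback=WandbCallback(model_save_path=f"models/{run.id}", verbose=2),
)
run.finish()
```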
Key dependencies include:
- PyTorch
- Stable Baselines3
- Gymnasium
- TensorBoard
- Weights & Biases
For a complete list of dependencies, refer to `environment.yml` or `environment_linux.yml`.