| Dataset | 🏆 Leaderboard TBD | 📖 NeurIPS 2024 Paper |
IaC-Eval is a comprehensive framework for quantitatively evaluating the capabilities of large language models (LLMs) in cloud IaC code generation. Infrastructure-as-Code (IaC) is an important component of cloud computing that allows cloud infrastructure to be defined in high-level programs. Our framework currently targets Terraform only; we leave integration of other IaC tools as future work.
IaC-Eval also provides the first human-curated, challenging IaC dataset: 458 questions ranging from simple to difficult across various cloud services (targeting AWS for now). The dataset can be found in our HuggingFace repository.
We are actively developing and patching the project. However, as of now, IaC-Eval is not production-ready.
- Install Terraform (also install the AWS CLI and set up credentials).
- Install OPA (make sure to add `opa` to your PATH).
- *Obtain the LLM inference API keys you need, depending on which of our currently supported models you want to evaluate:
  - OpenAI API token: for GPT-3.5-Turbo and GPT-4
  - Google API token: for Gemini-1.0-Pro
  - Replicate API token: for CodeLlama and WizardCoder variants

* Our evaluation against MagiCoder was performed on a manually deployed AWS SageMaker inference endpoint. If that is of interest, see `evaluation/README.md` for more details on our setup script.
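As a quick sanity check of the command-line prerequisites above, the following is a minimal sketch, assuming the tools are exposed on PATH under the binary names `terraform`, `opa`, and `aws`. The helper name `check_tool` is hypothetical and not part of this repository; the sketch does not check API keys or AWS credentials.

```shell
#!/usr/bin/env bash
# Illustrative sketch only: `check_tool` is a hypothetical helper,
# not a script shipped with IaC-Eval.

# Print whether a command is reachable on PATH; return 1 if it is not.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Count how many of the assumed prerequisite binaries are missing.
missing=0
for tool in terraform opa aws; do
    check_tool "$tool" || missing=$((missing + 1))
done
echo "missing tools: $missing"
```

If any tool is reported as missing, install it or add its install directory to PATH before continuing.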
To use the evaluation pipeline, check out the correct branch of this repository and set up the environment. Follow these steps:
- Ensure you have the `main` branch of the project checked out.
- Install the Conda environment by running `conda env create -f environment.yml`.
- Activate the newly created Conda environment, named `iac-eval`, by running `conda activate iac-eval`.

  Note: before `conda activate`, you might need to run `conda init SHELL_NAME` for your preferred shell (e.g. `conda init bash`). If you run into problems initializing the shell session, try referring to this GitHub issue for a fix.
- (Optional) Preconfigure the retriever database (if you would like to use the RAG strategy): refer to the instructions in `retriever/README.md`.
- See the instructions in `evaluation/README.md` for details on how to use the main pipeline, `eval.py`, and other scripts.
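The required steps above can be sketched as a single script. This is a minimal sketch, not the repository's own tooling: the helper names `run` and `setup_env` and the `DRY_RUN` variable are hypothetical, and the script assumes it is started from the repository root, where `environment.yml` lives. By default it only prints the commands it would execute.

```shell
#!/usr/bin/env bash
# Illustrative sketch only: `run`, `setup_env`, and DRY_RUN are hypothetical
# names, not part of IaC-Eval. Dry-run is the default; set DRY_RUN=0 to execute.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

setup_env() {
    # `conda activate` needs conda's shell hook when called from a script.
    if [ "$DRY_RUN" != "1" ]; then
        eval "$(conda shell.bash hook)"
    fi
    run git checkout main
    run conda env create -f environment.yml
    run conda activate iac-eval
}

setup_env
```

With `DRY_RUN=0`, the activation only persists in your current shell if the file is sourced rather than executed, because a child process cannot change its parent's environment.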
Note: You can run `./setup.sh` to check whether you have Terraform and OPA installed. It will also create and activate the necessary Conda environment. The shell script assumes you are using `bash`; to use a different shell, change `#!/bin/SHELL` in the script to your preferred shell.
We welcome all forms of contribution! IaC-Eval aims to quantitatively and comprehensively evaluate the IaC code generation capabilities of large language models. If you find bugs or have ideas, please share them via GitHub Issues. This includes contributions to IaC-Eval's dataset, whose format can be found in its HuggingFace repository.