Burro is a command-line interface (CLI) tool for evaluating Large Language Model (LLM) outputs. It provides a straightforward way to run different types of evaluations with secure API key management.
- Three specialized evaluation types:
  - Answer correctness evaluation with context
  - Close-ended QA matching
  - Simple output-expected comparison
- Secure OpenAI API key management
- JSON-based evaluation configurations
- OpenAI API key
sudo curl -L "https://github.com/thisguymartin/burro/releases/download/latest/build-mac-silicon" -o /usr/local/bin/burro && sudo chmod +x /usr/local/bin/burro
sudo curl -L "https://github.com/thisguymartin/burro/releases/download/latest/build-mac-intel" -o /usr/local/bin/burro && sudo chmod +x /usr/local/bin/burro
sudo curl -L "https://github.com/thisguymartin/burro/releases/download/latest/build-linux-arm" -o /usr/local/bin/burro && sudo chmod +x /usr/local/bin/burro
sudo curl -L "https://github.com/thisguymartin/burro/releases/download/latest/build-linux-intel" -o /usr/local/bin/burro && sudo chmod +x /usr/local/bin/burro
- Download build-windows.exe from the releases page
- Rename it to burro.exe
- Move it to your desired location (e.g., C:\Program Files\burro\burro.exe)
burro set-openai-key
burro run-eval <evaluation-file>
- Close QA (closeqa.json)
  - Exact matching for close-ended questions
  - Strict format validation
  - Support for multiple correct answers
- Simple Evals (evals.json)
  - Basic output vs expected comparisons
  - Quick and efficient validation
  - Flexible matching options
Advanced evaluation methods using LLMs as judges:
- 🔜 Battle: Compare outputs from different models head-to-head
- 🔜 Humor: Evaluate the humor and wit in model responses
- 🔜 Moderation: Check content for safety and appropriateness
- 🔜 Security: Assess responses for potential security vulnerabilities
- 🔜 Summarization: Evaluate the quality and accuracy of text summaries
- 🔜 SQL: Verify the correctness of generated SQL queries
- 🔜 Translation: Assess translation quality across languages
- 🔜 Fine-tuned binary classifiers: Specialized evaluations using custom-trained models
Mathematical and algorithmic comparison methods:
- 🔜 Levenshtein distance: Measure string similarity using edit distance
- 🔜 Exact match: Check for perfect matches between outputs
- 🔜 Numeric difference: Compare numerical values and tolerances
- 🔜 JSON diff: Analyze structural differences in JSON outputs
- 🔜 Jaccard distance: Calculate similarity between sets of tokens
Evaluates exact matching responses for close-ended questions.
Example format:
{
  "input": "List the first three prime numbers in ascending order, separated by commas.",
  "output": "2,3,5",
  "criteria": "Numbers must be in correct order, separated by commas with no spaces"
}
Compares model outputs against expected answers.
Example format:
{
  "input": "What is the capital of France?",
  "output": "The capital city of France is Paris",
  "expected": "Paris"
}
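As a rough sketch of how the example above might be put to use, the script below writes it to a file named evals.json in a scratch directory and passes that file to burro run-eval when burro is installed. It assumes the file holds a single JSON object exactly as shown in the example format; the real schema may differ (for instance, it may expect a list of cases). The variable names work_dir and eval_file are illustrative only.

```shell
#!/bin/sh
# Write the example above to evals.json in a scratch directory.
# Assumption: the file holds a single JSON object; the real schema may differ.
work_dir="$(mktemp -d)"
eval_file="$work_dir/evals.json"

cat > "$eval_file" <<'EOF'
{
  "input": "What is the capital of France?",
  "output": "The capital city of France is Paris",
  "expected": "Paris"
}
EOF

# Run the evaluation only if burro is installed and on the PATH.
if command -v burro >/dev/null 2>&1; then
  burro run-eval "$eval_file"
else
  echo "burro not found; wrote $eval_file"
fi
```

The quoted heredoc delimiter ('EOF') keeps the shell from expanding anything inside the JSON body.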
- AES encryption for API key storage
- Secure key generation
- Encrypted SQLite storage
To determine which version you should download, you can check your system's architecture:
uname -m
This will return:
- arm64: Use Apple Silicon version (M1/M2/M3 Macs)
- x86_64: Use Intel version
uname -m
This will return:
- aarch64 or arm64: Use Linux ARM version
- x86_64: Use Linux Intel version
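The architecture checks above can be folded into one small helper. The sketch below maps the output of uname -s and uname -m to the build names that appear in the install commands (build-mac-silicon, build-mac-intel, build-linux-arm, build-linux-intel). The function name burro_build_name is hypothetical and not part of Burro; it is only meant to illustrate the lookup.

```shell
#!/bin/sh
# Map an OS name and CPU architecture (as printed by `uname -s` and
# `uname -m`) to the build names used in the install commands.
# burro_build_name is an illustrative helper, not something Burro ships.
burro_build_name() {
  os="$1"
  arch="$2"
  case "$os/$arch" in
    Darwin/arm64)              echo "build-mac-silicon" ;;
    Darwin/x86_64)             echo "build-mac-intel" ;;
    Linux/aarch64|Linux/arm64) echo "build-linux-arm" ;;
    Linux/x86_64)              echo "build-linux-intel" ;;
    *) echo "unsupported platform: $os/$arch" >&2; return 1 ;;
  esac
}

# Usage: print the build name for the current machine, if it is supported.
build="$(burro_build_name "$(uname -s)" "$(uname -m)")" && echo "Download: $build"
```

Any platform outside these four combinations makes the function return a non-zero status, so a calling script can stop before attempting a download.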
If you encounter permission issues during installation:
# Check current permissions
ls -l /usr/local/bin/burro
# Fix permissions if needed
sudo chmod +x /usr/local/bin/burro
If the burro command is not found after installation:
- Verify the installation location is in your PATH
- Try restarting your terminal
- Verify the executable exists and has proper permissions
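The first and last checks in the list above can be scripted. This is a minimal sketch that assumes the /usr/local/bin install location used in the install commands; path_contains is a hypothetical helper name introduced here for illustration.

```shell
#!/bin/sh
# Return success if directory $1 appears as an entry in the
# colon-separated PATH-style string $2. Illustrative helper only.
path_contains() {
  case ":$2:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

bin_dir="/usr/local/bin"

# Check 1: is the install directory on the PATH?
if path_contains "$bin_dir" "$PATH"; then
  echo "$bin_dir is in PATH"
else
  echo "$bin_dir is NOT in PATH"
fi

# Check 2: does the executable exist with the execute bit set?
if [ -x "$bin_dir/burro" ]; then
  echo "$bin_dir/burro exists and is executable"
else
  echo "$bin_dir/burro is missing or not executable"
fi
```

Wrapping both strings in colons before matching ensures that only whole PATH entries match, so /usr/local does not count as a match for /usr/local/bin.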
sudo rm /usr/local/bin/burro
# Verify removal
which burro # Should return nothing if successfully removed
- Delete burro.exe from your installation location
- If added to PATH:
  - Open System Properties (Win + Pause|Break)
  - Click "Advanced system settings"
  - Click "Environment Variables"
  - Under "System variables" or "User variables", find "Path"
  - Click "Edit"
  - Remove the directory containing burro.exe
  - Click "OK" to save changes
Verify removal:
where.exe burro # Should report that no matching files were found if successfully removed