Clarify results, move report
nielstron committed Nov 19, 2024
1 parent c10c009 commit ce27f0d
Showing 2 changed files with 12 additions and 12 deletions.
22 changes: 11 additions & 11 deletions README.md
@@ -75,17 +75,17 @@ Pass it the path to your evaluation, including run_id and model to get a simple
 For example, to reproduce the results for SWE-Agent from Table 2 and 3 of the paper, run the following command:
 
 ```bash
-python3 report.py run_instance_swt_logs/swea__gpt-4-1106-preview/gpt4__SWE-bench_Lite__default_test_demo3__t-0.00__p-0.95__c-3.00__install-1
-# |---------------------|------------------------------------------------|
-# | Method              | run_instance_swt_logs/swea__gpt-4-1106-preview |
-# | Applicability (W)   | 87.31884057971014                              |
-# | Success Rate (S)    | 15.942028985507246                             |
-# | F->X                | 48.18840579710145                              |
-# | F->P                | 16.666666666666668                             |
-# | P->P                | 9.782608695652174                              |
-# | Coverage            | 26.488815129800212                             |
-# | Resolved Coverage   | 64.69774543638181                              |
-# | Unresolved Coverage | 19.14736127176707                              |
+python -m src.report run_instance_swt_logs/swea__gpt-4-1106-preview/gpt4__SWE-bench_Lite__default_test_demo3__t-0.00__p-0.95__c-3.00__install-1
+# |------------------------------------|--------------------------|
+# | Method                             | swea__gpt-4-1106-preview |
+# | Applicability (W)                  | 87.31884057971014        |
+# | Success Rate (S)                   | 15.942028985507246       |
+# | F->X                               | 48.18840579710145        |
+# | F->P                               | 16.666666666666668       |
+# | P->P                               | 9.782608695652174        |
+# | Coverage Delta (Δᵃˡˡ)              | 26.488815129800212       |
+# | Coverage Delta Resolved (Δᔆ)       | 64.69774543638181        |
+# | Coverage Delta Unresolved (Δⁿᵒᵗ ᔆ) | 19.14736127176707        |
 ```
 
 In order to see a coverage delta reported, you need to have the gold evaluation included in the same evaluation path, i.e. download the golden results into `run_instance_swt_logs` from the downloads section below.
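The metric table in the README is plain pipe-delimited text, so comparing a reproduced run against Tables 2 and 3 of the paper can be scripted. The following is a minimal sketch, assuming the report prints the two-column `| label | value |` layout shown above (the README additionally prefixes each row with `# `); `parse_report_table` is a hypothetical helper, not part of the repository.

```python
from __future__ import annotations

import re

# Matches a two-cell table row such as "| Applicability (W) | 87.31 |".
# The optional leading "#" tolerates the comment prefix used in the README.
ROW = re.compile(r"^#?\s*\|\s*(?P<label>[^|]+?)\s*\|\s*(?P<value>[^|]+?)\s*\|\s*$")


def parse_report_table(text: str) -> dict[str, float | str]:
    """Hypothetical helper: map each metric label to its value."""
    rows: dict[str, float | str] = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match is None:
            continue  # not a table row (command line, fence, blank line)
        label, value = match.group("label"), match.group("value")
        if set(label) <= {"-"}:
            continue  # "|-----|-----|" separator row
        try:
            rows[label] = float(value)
        except ValueError:
            rows[label] = value  # e.g. the "Method" row holds the run name
    return rows
```

Numeric cells become floats and anything else, such as the `Method` row carrying the run name, stays a string. Because rows are keyed by their full label, a script written against the old README labels (`Coverage`, `Resolved Coverage`, `Unresolved Coverage`) would need to switch to the `Coverage Delta ...` names introduced by this commit.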
2 changes: 1 addition & 1 deletion report.py → src/report.py
@@ -51,7 +51,7 @@ def main(
     fields = (
         [r"{$\dc^{\text{all}}$ }", r"{$\dc^{\suc}$}", r"{$\dc^{\neg\suc}$}"]
         if format.startswith("latex") else
-        ["Coverage Delta", "Coverage Delta Resolved", "Coverage Delta Unresolved"]
+        ["Coverage Delta (Δᵃˡˡ)", "Coverage Delta Resolved (Δᔆ)", "Coverage Delta Unresolved (Δⁿᵒᵗ ᔆ)"]
     )
     total_coverage_possible = count_coverage_delta_gold(gold_reports)
     resolved_reports, unresolved_reports = filtered_by_resolved(reports)
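The conditional expression in this hunk selects one of two header sets. Pulled out into a standalone function it can be checked in isolation. The sketch below assumes a hypothetical helper `coverage_field_labels` (the repository inlines this logic in `main`), and the format names used with it, such as `"latex"` and `"github"`, are illustrative assumptions rather than a list of formats the script is known to accept.

```python
from __future__ import annotations


def coverage_field_labels(fmt: str) -> list[str]:
    """Hypothetical helper mirroring the `fields = (...)` expression in main()."""
    if fmt.startswith("latex"):
        # LaTeX headers; \dc and \suc are assumed to be macros defined by the paper.
        return [r"{$\dc^{\text{all}}$ }", r"{$\dc^{\suc}$}", r"{$\dc^{\neg\suc}$}"]
    # Every other format gets the Unicode-annotated labels introduced by this commit.
    return [
        "Coverage Delta (Δᵃˡˡ)",
        "Coverage Delta Resolved (Δᔆ)",
        "Coverage Delta Unresolved (Δⁿᵒᵗ ᔆ)",
    ]
```

Any format name beginning with `latex` keeps the LaTeX headers unchanged; only the non-LaTeX labels were renamed, which is what makes the plain-text output match the updated README table.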
