This repository contains code accompanying the paper A Systematic Evaluation of Decoding-Free Generative Candidate Selection Methods.

MCQ

For MCQ datasets, to execute the full decoding method, run mcq_decoding.py and specify the desired LM and dataset arguments. For example,

python mcq_decoding.py --model_name meta-llama/Meta-Llama-3-8B --dataset commonsense_qa

To execute the estimation methods, run mcq_estimation.py and specify the LM and dataset. For example,

python mcq_estimation.py --model_name meta-llama/Meta-Llama-3-8B --dataset commonsense_qa

The scripts download and preprocess the data, perform inference, and compute the corresponding metrics, which are stored in results/date/. In particular, mcq_estimation.py computes the logits once and reuses them for all estimation methods.
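To illustrate why computing the logits once is enough, here is a minimal sketch in which several candidate-scoring rules all read from the same first-step output distribution. It is not the repository's code: the function names, the toy vocabulary of five tokens, and the four scoring rules (first_token, last_token, average, sum) are illustrative assumptions, and the rules implemented in mcq_estimation.py may differ.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def score_candidates(first_step_logits, candidate_token_ids):
    """Score every candidate under several rules from ONE logit vector.

    first_step_logits: logits over the vocabulary at the first output step.
    candidate_token_ids: one list of token ids per candidate answer.
    The rule names below are hypothetical placeholders.
    """
    log_probs = log_softmax(first_step_logits)  # computed once, shared below
    scores = {"first_token": [], "last_token": [], "average": [], "sum": []}
    for token_ids in candidate_token_ids:
        per_token = [log_probs[t] for t in token_ids]
        scores["first_token"].append(per_token[0])
        scores["last_token"].append(per_token[-1])
        scores["average"].append(sum(per_token) / len(per_token))
        scores["sum"].append(sum(per_token))
    return scores


def select(scores):
    """Return the index of the best-scoring candidate for each rule."""
    return {
        rule: max(range(len(values)), key=values.__getitem__)
        for rule, values in scores.items()
    }


# Toy example: a 5-token vocabulary and three candidates.
toy_logits = [2.0, 0.5, 1.0, -1.0, 0.0]
toy_candidates = [[0, 3], [1, 2], [4]]
selected = select(score_candidates(toy_logits, toy_candidates))
print(selected)
```

Because every rule is a cheap function of the same log-probabilities, the expensive forward pass through the LM only has to happen once per question.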

The argument names are model_name and dataset, and their supported values are as follows.

model_name: {meta-llama/Meta-Llama-3-8B, 
             meta-llama/Meta-Llama-3-8B-Instruct,
             mistralai/Mistral-7B-v0.3, 
             mistralai/Mistral-7B-Instruct-v0.3,   
             google/flan-t5-xl}
dataset: {commonsense_qa, mmlu, gpqa, big_bench, arc} 
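To run every configuration in turn, the command lines can be generated from the two value lists above. The following is a small sketch, assuming that every model and dataset pair is accepted by the script (the README does not state this explicitly); build_commands is a hypothetical helper, not part of the repository.

```python
from itertools import product

# Values copied from the lists above.
MODELS = [
    "meta-llama/Meta-Llama-3-8B",
    "meta-llama/Meta-Llama-3-8B-Instruct",
    "mistralai/Mistral-7B-v0.3",
    "mistralai/Mistral-7B-Instruct-v0.3",
    "google/flan-t5-xl",
]
DATASETS = ["commonsense_qa", "mmlu", "gpqa", "big_bench", "arc"]


def build_commands(script="mcq_estimation.py"):
    """Return one argument list per (model, dataset) pair."""
    return [
        ["python", script, "--model_name", model, "--dataset", dataset]
        for model, dataset in product(MODELS, DATASETS)
    ]


if __name__ == "__main__":
    for cmd in build_commands():
        print(" ".join(cmd))
```

Each argument list can be passed to subprocess.run(cmd, check=True) to execute the runs sequentially. Passing script="mcq_decoding.py" produces the corresponding full decoding commands.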

CliBench

Download the test data for the four clinical decision tasks from CliBench.

To execute the estimation methods, run clibench_estimation.py and specify the LM and target task. For example,

python clibench_estimation.py --model_name meta-llama/Meta-Llama-3-8B --target-task target_diagnoses
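Note that this command mixes an underscore flag (--model_name) with a hyphenated one (--target-task). If the script parses its arguments with argparse, which this README does not confirm, the hyphenated flag is exposed as an attribute with an underscore. The sketch below shows that behaviour; parse_args is a hypothetical stand-in, not the script's actual parser.

```python
import argparse


def parse_args(argv=None):
    """Hypothetical parser mirroring the two flags used in the command above."""
    parser = argparse.ArgumentParser(description="Illustrative CLI sketch")
    parser.add_argument("--model_name", required=True)
    # argparse stores "--target-task" under the attribute name "target_task".
    parser.add_argument("--target-task", required=True)
    return parser.parse_args(argv)


args = parse_args(
    ["--model_name", "meta-llama/Meta-Llama-3-8B", "--target-task", "target_diagnoses"]
)
print(args.model_name, args.target_task)
```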