Hi folks, thanks a lot for releasing this amazing project!!!
I was trying to run the evaluation module with the following code:
```python
import sys
from pathlib import Path

sys.path.append(str(Path(".").absolute().parent))

import os
from collections import defaultdict

import torch
from torch.utils.data import TensorDataset
from torch.utils.data.dataloader import DataLoader
from transformers import StoppingCriteriaList
from accelerate import Accelerator
from tqdm import tqdm
from evaluate import load

from codetf.models import load_model_pipeline
from codetf.data_utility.util import EOF_STRINGS, EndOfFunctionCriteria, remove_last_block
from codetf.data_utility.human_eval_dataset import HumanEvalDataset
from codetf.performance.model_evaluator import ModelEvaluator


def main():
    os.environ["HF_ALLOW_CODE_EVAL"] = "1"
    os.environ["TOKENIZERS_PARALLELISM"] = "true"

    # Load CodeT5+ 220M in 8-bit for evaluation
    model_class = load_model_pipeline(
        model_name="codet5",
        task="pretrained",
        model_type="plus-220M",
        is_eval=True,
        load_in_8bit=True,
        weight_sharding=False,
    )

    # Tokenize the HumanEval prompts and wrap them in a TensorDataset
    dataset = HumanEvalDataset(tokenizer=model_class.get_tokenizer())
    prompt_token_ids, prompt_attention_masks, references = dataset.load()
    problems = TensorDataset(prompt_token_ids, prompt_attention_masks)

    evaluator = ModelEvaluator(model_class)
    avg_pass_at_k = evaluator.evaluate_pass_k(
        problems=problems,
        unit_tests=references,
        num_workers=1,
        k=[1, 10, 100],
        batch_size=256,
        num_return_sequences=200,
        sequences_per_chunk=10,
    )
    print("Pass@k: ", avg_pass_at_k)


if __name__ == "__main__":
    main()
```
where I changed the evaluation parameters and also switched the model to CodeT5+ 220M.
Surprisingly, the pass@k was too good to be true; I got

```
Pass@k: {'pass@1': 0.09500000000000008, 'pass@10': 0.6403557771996593, 'pass@100': 0.9999992577464678}
```
which is much higher than the 220M results reported in the CodeT5+ paper.
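For reference, my understanding is that the `evaluate` library's `code_eval` metric uses the unbiased pass@k estimator from the Codex paper (Chen et al., 2021). A minimal sketch of that formula (the `pass_at_k` helper below is just mine, for illustration) shows why pass@100 saturates toward 1.0 when 200 samples are drawn per problem:

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021): 1 - C(n-c, k) / C(n, k).

    n: total samples generated per problem
    c: number of those samples that pass the unit tests
    """
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# With n=200 samples per problem, even c=10 passing samples already give
# pass@100 ~= 0.999, so pass@100 climbs toward 1.0 very quickly.
print(pass_at_k(200, 10, 100))
```

So a pass@100 near 1.0 only requires a handful of passing samples per problem; the more suspicious number may actually be pass@1.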
I wonder if there is anything wrong in the evaluation code above (it seems fine to me, but sorry in advance if I made some silly mistake)?
BTW, could you kindly share some sample code for evaluating on MBPP/APPS by any chance? Thanks a lot!! :) In case it helps, I've sketched below what I imagine an MBPP run might look like.
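This is only a guess, assuming CodeTF exposes an `MBPPDataset` with the same `.load()` interface as `HumanEvalDataset` — the module path and class name are my assumptions and unverified:

```python
import os

from torch.utils.data import TensorDataset

from codetf.models import load_model_pipeline
from codetf.data_utility.mbpp_dataset import MBPPDataset  # hypothetical: unverified module/class name
from codetf.performance.model_evaluator import ModelEvaluator

os.environ["HF_ALLOW_CODE_EVAL"] = "1"

# Same pipeline as the HumanEval run above; only the dataset would differ
model_class = load_model_pipeline(model_name="codet5", task="pretrained",
                                  model_type="plus-220M", is_eval=True,
                                  load_in_8bit=True, weight_sharding=False)

dataset = MBPPDataset(tokenizer=model_class.get_tokenizer())  # assumed to mirror HumanEvalDataset
prompt_token_ids, prompt_attention_masks, references = dataset.load()
problems = TensorDataset(prompt_token_ids, prompt_attention_masks)

evaluator = ModelEvaluator(model_class)
avg_pass_at_k = evaluator.evaluate_pass_k(problems=problems, unit_tests=references,
                                          num_workers=1, k=[1], batch_size=32,
                                          num_return_sequences=20, sequences_per_chunk=10)
print("Pass@k: ", avg_pass_at_k)
```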