Full finetuning with Roberta-Large #40
Hi, here "prompt" just refers to prompt-based fine-tuning (https://arxiv.org/abs/2012.15723), a very standard way of fine-tuning language models nowadays.
Hi, thanks for the response. Just to clarify: in Figure 2 of the MeZO paper, does FT correspond to full fine-tuning or prompt-based fine-tuning? I want to reproduce the results shown in that figure.
Hi, everything we report is prompt-based fine-tuning, since that gives much better performance.
Okay, thanks for the clarification. One more question: mezo.sh sets the number of steps to 100k, while run_fewshot.sh uses 1000 steps. So in Figure 2, MeZO is run for 100x more steps than FT; is that correct? That doesn't seem like a fair comparison, since MeZO gets 100x as many steps as FT. Even if we assume a backward pass costs about 2x a forward pass, a fair comparison with FT would give MeZO only about 3x as many steps. It would be great if you could provide some insight here.
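The compute argument in the comment above can be made concrete by counting cost in forward-pass units. This is a minimal sketch, assuming (as the comment does) that a backward pass costs about 2 forward passes; `matched_mezo_steps` is a hypothetical helper, not part of the repository. The answer depends on how many forward passes one MeZO step takes: counting one reproduces the 3x figure, while the MeZO paper's gradient estimate uses two forward passes per step, which would give 1.5x instead.

```bash
#!/usr/bin/env bash
# Hypothetical helper: how many MeZO steps cost the same as a given FT run,
# measured in forward-pass units.
# Assumptions: one FT step = 1 forward + BACKWARD_COST forward-equivalents;
#              one MeZO step = MEZO_FORWARDS forward passes.
matched_mezo_steps() {
  local ft_steps=$1 backward_cost=$2 mezo_forwards=$3
  # Total FT cost in forward-pass units.
  local ft_cost=$(( ft_steps * (1 + backward_cost) ))
  # Integer division is exact for the example inputs below.
  echo $(( ft_cost / mezo_forwards ))
}

matched_mezo_steps 1000 2 1   # prints: 3000 (one forward pass per MeZO step)
matched_mezo_steps 1000 2 2   # prints: 1500 (two forward passes per MeZO step)
```

Under the same assumptions, the configured runs differ by 100x in steps, or roughly 67x in forward-pass units (100,000 x 2 = 200,000 versus 1,000 x 3 = 3,000).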
Hi, yes, MeZO is run for 100x more steps than FT, so it is not a fair comparison in terms of wall-clock time. The RoBERTa-large experiments are mainly meant to show that it is possible to train models without backpropagation (which saves a lot of memory).
I want to run full fine-tuning with RoBERTa-large. The README suggests using the following command:
However, the type parameter is set to `TYPE:-"prompt"`. Shouldn't it be set to `"finetune"`?
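For context on the quoted fragment: in bash, `${TYPE:-"prompt"}` expands to `prompt` only when `TYPE` is unset or empty, so a value set in the environment takes precedence. The sketch below assumes the script reads `TYPE` this way, as the fragment suggests; `resolve_type` is a hypothetical stand-in for that line, and `finetune` is simply the value proposed in the question, not a value confirmed by the repository.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for a script line that reads TYPE with a default.
resolve_type() {
  # ":-" falls back to "prompt" when TYPE is unset OR set to an empty string.
  echo "${TYPE:-"prompt"}"
}

unset TYPE
resolve_type                 # prints: prompt
TYPE=finetune resolve_type   # prints: finetune (assignment applies to this call only)
TYPE= resolve_type           # prints: prompt (empty counts as unset for ":-")
```

The same prefix-assignment pattern (`TYPE=... bash some_script.sh`) overrides a default for a single script run without editing the file.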