Full finetuning with Roberta-Large #40

Open
aparna-aketi opened this issue Oct 18, 2024 · 5 comments
Comments

@aparna-aketi commented Oct 18, 2024

I want to run full fine-tuning with RoBERTa-large. The README suggests using the following command:

# Adam fine-tuning
TASK=SST-2 K=16 SEED=42 BS=8 LR=1e-5 MODEL=roberta-large bash finetune.sh

However, the TYPE parameter is set to TYPE:-"prompt" by default. Shouldn't this be set to "finetune"?
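
For example, something along these lines (untested sketch; assuming finetune.sh reads TYPE from the environment the same way it reads TASK, MODEL, etc., which the TYPE:-"prompt" default syntax seems to imply):

# Hypothetical override of the default prompt-based mode (untested)
TASK=SST-2 K=16 SEED=42 BS=8 LR=1e-5 MODEL=roberta-large TYPE=finetune bash finetune.sh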

@gaotianyu1350 (Member)

Hi,

Here "prompt" just means to prompt-based fine-tuning (https://arxiv.org/abs/2012.15723), a very standard way to fine-tuning language models nowadays.

@aparna-aketi (Author) commented Oct 22, 2024

Hi, thanks for the response. Just for clarification: in Figure 2 of the MeZO paper, does FT correspond to full fine-tuning or prompt-based fine-tuning? I want to reproduce the results corresponding to that figure.

@gaotianyu1350 (Member)

Hi, everything we report uses prompt-based fine-tuning, since that gives much better performance.

@aparna-aketi (Author) commented Oct 23, 2024

Okay, thanks for the clarification. One more question: mezo.sh sets the number of steps to 100k, while run_fewshot.sh uses 1000 steps. In Figure 2, is MeZO therefore run for 100x more steps than FT? That doesn't seem like a fair comparison, since MeZO uses 100x as many steps as FT. Even if we consider a backward pass to be 2x as expensive as a forward pass, MeZO should only need about 3x as many steps as FT for a fair comparison. It would be great if you could provide some insights here.
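
To make the arithmetic explicit, here is a rough sketch under my assumption that a backward pass costs about 2x a forward pass (and counting one MeZO step as roughly one forward-equivalent):

# Back-of-the-envelope compute budgets (illustrative only)
MEZO_STEPS=100000   # from mezo.sh
FT_STEPS=1000       # from run_fewshot.sh
echo "FT   budget: $((FT_STEPS * 3)) forward-equivalents"    # forward + 2x-cost backward = 3000
echo "MeZO budget: $((MEZO_STEPS * 1)) forward-equivalents"  # 100000, far more than the FT budget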

@gaotianyu1350 (Member)

Hi,

Yes, MeZO is run for 100x more steps than FT, and it is not a fair comparison in terms of wall-clock time. The RoBERTa-large experiments are mainly meant to showcase that it is possible to train models without backpropagation (which saves a lot of memory).
