Question about checkpointing #41

Open
zhaoaustin opened this issue Dec 7, 2024 · 1 comment

Comments

@zhaoaustin

Hi, I am a big fan of MeZO. In your paper, you mention that gradient checkpointing was not used. However, the following code in trainer.py seems to enable it. I am a bit confused about whether the 4 GB of GPU memory reported for MeZO fine-tuning on SST-2 was measured with gradient checkpointing enabled. Thank you for your clarification!

    # Activate gradient checkpointing if needed
    if args.gradient_checkpointing:
        self.model.gradient_checkpointing_enable()
@gaotianyu1350
Member

Hi,

This part of the code comes from Hugging Face's transformers. I believe our scripts never pass "--gradient_checkpointing", so gradient checkpointing is not actually in use.
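
To illustrate why the quoted guard does nothing unless the flag is passed, here is a minimal sketch. It is not the actual trainer code: `DummyModel` and `build_model` are hypothetical stand-ins, and the plain `argparse` flag only approximates the real argument parsing, under the assumption that the option defaults to off when omitted.

```python
import argparse


class DummyModel:
    """Hypothetical stand-in for a model; it only records whether
    gradient checkpointing was switched on."""

    def __init__(self):
        self.checkpointing_enabled = False

    def gradient_checkpointing_enable(self):
        self.checkpointing_enabled = True


def build_model(argv):
    # Simplified stand-in for the training script's argument parsing.
    # Assumption: the flag defaults to False when it is not passed.
    parser = argparse.ArgumentParser()
    parser.add_argument("--gradient_checkpointing", action="store_true")
    args = parser.parse_args(argv)

    model = DummyModel()
    # Same guard as the snippet quoted from trainer.py
    if args.gradient_checkpointing:
        model.gradient_checkpointing_enable()
    return model


without_flag = build_model([])
with_flag = build_model(["--gradient_checkpointing"])
print(without_flag.checkpointing_enabled)  # False
print(with_flag.checkpointing_enabled)     # True
```

With a real transformers model, recent versions appear to expose an `is_gradient_checkpointing` property on `PreTrainedModel` that should report the same state, though that is worth verifying against the installed version.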
