Question about checkpointing #41

zhaoaustin · 2024-12-07T03:08:01Z

Hi, I am a big fan of MeZO. Your paper mentions that gradient checkpointing was not used. However, the following code in trainer.py seems to enable it. I am a bit confused about whether the 4GB of GPU memory used for MeZO fine-tuning on SST2 includes gradient checkpointing. Thank you for clarifying!

    # Activate gradient checkpointing if needed
    if args.gradient_checkpointing:
        self.model.gradient_checkpointing_enable()


gaotianyu1350 · 2024-12-09T12:06:40Z

Hi,

This part of the code comes from Hugging Face's transformers. I believe our script does not pass "--gradient_checkpointing", so gradient checkpointing is not actually in use.
