Hi, I am a big fan of MeZO. In your paper, you mentioned that gradient checkpointing was not used. However, the following code in `trainer.py` seems to enable it. I am a bit confused about whether the 4GB GPU memory consumption reported for MeZO fine-tuning on SST2 includes the use of gradient checkpointing. Thank you for your clarification!
```python
# Activate gradient checkpointing if needed
if args.gradient_checkpointing:
    self.model.gradient_checkpointing_enable()
```
This part of the code comes from Hugging Face's transformers. I believe our scripts never pass `--gradient_checkpointing`, so gradient checkpointing is not actually in use.
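To illustrate why the branch above is skipped when the flag is omitted, here is a minimal sketch. It assumes the flag behaves like a standard boolean switch that defaults to off (in transformers, the `TrainingArguments.gradient_checkpointing` field defaults to `False`). The parser below is a hypothetical stand-in built with plain `argparse`, not the actual MeZO training script.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Hypothetical stand-in for the training script's flag: with
    # action="store_true", the value is False unless the flag is passed.
    parser.add_argument("--gradient_checkpointing", action="store_true")
    return parser


def checkpointing_enabled(argv: List[str]) -> bool:
    args = build_parser().parse_args(argv)
    # Mirrors the guard in trainer.py: checkpointing is enabled only
    # when the flag was explicitly set on the command line.
    return bool(args.gradient_checkpointing)


print(checkpointing_enabled([]))                            # False
print(checkpointing_enabled(["--gradient_checkpointing"]))  # True
```

So unless a launch command explicitly includes `--gradient_checkpointing`, `gradient_checkpointing_enable()` is never called.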