This repository has been archived by the owner on Oct 26, 2022. It is now read-only.

Averaging model parameters #124

Open
patrik-lambert opened this issue Feb 7, 2018 · 2 comments

Comments

@patrik-lambert

Hi. I have seen that it is possible (for example in generate-lines.lua) to decode with several models, whose softmax and attention scores are averaged to generate predictions. The drawback is that decoding time increases nearly linearly with the number of models.
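The following minimal Python sketch (not fairseq's Lua code) shows one decoding step of an ensemble and why its cost grows with the number of models. Each model is assumed to be a hypothetical callable that maps a prefix of token ids to a next-token probability list, and the models are combined with a plain arithmetic mean of probabilities. generate-lines.lua may combine scores differently (for example in log space), and the attention averaging it performs is not modelled here.

```python
from typing import Callable, List, Sequence

# Hypothetical model interface (assumption): takes the decoded prefix (token ids)
# and returns a probability distribution over the vocabulary for the next token.
Model = Callable[[Sequence[int]], List[float]]


def ensemble_next_token_probs(models: Sequence[Model], prefix: Sequence[int]) -> List[float]:
    """Average the next-token distributions of several models.

    Every model runs its own forward pass, so the cost of one decoding
    step grows roughly linearly with len(models).
    """
    if not models:
        raise ValueError("need at least one model")
    dists = [m(prefix) for m in models]  # one forward pass per model
    vocab_size = len(dists[0])
    if any(len(d) != vocab_size for d in dists):
        raise ValueError("models disagree on vocabulary size")
    # Element-wise arithmetic mean over models (assumed combination rule).
    return [sum(d[i] for d in dists) / len(dists) for i in range(vocab_size)]


if __name__ == "__main__":
    m1 = lambda prefix: [0.5, 0.5]
    m2 = lambda prefix: [0.25, 0.75]
    print(ensemble_next_token_probs([m1, m2], [1]))
```

With the two toy models above, the averaged distribution is `[0.375, 0.625]`. A single parameter-averaged model would need only one forward pass per step instead of one per ensemble member.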

Since I need to keep decoding time low, I would like to build a new model by averaging the parameters of several models. Are there methods or pieces of code in fairseq that I could use to do this?

Thanks!

@michaelauli
Contributor

You can load the checkpoints of the models you would like to average and save the mean of all the parameters as a new checkpoint.
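A minimal sketch of this suggestion in Python follows. It assumes each checkpoint has already been loaded and flattened into a dict mapping parameter names to lists of floats; loading real Torch checkpoints is not shown, and the name `average_checkpoints` is hypothetical. The function refuses checkpoints whose parameter names or sizes differ, since an element-wise mean only makes sense for models with the same architecture.

```python
from typing import Dict, List, Sequence

# Assumption: a checkpoint is a dict {parameter name: flat list of floats}.
# Real fairseq checkpoints hold Torch tensors and would need converting first.
Checkpoint = Dict[str, List[float]]


def average_checkpoints(checkpoints: Sequence[Checkpoint]) -> Checkpoint:
    """Return a new checkpoint whose parameters are the element-wise mean."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")

    # Every checkpoint must contain exactly the same parameter names,
    # otherwise some parameters would be silently dropped or left unaveraged.
    names = set(checkpoints[0])
    for i, ckpt in enumerate(checkpoints[1:], start=1):
        if set(ckpt) != names:
            diff = sorted(names.symmetric_difference(ckpt))
            raise ValueError(f"checkpoint {i} has different parameters: {diff}")

    n = len(checkpoints)
    averaged: Checkpoint = {}
    for name in sorted(names):
        values = [ckpt[name] for ckpt in checkpoints]
        size = len(values[0])
        if any(len(v) != size for v in values):
            raise ValueError(f"size mismatch for parameter {name!r}")
        # Mean of the j-th entry of this parameter across all checkpoints.
        averaged[name] = [sum(v[j] for v in values) / n for j in range(size)]
    return averaged
```

Checking that every checkpoint has the same set of parameter names is one simple way to confirm that the averaged model covers all parameters. State that is not a model parameter (for example optimizer state) would need separate handling.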

@patrik-lambert
Author

Thank you for the tips! I have written the attached script. It seems to do the job. Does it seem correct to you? Does it include all the parameters?
average_models.lua.txt
