We should do a proof of concept for knowledge distillation from an LLM to our standard student model. The main benefit, if it works, is that we won't need to deal with cleaning parallel data and training teacher models, both of which can be quite challenging, especially for lower-resource languages.
This would require:
- Estimating the costs of the different LLMs and APIs we could use
- Running quality evaluations on them to see which model provides the best cost/quality trade-off
- Choosing a mix of monolingual data to translate
- Running translation with an LLM (see the sketch after this list)
- Training a regular student model on this data
- Trying different LLMs and corpora of different sizes (for example, 10M and 50M sentences)
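To make the cost estimate concrete: translating 10M sentences at, say, ~30 tokens per sentence is on the order of 300M input tokens plus a comparable volume of output tokens, so the per-token price of each API dominates the budget. Below is a minimal sketch of the translation step, assuming the OpenAI Python client; the model name, prompt, language pair, and file names are placeholders, and a real run would batch requests, handle retries, and log token usage to feed the cost estimate.

```python
# Sketch: translate a monolingual corpus with a hosted LLM to build a
# synthetic parallel corpus for student training. Everything below
# (model, prompt, languages, file names) is an assumption for illustration.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SRC_LANG = "English"
TRG_LANG = "Ukrainian"   # example pair, not a decision
MODEL = "gpt-4o-mini"    # placeholder; the PoC would compare several models

def translate(sentence: str) -> str:
    """Translate one sentence; a production run would batch requests."""
    response = client.chat.completions.create(
        model=MODEL,
        temperature=0,
        messages=[
            {"role": "system",
             "content": f"Translate the user's {SRC_LANG} sentence into {TRG_LANG}. "
                        "Return only the translation."},
            {"role": "user", "content": sentence},
        ],
    )
    return response.choices[0].message.content.strip()

with open("mono.en") as src, \
     open("synthetic.en", "w") as out_src, \
     open("synthetic.uk", "w") as out_trg:
    for line in src:
        sentence = line.strip()
        if not sentence:
            continue
        out_src.write(sentence + "\n")
        out_trg.write(translate(sentence) + "\n")
```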
Folks from WMT also suggested we could try pre-training the student on the parallel OPUS corpus as is and then fine-tuning it on a smaller but higher-quality LLM-produced corpus to make the approach more cost-efficient.
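Here is a rough sketch of that two-stage schedule. It uses Hugging Face Transformers purely for illustration (our students are Marian models trained by this pipeline); the checkpoint, file names, and hyperparameters are assumptions, and a from-scratch student would be instantiated from a config rather than an existing checkpoint.

```python
# Sketch: stage 1 pre-trains on raw OPUS parallel data, stage 2 fine-tunes
# on the smaller LLM-generated corpus at a lower learning rate.
from datasets import Dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

# Stand-in for the student architecture; a real from-scratch student would be
# built from a config instead of a pretrained checkpoint.
CHECKPOINT = "Helsinki-NLP/opus-mt-en-uk"
tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForSeq2SeqLM.from_pretrained(CHECKPOINT)

def load_pairs(src_path: str, trg_path: str) -> Dataset:
    """Read a line-aligned parallel corpus and tokenize it."""
    with open(src_path) as s, open(trg_path) as t:
        data = {"src": [l.strip() for l in s], "trg": [l.strip() for l in t]}

    def tokenize(batch):
        return tokenizer(batch["src"], text_target=batch["trg"],
                         truncation=True, max_length=128)

    return Dataset.from_dict(data).map(tokenize, batched=True,
                                       remove_columns=["src", "trg"])

def run_stage(train_ds: Dataset, output_dir: str, lr: float, epochs: float) -> None:
    """One training stage; hyperparameters are placeholders."""
    args = Seq2SeqTrainingArguments(output_dir=output_dir, learning_rate=lr,
                                    num_train_epochs=epochs,
                                    per_device_train_batch_size=32,
                                    save_strategy="no")
    Seq2SeqTrainer(model=model, args=args, train_dataset=train_ds,
                   data_collator=DataCollatorForSeq2Seq(tokenizer, model=model)
                   ).train()

# Stage 1: cheap pre-training on the OPUS parallel corpus used "as is".
run_stage(load_pairs("opus.en", "opus.uk"), "student-pretrain", 3e-4, 1)
# Stage 2: fine-tune on the smaller, higher-quality LLM-produced corpus.
run_stage(load_pairs("synthetic.en", "synthetic.uk"), "student-finetune", 5e-5, 3)
```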