Judging by WMT24, the field is moving away from sentence-level translation. More document-level training data is now available (for example HPLT), and WMT24 used document-level datasets for evaluation.
"In a shift towards document-level evaluation, we no longer provide source texts segmented into individual sentences. Instead, we keep all paragraphs intact and evaluated together."
This would require:
adapting document-level datasets so that some paragraphs are kept intact for training instead of being split into sentences
fixing the cleaning procedures
finding document-level evaluation datasets
implementing inference support for paragraph-level input
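To make the first item more concrete, here is a minimal sketch of how a dataset could keep some paragraphs intact while still splitting the rest into sentences. Everything in it is an assumption for illustration and not existing pipeline code: the function name `build_training_pairs`, the `keep_ratio` parameter, the input layout (each paragraph as a list of aligned sentence pairs), and joining sentences with a single space, which would not suit languages written without spaces.

```python
import random
from typing import List, Tuple

SentencePair = Tuple[str, str]   # (source sentence, target sentence)
Paragraph = List[SentencePair]   # aligned sentence pairs of one paragraph


def build_training_pairs(
    paragraphs: List[Paragraph],
    keep_ratio: float = 0.5,
    seed: int = 0,
) -> List[SentencePair]:
    """Hypothetical helper: mix paragraph-level and sentence-level examples.

    Each paragraph is kept intact with probability `keep_ratio`;
    otherwise its sentence pairs are emitted one by one.
    """
    rng = random.Random(seed)  # seeded so the split is reproducible
    pairs: List[SentencePair] = []
    for paragraph in paragraphs:
        if not paragraph:
            continue  # skip empty paragraphs
        if rng.random() < keep_ratio:
            # Keep the paragraph as one training example.
            # Assumption: a single space is an acceptable sentence joiner.
            src = " ".join(s for s, _ in paragraph)
            tgt = " ".join(t for _, t in paragraph)
            pairs.append((src, tgt))
        else:
            # Fall back to sentence-level examples.
            pairs.extend(paragraph)
    return pairs


# Toy document: two paragraphs of aligned German-English sentences.
example_doc: List[Paragraph] = [
    [("Hallo.", "Hello."), ("Wie geht es dir?", "How are you?")],
    [("Danke.", "Thanks.")],
]
mixed = build_training_pairs(example_doc, keep_ratio=0.5)
```

With `keep_ratio=1.0` every paragraph stays intact, and with `keep_ratio=0.0` the output is purely sentence-level, so the ratio could be tuned per dataset. Whether a random mix is the right choice, as opposed to for example a length-based rule, is an open question.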
eu9ene added the meta (a collection of sub-issues that uses a tasklist) and quality (improving robustness and translation quality) labels on Jan 15, 2025
See Findings of WMT 2024 Shared task