
Expand LLM Evaluation Use Case - Translation #628

Open
jularase opened this issue Jan 15, 2025 · 1 comment
jularase commented Jan 15, 2025

Objective: Expand Lumigator’s capabilities to include workflows for evaluating multilingual models, particularly focusing on translation quality.

Why This Matters:
Translation tasks are critical for industries like localization, global commerce, and media, where fluency and accuracy significantly impact business outcomes.
GitHub’s 2023 Octoverse Report indicates that 44% of developers work on projects requiring internationalization, underscoring the need for robust multilingual model evaluations.

Planned Actions:

  • Evaluate metrics like BLEU, TER, and COMET for translation workflows (a scoring sketch follows this list).

  • Prototype workflows for evaluating translation quality, focusing on accuracy, fluency, and cultural nuances.

  • Develop extensible modules for evaluating and comparing translation models, starting with smaller-scale tasks and scaling based on user feedback.
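For reference, a minimal sketch of what scoring with these metrics could look like. It assumes the sacrebleu package for BLEU and TER; COMET (via the unbabel-comet package, which also needs the source sentences) is outlined in the trailing comment. The example sentences and model choice are placeholders, not part of Lumigator today.

```python
# Minimal sketch: corpus-level BLEU and TER with sacrebleu (pip install sacrebleu).
from sacrebleu.metrics import BLEU, TER

sources = ["Le chat est sur le tapis."]        # source sentences (needed for COMET)
hypotheses = ["The cat is on the carpet."]     # model output
references = [["The cat is on the mat."]]      # one list per reference set

bleu = BLEU().corpus_score(hypotheses, references)
ter = TER().corpus_score(hypotheses, references)
print(f"BLEU: {bleu.score:.2f}  TER: {ter.score:.2f}")

# COMET (neural, reference-based) would look roughly like this:
#   from comet import download_model, load_from_checkpoint
#   model = load_from_checkpoint(download_model("Unbabel/wmt22-comet-da"))
#   data = [{"src": src, "mt": mt, "ref": ref}
#           for src, mt, ref in zip(sources, hypotheses, references[0])]
#   print(model.predict(data, batch_size=8, gpus=0).system_score)
```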

💡 Community Contribution Opportunities:

  • Help design evaluation metrics for translation.
  • Share real-world translation benchmarks.
  • Improve evaluation frameworks with multilingual support.

Timeline: Q1 2025 (availability to be defined, ideally end of February / early March)

jularase added the epic label Jan 15, 2025

eu9ene commented Jan 17, 2025

Hi, I'm Evgeny from the Firefox Translations team. There might be an overlap with one of our initiatives to experiment with LLMs as teacher models for knowledge distillation. The first step of this experiment would be to evaluate the translation capabilities of different models, specifically to see the effect on quality and cost of inference (for different model sizes, pretrained vs. fine-tuned on translations, etc.). I did some benchmarking for a limited set of models and languages a while ago, but we need to go deeper this time. I know we're going to meet, but I'm also adding a link here for reference.

GitHub issue: mozilla/translations#994
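A rough sketch of the kind of quality-vs-cost comparison described above (not the Firefox Translations setup): run several candidate models over the same test set, then report a quality metric alongside latency. `translate_with` is a hypothetical placeholder for whatever inference backend is actually used.

```python
import time
from sacrebleu.metrics import BLEU

def translate_with(model_name: str, sentences: list[str]) -> list[str]:
    """Placeholder: call the model under test and return its translations."""
    raise NotImplementedError

def compare(models: list[str], sources: list[str], references: list[list[str]]) -> None:
    bleu = BLEU()
    for name in models:
        start = time.perf_counter()
        hypotheses = translate_with(name, sources)
        elapsed = time.perf_counter() - start
        score = bleu.corpus_score(hypotheses, references).score
        print(f"{name}: BLEU={score:.2f}  latency={elapsed:.1f}s for {len(sources)} sentences")
```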
