
21.05 (May 2021)

DewiBrynJones released this on 09 Jun, 05:59.


Here are our May 2021 (21.05) scripts for training, evaluating and hosting your own Welsh speech recognition models and API, based on wav2vec2 from Facebook AI and HuggingFace, and on KenLM by Kenneth Heafield and others.
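The following illustrative sketch is not taken from the release's scripts. It shows how an acoustic model and a language model typically fit together. It assumes the acoustic model is a wav2vec2 model fine-tuned with a CTC output layer (the usual set-up in HuggingFace). Per-frame predictions from such a model are turned into text by merging repeated labels and dropping the CTC blank. Candidate transcriptions are then compared using a weighted sum of acoustic score, language-model score and word count. Several names below are hypothetical: the blank symbol `_`, the toy unigram table standing in for a KenLM model, the weights `alpha` and `beta`, and the acoustic scores.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence

BLANK = "_"  # hypothetical stand-in for the CTC blank token


def ctc_greedy_collapse(frame_labels: Sequence[str]) -> str:
    """Turn per-frame CTC labels into text: merge repeats, then drop blanks.

    A blank between two identical labels keeps them as separate characters,
    e.g. ["l", "_", "l"] -> "ll" but ["l", "l"] -> "l".
    """
    out: List[str] = []
    prev: Optional[str] = None
    for label in frame_labels:
        if label != prev and label != BLANK:
            out.append(label)
        prev = label
    return "".join(out)


# Hypothetical toy unigram "language model" standing in for a KenLM query.
TOY_UNIGRAMS = {"mae": math.log(0.2), "hi'n": math.log(0.1), "braf": math.log(0.1)}
UNKNOWN_LOGPROB = math.log(1e-4)  # penalty for out-of-vocabulary words


def toy_lm_logprob(text: str) -> float:
    """Sum of unigram log-probabilities over the words in `text`."""
    return sum(TOY_UNIGRAMS.get(w, UNKNOWN_LOGPROB) for w in text.split())


def fused_score(acoustic_logprob: float, text: str,
                lm_logprob: Callable[[str], float],
                alpha: float = 0.5, beta: float = 1.0) -> float:
    """Acoustic log-prob + alpha * LM log-prob + beta * word count."""
    return acoustic_logprob + alpha * lm_logprob(text) + beta * len(text.split())


def rescore(nbest: Dict[str, float], lm_logprob: Callable[[str], float],
            alpha: float = 0.5, beta: float = 1.0) -> str:
    """Pick the candidate transcription with the best combined score."""
    return max(nbest, key=lambda t: fused_score(nbest[t], t, lm_logprob, alpha, beta))


if __name__ == "__main__":
    frames = ["m", "m", "_", "a", "e", "_", " ", "b", "r", "r", "a", "f"]
    print(ctc_greedy_collapse(frames))  # mae braf
    # The acoustic scores alone slightly prefer "mae brav" ...
    nbest = {"mae braf": -5.0, "mae brav": -4.8}
    # ... but the language-model term tips the balance to "mae braf".
    print(rescore(nbest, toy_lm_logprob))  # mae braf
```

In practice, decoders apply this kind of weighting inside a beam search over the CTC output rather than to a short candidate list afterwards. The weights `alpha` and `beta` are normally tuned on held-out data.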

This release also contains models trained on the Welsh Mozilla Common Voice dataset, as published in December 2020, and on the Welsh portion of the OSCAR text corpus, as of May 2021.

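In simple evaluations on the Welsh Common Voice test set, the acoustic model (wav2vec2) and the language model (KenLM), used together at inference time, give a word error rate (WER) of about 15%. In other words, roughly 15% of the words in a sentence are misrecognised.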
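Word error rate counts the word substitutions (S), deletions (D) and insertions (I) needed to turn the recogniser's output into the reference transcript. That count is divided by the number of words N in the reference, so WER = (S + D + I) / N. A WER of 15% therefore corresponds to about 3 errors in a 20-word sentence. The sketch below computes WER with a word-level edit distance. It is a generic implementation, not necessarily the evaluation script used for this release, which may also normalise case and punctuation before scoring.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub_cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,             # deletion
                          d[i][j - 1] + 1,             # insertion
                          d[i - 1][j - 1] + sub_cost)  # match or substitution
    return d[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substitution ("hi'n" -> "hi") in a four-word reference.
    print(word_error_rate("mae hi'n braf heddiw", "mae hi braf heddiw"))  # 0.25
```

Because insertions are counted, WER can exceed 100% when the output is much longer than the reference.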