21.05 (Mai / May 2021)
Read this release note in English
Dyma ein sgriptiau ym mis Mai 2021 (21.05) ar gyfer hyfforddi, gwerthuso a chynnal API adnabod lleferydd Cymraeg eich hunain ar sail wav2vec2 gan Facebook AI ac HuggingFace, a KenLM gan Kenneth Heafield ac eraill.
Rydym hefyd yn cyhoeddi modelau sydd wedi'u hyfforddi gyda data Mozilla CommonVoice Cymraeg, a chyhoeddwyd ym mis Rhagfyr 2020, a data corpws testunau Cymraeg OSCAR o fis Mai 2021.
- model acwstig: techiaith_bangor_wav2vec2-xlsr-ft-cy.21.05.tar.gz
- model iaith (parth trawsgrifio): techiaith_bangor_kenlm-cy_21.05.tar.gz
Mewn arbrofion syml, pan ddefnyddir y model acwstig a'r model iaith gyda'i gilydd, mae'r adnabod lleferydd o ganlyniad yn cam-adnabod tua 15% o eiriau mewn brawddeg.
in English
Here are our May 2021 (21.05) scripts for training, evaluating and hosting your own Welsh speech recognition models based on wav2vec2 by Facebook AI and HuggingFace, and KenLM by Kenneth Heafield and others.
This release also includes models trained on the Welsh Mozilla CommonVoice dataset, as published in December 2020, and on the Welsh text corpus from OSCAR, dated May 2021.
- acoustic model: techiaith_bangor_wav2vec2-xlsr-ft-cy.21.05.tar.gz
- language model (transcription domain): techiaith_bangor_kenlm-cy_21.05.tar.gz
In simple evaluations on the Welsh Common Voice test set, the acoustic and language models, used together at inference, give a word error rate of about 15%.
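
For readers unfamiliar with the metric, the sketch below shows how a word error rate is conventionally computed: the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and the recogniser's output, divided by the number of reference words. This is an illustration only, not the evaluation script shipped in this release; the function name `word_error_rate` and the example sentences are assumptions made for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name), not part of the released scripts.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


# Made-up example: one of five reference words is dropped by the recogniser.
print(word_error_rate("mae hi yn braf heddiw", "mae hi braf heddiw"))  # 0.2
```

By this definition, a word error rate of about 15% corresponds to roughly 15 word-level errors per 100 reference words.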