22.06 (June 2022)
Here are our June 2022 (22.06) scripts for training, evaluating, using and hosting your own Welsh speech recognition models based on wav2vec2 by Facebook AI and HuggingFace, as well as KenLM by Kenneth Heafield and others.
This release also contains models trained with the Welsh dataset from Mozilla CommonVoice version 9, published in April 2022, and the Welsh portion of the OSCAR text corpus from April 2022.
Models can be found on the HuggingFace website:
- https://huggingface.co/techiaith/wav2vec2-xlsr-ft-cy/tree/22.06
- https://huggingface.co/techiaith/wav2vec2-xls-r-1b-ft-cy/tree/22.06
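Either model can be loaded directly from the links above with the HuggingFace `transformers` library. The sketch below shows greedy (no language model) transcription with the smaller model; the model id and the `22.06` revision come from the URLs above, the WAV path is a placeholder, and KenLM-based decoding is left out for brevity. This is an illustrative sketch, not this release's own inference script.

```python
def transcribe(wav_path, model_id="techiaith/wav2vec2-xlsr-ft-cy", revision="22.06"):
    """Greedy CTC transcription of a mono WAV file with the released model."""
    # Imports live inside the function so the sketch can be read (and the
    # function defined) without torch/torchaudio/transformers installed.
    import torch
    import torchaudio
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_id, revision=revision)
    model = Wav2Vec2ForCTC.from_pretrained(model_id, revision=revision)

    waveform, sample_rate = torchaudio.load(wav_path)
    if sample_rate != 16_000:  # wav2vec2 expects 16 kHz input
        waveform = torchaudio.functional.resample(waveform, sample_rate, 16_000)

    inputs = processor(waveform.squeeze().numpy(),
                       sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    predicted_ids = torch.argmax(logits, dim=-1)  # best token per frame
    return processor.batch_decode(predicted_ids)[0]

# Usage (path is a placeholder):
#   text = transcribe("recording.wav")
```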
In simple evaluations on the Welsh Common Voice test set, the wav2vec2-xlsr-ft-cy acoustic model (~1 GB), when used together with a language model, exhibits a word error rate of 13.74%.
In simple evaluations on the Welsh Common Voice test set, the wav2vec2-xls-r-1b-ft-cy acoustic model (~3 GB), when used together with a language model, exhibits a word error rate of 12.38%.
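Word error rate (WER) is the word-level edit distance between the reference transcript and the recognizer's output, divided by the number of reference words. The snippet below is an illustrative re-implementation of the metric quoted above, not the exact evaluation script from this release.

```python
def wer(reference: list[str], hypothesis: list[str]) -> float:
    """Word error rate of `hypothesis` against `reference` (lists of words)."""
    # Dynamic-programming Levenshtein distance over words, one row at a time.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    return prev[-1] / len(reference)

# One substituted word out of three reference words -> WER = 1/3.
print(wer("y gath ddu".split(), "y cath ddu".split()))  # 0.3333...
```

A WER of 13.74% means roughly one word in seven is substituted, inserted or deleted relative to the reference transcripts.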