Basic code structure for Encoder, Decoder, Model, DataPipeline, Tokenizer, Experiment, Metric, and Dataset.
(Model) Adds implementation of pre-norm/post-norm Transformer, Speech Transformer, BERT, GPT-2, and Wav2Vec2.0 (a pre-norm/post-norm wiring sketch follows this list).
(Task) Adds implementation of sequence-to-sequence and speech-to-text tasks (ASR, ST).
(DataPipeline, Tokenizer) Adds wrappers for commonly used tokenizers: Moses, BPE, Jieba, character-level, SentencePiece, etc.
(Dataset) Adds support for reading parallel corpora, speech corpora (libri-trans, MuST-C, and LibriSpeech), and TFRecords.
(Experiment) Adds implementation of a common training procedure with mixed precision training and various distributed strategies (MirroredStrategy, Horovod, BytePS).
(Metric) Adds implementation of BLEU and WER metrics (a minimal WER sketch follows this list).
(Converter) Adds implementation of converting checkpoints from Google BERT, OpenAI GPT-2, fairseq Transformer, and fairseq Wav2Vec2.0.
Beam search decoding and top-k/top-p sampling (a sampling sketch follows this list).
Supports checkpoint averaging, TFRecord generation, and model restoring (see cli/README.md); a brief checkpoint-averaging sketch follows this list.
Step-by-step recipes for training an end-to-end speech translation model (see examples/speech_to_text).
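A minimal sketch of the pre-norm vs. post-norm residual wiring mentioned above, using plain Keras layers. Class and argument names here are illustrative assumptions, not the toolkit's actual API:

```python
import tensorflow as tf


class EncoderLayer(tf.keras.layers.Layer):
    """Illustrative Transformer encoder layer; `pre_norm` toggles the residual wiring."""

    def __init__(self, d_model=512, num_heads=8, d_ff=2048, pre_norm=True):
        super().__init__()
        self.pre_norm = pre_norm
        self.attn = tf.keras.layers.MultiHeadAttention(num_heads, d_model // num_heads)
        self.ffn = tf.keras.Sequential([
            tf.keras.layers.Dense(d_ff, activation="relu"),
            tf.keras.layers.Dense(d_model),
        ])
        self.norm1 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.norm2 = tf.keras.layers.LayerNormalization(epsilon=1e-6)

    def call(self, x):
        if self.pre_norm:
            # Pre-norm: normalize the sub-layer input, add the residual afterwards.
            y = self.norm1(x)
            x = x + self.attn(y, y)
            x = x + self.ffn(self.norm2(x))
        else:
            # Post-norm (original Transformer): residual first, then normalize.
            x = self.norm1(x + self.attn(x, x))
            x = self.norm2(x + self.ffn(x))
        return x
```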
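The WER metric is, at its core, word-level edit distance normalized by reference length. A minimal, dependency-free sketch (not the toolkit's Metric class):

```python
def wer(reference, hypothesis):
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)
```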
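For top-k/top-p sampling, a small NumPy sketch of the logit filtering step; the function and argument names are assumptions for illustration, not the toolkit's decoding code:

```python
import numpy as np


def sample_next_token(logits, top_k=0, top_p=1.0, temperature=1.0, rng=None):
    """Illustrative top-k / nucleus (top-p) filtering of next-token logits, then sampling."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64) / temperature
    if top_k > 0:
        # Keep only the k highest-scoring tokens.
        kth_value = np.sort(logits)[-top_k]
        logits = np.where(logits < kth_value, -np.inf, logits)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    if top_p < 1.0:
        # Keep the smallest set of highest-probability tokens whose mass reaches top_p.
        order = np.argsort(-probs)
        cumulative = np.cumsum(probs[order])
        drop = order[cumulative > top_p][1:]  # keep the token that first crosses top_p
        probs[drop] = 0.0
        probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))
```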
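Checkpoint averaging is simply an element-wise average of the variables stored in several checkpoints. A rough sketch with stock TensorFlow utilities (the actual tool lives under cli/, see cli/README.md); writing the averaged values back into a new checkpoint is omitted here:

```python
import tensorflow as tf


def average_checkpoints(checkpoint_paths):
    """Return {variable_name: averaged numpy value} over the given checkpoints."""
    averaged = {}
    for path in checkpoint_paths:
        reader = tf.train.load_checkpoint(path)
        for name in reader.get_variable_to_shape_map():
            # Accumulate each variable's contribution to the running average.
            averaged[name] = (averaged.get(name, 0.0)
                              + reader.get_tensor(name) / len(checkpoint_paths))
    return averaged
```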