This repo is the implementation of Mirror-Generative Neural Machine Translation (MGNMT).
MGNMT is implemented on top of NJUNMT. For data preprocessing and other general usage, please refer to README-NJUNMT.
This repo has been tested with Python 3.7 and PyTorch 1.2.0 on 2080Ti and V100 GPUs.
We should prepare the following datasets:
- parallel bilingual training data, with BPE codes and vocabularies
- dev and test datasets
- (optional) non-parallel bilingual training data, i.e., monolingual datasets for the source and target languages
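As a quick sanity check before training, a small script like the following can verify that the parallel files are line-aligned and that the optional monolingual files exist. This is only an illustration; the file paths are placeholders, not files shipped with this repo.

```python
from pathlib import Path

def count_lines(path):
    """Count lines in a text file without loading it fully into memory."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for _ in f)

# Hypothetical locations of the BPE-segmented data; adjust to your own layout.
src_file = Path("data/train.bpe.de")
tgt_file = Path("data/train.bpe.en")
mono_files = [Path("data/mono.bpe.de"), Path("data/mono.bpe.en")]  # optional NP data

n_src, n_tgt = count_lines(src_file), count_lines(tgt_file)
assert n_src == n_tgt, f"parallel corpus is misaligned: {n_src} vs {n_tgt} lines"

for mono in mono_files:
    if not mono.exists():
        print(f"note: {mono} not found (only needed for non-parallel training)")
```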
Please see the example configurations in configs/ed/.
sh train_mgnmt_iwslt_de2en_ed.sh
This script uses the configuration in configs/ed/iwslt_de2en_mgnmt.yaml
sh train_mgnmt_np_iwslt_de2en_ed.sh
This script uses the configuration in configs/ed/iwslt_de2en_mgnmt_np.yaml
- For domain adaptation, setting NP_train_mode to "finetune" may work better.
- For other scenarios, use "hybrid" to enable hybrid training on both parallel (P) and non-parallel (NP) data.
- It is better to start using NP data only after the model has been well trained on parallel data, which usually means a somewhat larger NP_warmup_step; the sketch below illustrates how these options interact.
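The following is a conceptual sketch of how NP_train_mode and NP_warmup_step could be interpreted, not the repo's actual training loop; `parallel_batches` and `nonparallel_batches` are hypothetical batch iterators.

```python
import random

def next_batch(step, np_train_mode, np_warmup_step,
               parallel_batches, nonparallel_batches):
    """Pick the data source for the current training step."""
    if step < np_warmup_step:
        # Warm up on parallel data only until the model is reasonably trained.
        return next(parallel_batches)
    if np_train_mode == "finetune":
        # Domain adaptation: after warmup, continue on non-parallel data only.
        return next(nonparallel_batches)
    if np_train_mode == "hybrid":
        # Alternate between parallel and non-parallel batches.
        return next(random.choice([parallel_batches, nonparallel_batches]))
    raise ValueError(f"unknown NP_train_mode: {np_train_mode}")
```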
Specify the environment variable CUDA_VISIBLE_DEVICES. For example:
export CUDA_VISIBLE_DEVICES=0,1,2,3
Currently this repo only supports multiple GPUs on a single node.
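After exporting the variable, PyTorch should see exactly those devices; a quick check:

```python
import os
import torch

# CUDA_VISIBLE_DEVICES must be set before CUDA is initialized in the process.
print("CUDA_VISIBLE_DEVICES =", os.environ.get("CUDA_VISIBLE_DEVICES"))
print("GPUs visible to PyTorch:", torch.cuda.device_count())
```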
sh translate_mgnmt.sh
Note that the model supports both src2tgt and tgt2src translation directions, so --src_lang should be set to either "src" or "tgt" to choose the expected translation direction.
- Use beta to control the weight of target LM fusion during decoding.
- Use --reranking to enable reconstructive reranking.
- Use gamma to control the weight of the source LM in reconstructive reranking (see the score sketch after this list).
- --batch_size determines the batch size in terms of the number of tokens.
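To make the roles of beta and gamma concrete, here is a rough sketch of how such scores are typically interpolated. The scoring terms are placeholders standing in for the translation model, the two language models, and the reconstruction score; this is an illustration of weighted score interpolation, not this repo's exact API or the paper's exact formulation.

```python
def fusion_score(log_tm, log_lm_tgt, beta):
    """Decoding-time fusion: translation-model score plus a beta-weighted
    target-side LM score."""
    return log_tm + beta * log_lm_tgt

def rerank_score(log_fwd, log_recon, log_lm_src, gamma):
    """Reconstructive reranking: add the score of reconstructing the source
    from the hypothesis, plus a gamma-weighted source-side LM score."""
    return log_fwd + log_recon + gamma * log_lm_src

# Example: pick the best hypothesis from an n-best list of tuples
# (hypothesis, log_tm, log_lm_tgt, log_recon, log_lm_src).
def pick_best(nbest, beta, gamma):
    def total(entry):
        _, log_tm, log_lm_tgt, log_recon, log_lm_src = entry
        return rerank_score(fusion_score(log_tm, log_lm_tgt, beta),
                            log_recon, log_lm_src, gamma)
    return max(nbest, key=total)[0]
```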
Main code files:
- src/tasks/mgnmt.py
- src/models/mgnmt.py
- src/decoding/iterative_decoding.py
- src/modules/variational_inferrer.py
- configs in ./configs/
- How well the inference model is learned affects the performance of MGNMT, so having a proper KL loss is important. However, KL annealing is VERY tricky. Try adjusting x0 (the KL annealing step) to find a KL loss that is neither too large nor too small; a KL loss of ~3 to 6 works well in practice. A generic annealing schedule is sketched after this list.
- For the best training efficiency, it is highly recommended not to use reranking or LM fusion during back-translation or BLEU validation. Besides, you may not know the optimal LM interpolation weights in advance; you can search for the best weights at inference time after training is done.
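For reference, a common sigmoid KL-annealing schedule controlled by x0 (widely used in VAE-style text models) looks roughly like the sketch below; the exact schedule and the sharpness constant k used in this repo may differ.

```python
import math

def kl_weight(step, x0, k=0.0025):
    """Sigmoid KL-annealing weight that rises from ~0 towards 1 around step x0;
    k controls how sharp the transition is."""
    z = k * (step - x0)
    if z < -60:          # avoid math.exp overflow far before x0
        return 0.0
    return 1.0 / (1.0 + math.exp(-z))

# The annealed weight multiplies the KL term of the training objective, e.g.
#   loss = reconstruction_loss + kl_weight(step, x0) * kl_loss
# A later x0 keeps the KL penalty small for longer, which usually leaves a
# larger KL at convergence; tune x0 until the KL settles around 3 to 6.
```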
- continue refactoring the codebase
- pretrained models
- hyperparameters and settings for other experiments in the paper
@inproceedings{zheng2020mirror,
title={Mirror-Generative Neural Machine Translation},
author={Zaixiang Zheng and Hao Zhou and Shujian Huang and Lei Li and Xin-Yu Dai and Jiajun Chen},
booktitle={International Conference on Learning Representations},
year={2020},
url={https://openreview.net/forum?id=HkxQRTNYPH}
}