Abstract
Unsupervised methods, which strive to alleviate the impact of the scarcity of large parallel corpora on the quality of machine translation, have attracted much attention in the field of neural machine translation. However, their translation performance on distant language pairs still needs to be improved. Therefore, the translation language model (TLM) is introduced and the Dict-TLM method is proposed. The core idea of this method is to train language models by combining monolingual corpora with unsupervised bilingual dictionaries. Specifically, the model first accepts source language sentences as input; then, unlike the traditional TLM, which accepts only parallel corpora, the Dict-TLM model also accepts as input source language sentences processed by an unsupervised bilingual dictionary. In this input, the proposed model replaces the words of the source language sentence that appear in the bilingual dictionary with their corresponding target language translations. Importantly, the bilingual dictionary is obtained in an unsupervised manner. Experiments show that Dict-TLM improves the BLEU score by 3% compared with traditional unsupervised machine translation on the Chinese-English language pair.
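As a concrete illustration of the dictionary-based replacement described above, the following Python sketch builds a code-switched input of the kind Dict-TLM trains on. The function name and the toy dictionary are hypothetical, for illustration only, and not taken from the paper's implementation.

```python
# A minimal sketch of the Dict-TLM input construction, assuming a plain
# word-level bilingual dictionary. Names here are illustrative, not the
# authors' actual code.

def code_switch(sentence, bilingual_dict):
    """Replace source words found in the (unsupervised) bilingual
    dictionary with their target-language translations; words not in
    the dictionary are kept as-is."""
    return [bilingual_dict.get(word, word) for word in sentence.split()]

# Hypothetical dictionary induced without parallel data
# (e.g., from aligned bilingual word embeddings).
zh_en_dict = {"我": "I", "喜欢": "like", "苹果": "apples"}

source = "我 喜欢 苹果"
switched = " ".join(code_switch(source, zh_en_dict))

# The language model is then trained on both the original sentence and
# its code-switched version, in place of the parallel sentence pair a
# traditional TLM would require.
tlm_inputs = [source, switched]
print(tlm_inputs)  # ['我 喜欢 苹果', 'I like apples']
```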
Keywords
unsupervised neural machine translation / distant language pairs / pre-training / TLM / bilingual dictionary / bilingual word embedding
Classification
Information Technology and Security Science