Computer Technology and Development (计算机技术与发展), 2023, Vol. 33, Issue 12: 200-206 (7 pages). DOI: 10.3969/j.issn.1673-629X.2023.12.028
基于mRASP的藏汉双向神经机器翻译研究
Research on Tibetan-Chinese Bidirectional Neural Machine Translation Based on mRASP
Abstract
Research on Tibetan-Chinese machine translation is of great practical significance for promoting and passing on outstanding ethnic culture and for advancing economic, educational, and cultural development in Tibetan areas. To address the poor quality of Tibetan-Chinese neural machine translation caused by the scarcity of Tibetan-Chinese parallel corpora, we investigate cross-lingual pre-training models. We use the Tibetan-Chinese dataset from the 18th National Conference on Machine Translation (CCMT 2022) to build a cross-lingual pre-training model (mRASP) for the Tibetan-Chinese language pair, and adopt Google's Transformer neural machine translation architecture as the baseline. We mainly use data augmentation to expand the Tibetan-Chinese parallel corpus, optimize the vocabulary used in Tibetan-Chinese machine translation, and explore how the joint vocabulary in the cross-lingual pre-training model affects translation performance. Finally, we propose a Tibetan-Chinese bidirectional neural machine translation method that integrates the cross-lingual pre-training model (mRASP) with an improved joint vocabulary. With these strategies, the BLEU score reaches 55.69 on the Tibetan-Chinese translation task and 29.57 on the Chinese-Tibetan translation task. Compared with traditional Tibetan-Chinese bidirectional neural machine translation based on a pre-trained model, the method effectively improves Tibetan-Chinese bidirectional translation performance under low-resource conditions.
Key words
cross-language pre-training model / Tibetan-Chinese bidirectional neural machine translation / mRASP / data augmentation / vocabulary
Classification
Information Technology and Security Science
Cite this article
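杨丹, 拥措, 仁青卓玛, 唐超超. Research on Tibetan-Chinese Bidirectional Neural Machine Translation Based on mRASP [J]. Computer Technology and Development, 2023, 33(12): 200-206.
Funding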
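National Key Research and Development Program of China (2017YFB1402202)
Tibet Autonomous Region Science and Technology Innovation Base Independent Research Project (XZ2021HR002G)
Tibet University Everest Discipline Construction Program (zf22002001)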