
Research on Tibetan-Chinese Bidirectional Neural Machine Translation Based on mRASP

(基于mRASP的藏汉双向神经机器翻译研究)

杨丹¹, 拥措¹, 仁青卓玛¹, 唐超超¹

Computer Technology and Development (计算机技术与发展), 2023, Vol. 33, Issue 12: 200-206 (7 pages). DOI: 10.3969/j.issn.1673-629X.2023.12.028

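Author information

  • 1. School of Information Science and Technology, Tibet University, Lhasa, Tibet 850000 || Tibet Autonomous Region Key Laboratory of Tibetan Information Technology and Artificial Intelligence, Lhasa, Tibet 850000 || Engineering Research Center of Tibetan Information Technology, Ministry of Education, Lhasa, Tibet 850000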

Abstract

Research on Tibetan-Chinese machine translation is of great practical significance for promoting and passing on outstanding ethnic culture and for advancing economic, educational, and cultural development in Tibetan areas. To address the poor quality of Tibetan-Chinese neural machine translation caused by the scarcity of Tibetan-Chinese parallel corpora, we investigate cross-lingual pre-training models. We use the Tibetan-Chinese dataset from the 18th China Conference on Machine Translation (CCMT 2022) to build a Tibetan-Chinese bilingual cross-lingual pre-training model (mRASP), and adopt Google's Transformer neural machine translation architecture as the baseline. We mainly use data augmentation to expand the Tibetan-Chinese parallel corpus, optimize the vocabulary used in Tibetan-Chinese machine translation, and explore how the joint vocabulary in the cross-lingual pre-training model affects translation performance. Finally, we propose a Tibetan-Chinese bidirectional neural machine translation method that integrates the cross-lingual pre-training model (mRASP) with an improved joint vocabulary. With these strategies, the BLEU score reaches 55.69 on the Tibetan-to-Chinese translation task and 29.57 on the Chinese-to-Tibetan task. Compared with conventional Tibetan-Chinese bidirectional neural machine translation based on a pre-trained model, the method effectively improves bidirectional translation performance under low-resource conditions.
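The abstract does not describe how the joint vocabulary is constructed. Purely as an illustration of the general idea of a vocabulary shared by both languages, and not as the authors' procedure, the sketch below builds one token-to-id table from both sides of a toy, pre-segmented parallel corpus. The function names (`build_joint_vocab`, `encode`), the special tokens, and the sample sentences are all hypothetical; a real system would typically work on subword units rather than whitespace tokens.

```python
from collections import Counter

# Hypothetical special tokens; real toolkits define their own set.
SPECIAL_TOKENS = ["<pad>", "<unk>", "<s>", "</s>"]


def build_joint_vocab(src_sentences, tgt_sentences, min_freq=1):
    """Build a single token-to-id mapping shared by both languages."""
    counts = Counter()
    for sentence in list(src_sentences) + list(tgt_sentences):
        counts.update(sentence.split())
    # Keep tokens seen at least min_freq times, most frequent first;
    # ties are broken by code point so the ordering is deterministic.
    ordered = sorted(
        (tok for tok, c in counts.items()
         if c >= min_freq and tok not in SPECIAL_TOKENS),
        key=lambda tok: (-counts[tok], tok),
    )
    return {tok: idx for idx, tok in enumerate(SPECIAL_TOKENS + ordered)}


def encode(sentence, vocab):
    """Map whitespace-separated tokens to ids, using <unk> for unseen ones."""
    unk = vocab["<unk>"]
    return [vocab.get(tok, unk) for tok in sentence.split()]


# Toy, pre-segmented data (assumption: tokens are separated by spaces).
tibetan = ["བཀྲ་ཤིས་ བདེ་ལེགས་ 2022"]
chinese = ["吉祥 如意 2022"]

vocab = build_joint_vocab(tibetan, chinese)
ids = encode("吉祥 未知 2022", vocab)
```

In this toy case the token `2022` occurs on both sides and is stored once, so both translation directions refer to it by the same id; tokens outside the vocabulary fall back to `<unk>`.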

Keywords

cross-lingual pre-training model/Tibetan-Chinese bidirectional neural machine translation/mRASP/data augmentation/vocabulary

Classification

Information Technology and Security Science

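Cite this article

杨丹, 拥措, 仁青卓玛, 唐超超. Research on Tibetan-Chinese Bidirectional Neural Machine Translation Based on mRASP (基于mRASP的藏汉双向神经机器翻译研究) [J]. Computer Technology and Development, 2023, 33(12): 200-206 (7 pages).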

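Funding

National Key Research and Development Program project (2017YFB1402202)

Independent Research Project of the Science and Technology Innovation Base of the Tibet Autonomous Region (XZ2021HR002G)

Tibet University Everest (Zhufeng) Discipline Construction Program project (zf22002001)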

Computer Technology and Development

OA, CSTPCD

ISSN 1673-629X
