Computer Technology and Development, 2025, Vol. 35, Issue 11: 138-144 (7 pages). DOI: 10.20165/j.cnki.ISSN1673-629X.2025.0169
Research on Tibetan-English Neural Machine Translation for Low-Resource Scenarios (面向低资源场景的藏英神经机器翻译研究)
Abstract
With the development of deep learning, and especially the emergence of large language models (LLMs), Tibetan-English machine translation has made significant progress. However, the severe shortage of Tibetan-English parallel corpora has seriously constrained research in this field. To address this problem, we propose a Tibetan-English neural machine translation framework suited to low-resource scenarios. First, we use a pivot-language-based pseudo-parallel data generation method to construct a Tibetan-English parallel corpus of 400,000 entries. Second, to address unstable training, insensitivity to low-frequency words, and poor translation quality in low-resource scenarios, we improve the model in three respects: pre-normalized residual connections, fixed word embeddings, and normalization of the attention mechanism. Together, these changes improve the model's performance on the low-resource Tibetan-English machine translation task. Experimental results show that, compared with the standard Transformer model, the proposed method improves the BLEU score by 1.68 on the validation set and by 2.25 on the test set.
Keywords
low resources/Tibetan-English machine translation/normalization/corpus construction/model optimization
Classification
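Computer science and automation
Cite this article
张佳亮, 群诺, 扎西平措, 鲜昱恺, 李嘉俊. Research on Tibetan-English Neural Machine Translation for Low-Resource Scenarios [J]. Computer Technology and Development, 2025, 35(11): 138-144 (7 pages).
Funding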
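National Science and Technology Major Project for New Generation Artificial Intelligence (2022ZD0116100)
Young Scientists Fund of the National Natural Science Foundation of China (62406257, 62406256)
Technology Innovation Guidance Project of the Science and Technology Program of the Tibet Autonomous Region (XZ202501JX0004)
Tibet University Graduate "High-Level Talent Training Program" Project (2025-GSP-S137)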