Data Filtering Strategies for Tibetan-Chinese Neural Machine Translation

Computer and Modernization, Issue (6): 19-24, 6. DOI: 10.3969/j.issn.1006-2475.2024.06.004

Renqing Zhuoma (仁青卓玛)¹, Yong Cuo (拥措)¹, Tang Chaochao (唐超超)¹

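Author information

1. School of Information Science and Technology, Tibet University, Lhasa, Tibet 850000; Tibet Autonomous Region Key Laboratory of Tibetan Information Technology and Artificial Intelligence, Lhasa, Tibet 850000; Engineering Research Center of Tibetan Information Technology, Ministry of Education, Lhasa, Tibet 850000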
Abstract

Traditional data augmentation methods introduce syntactic and semantic errors into the data used for Tibetan-Chinese machine translation. To address this issue, this paper proposes a pseudo-data filtering method that combines sentence perplexity with semantic similarity, building on traditional data augmentation methods. The strategy targets the low quality and scarcity of parallel data, particularly in low-resource settings. The results show that the pseudo-data filtering approach significantly improves translation on both the Tibetan-Chinese and English-Chinese bidirectional tasks. The proposed method mitigates the grammatical and semantic defects of the translation model, improving both the performance of the translation system and the generalization ability of the model, which verifies its effectiveness.
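The abstract names the two filtering signals but not how they are computed or combined. The sketch below is only an illustration under stated assumptions: perplexity is derived from per-token log-probabilities supplied by some language model, semantic similarity is the cosine between sentence embeddings assumed to live in a shared cross-lingual space, and a pair is kept only when both hypothetical thresholds (`max_ppl`, `min_sim`) are satisfied. The names `PseudoPair` and `filter_pseudo_pairs`, the threshold values, and the AND-combination are not taken from the paper.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class PseudoPair:
    """One back-translated sentence pair plus precomputed model outputs (all hypothetical)."""
    source: str                       # synthetic (back-translated) sentence
    target: str                       # authentic sentence it was generated from
    token_logprobs: Sequence[float]   # natural-log token probabilities of `source`
    src_vec: Sequence[float]          # sentence embedding of `source`
    tgt_vec: Sequence[float]          # sentence embedding of `target`


def perplexity(token_logprobs: Sequence[float]) -> float:
    """PPL = exp(-(1/N) * sum(log p_i)); lower values indicate more fluent text."""
    if not token_logprobs:
        return math.inf  # treat an empty sentence as maximally implausible
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; returns 0.0 for a zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def filter_pseudo_pairs(
    pairs: Sequence[PseudoPair],
    max_ppl: float = 50.0,   # assumed threshold, not from the paper
    min_sim: float = 0.7,    # assumed threshold, not from the paper
) -> List[PseudoPair]:
    """Keep pairs that are both fluent (low perplexity) and faithful (high similarity)."""
    kept = []
    for pair in pairs:
        fluent = perplexity(pair.token_logprobs) <= max_ppl
        faithful = cosine_similarity(pair.src_vec, pair.tgt_vec) >= min_sim
        if fluent and faithful:
            kept.append(pair)
    return kept


if __name__ == "__main__":
    candidates = [
        PseudoPair("s1", "t1", [math.log(0.5)] * 4, [1.0, 0.0], [1.0, 0.0]),
        PseudoPair("s2", "t2", [math.log(0.001)] * 4, [1.0, 0.0], [1.0, 0.0]),
        PseudoPair("s3", "t3", [math.log(0.5)] * 4, [1.0, 0.0], [0.0, 1.0]),
    ]
    for pair in filter_pseudo_pairs(candidates):
        print(pair.source, pair.target)
```

In this toy run only the first candidate passes both checks: the second is rejected for high perplexity and the third for low similarity. In practice the thresholds would need tuning on held-out data, and a weighted score could replace the hard AND.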

Keywords

back translation; data selection; Tibetan-Chinese neural machine translation; perplexity; semantic similarity

Classification

Information Technology and Security Science

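Cite this article

Renqing Zhuoma, Yong Cuo, Tang Chaochao. Data Filtering Strategies for Tibetan-Chinese Neural Machine Translation [J]. Computer and Modernization, 2024, (6): 19-24, 6.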

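Funding

Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2022ZD0116100)

Tibet Autonomous Region Science and Technology Innovation Base Independent Research Project (XZ2021JR0002G)

Tibet University Discipline Construction Capacity Improvement Program (藏财预指[2023]1号)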

Computer and Modernization

OA, CSTPCD

ISSN: 1006-2475
