基于单语优先级采样自训练神经机器翻译的研究

张笑燕 逄磊 杜晓峰 陆天波 夏亚梅

通信学报 (Journal on Communications), 2024, Vol. 45, Issue 4: 65-72 (8 pages). DOI: 10.11959/j.issn.1000-436x.2024066

Research on self-training neural machine translation based on monolingual priority sampling

张笑燕¹, 逄磊¹, 杜晓峰¹, 陆天波¹, 夏亚梅¹

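Author information

  • 1. School of Computer Science (National Pilot Software Engineering School), Beijing University of Posts and Telecommunications, Beijing 100876, China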

Abstract

To improve the performance of neural machine translation (NMT) and reduce the detrimental impact of highly uncertain monolingual data during self-training, a self-training NMT model based on priority sampling was proposed. First, syntactic dependency trees were constructed, and the importance of monolingual tokens was assessed through syntactic dependency analysis. Next, a monolingual lexicon was built, and priority was defined in terms of token importance and uncertainty. Finally, monolingual priorities were computed and sampling was carried out according to these priorities, generating a synthetic parallel dataset for training the student NMT model. Experimental results on a large-scale subset of the WMT English-to-German dataset demonstrate that the proposed model effectively improves NMT translation performance and mitigates the impact of high uncertainty on the model.
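The sampling step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the priority formula, so the combination used here (the mean over tokens of importance × (1 − uncertainty)) is an assumption made purely for illustration, as are the function names `sentence_priority` and `priority_sample`. Token importances and uncertainties are taken as precomputed inputs in [0, 1]; the dependency parsing and the teacher model that would produce them are not shown.

```python
import random


def sentence_priority(token_importance, token_uncertainty):
    """Hypothetical sentence priority (an assumption, not the paper's formula).

    Averages importance * (1 - uncertainty) over tokens, so sentences with
    important, low-uncertainty tokens score higher.
    """
    if not token_importance or len(token_importance) != len(token_uncertainty):
        raise ValueError("need two non-empty lists of equal length")
    scores = [imp * (1.0 - unc)
              for imp, unc in zip(token_importance, token_uncertainty)]
    return sum(scores) / len(scores)


def priority_sample(sentences, priorities, k, seed=0):
    """Draw up to k distinct sentences, each draw proportional to priority."""
    rng = random.Random(seed)
    pool = list(zip(sentences, priorities))
    chosen = []
    for _ in range(min(k, len(pool))):
        weights = [p for _, p in pool]
        if sum(weights) <= 0:
            break  # nothing left with positive priority
        idx = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        chosen.append(pool.pop(idx)[0])  # remove so it is not drawn again
    return chosen
```

In a self-training loop, the sampled sentences would then be translated by the teacher model to form the synthetic parallel data on which the student model is trained.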

Key words

machine translation; data augmentation; self-training; uncertainty; syntactic dependency

Classification

Information technology and security science

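Cite this article

张笑燕, 逄磊, 杜晓峰, 陆天波, 夏亚梅. Research on self-training neural machine translation based on monolingual priority sampling [J]. 通信学报 (Journal on Communications), 2024, 45(4): 65-72.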

Funding

The National Natural Science Foundation of China (No. 62162060)

通信学报 (Journal on Communications), ISSN 1000-436X. Open access; indexed in 北大核心 (Peking University Core Journals) and CSTPCD.
