电子科技大学学报 (Journal of University of Electronic Science and Technology of China), 2024, Vol. 53, Issue 5: 762-770. DOI: 10.12178/1001-0548.2024173
低资源场景下基于联合训练与自训练的跨语言摘要方法
Cross-Lingual Summarization Method Based on Joint Training and Self-Training in Low-Resource Scenarios
Abstract
As globalization continues to develop, cross-lingual summarization has become an important topic in natural language processing. In low-resource scenarios, existing methods face challenges such as limited representation transfer and insufficient data utilization. To address these issues, this paper proposes a novel method based on joint training and self-training. Specifically, two models handle the translation and cross-lingual summarization tasks, respectively, which unifies the language vector space of their outputs and avoids the issue of limited representation transfer. In addition, joint training is performed by aligning the output features and probabilities of parallel training pairs, thereby enhancing semantic sharing between the models. Building on joint training, a self-training technique is further introduced to generate synthetic data from additional monolingual summary data, effectively mitigating the data scarcity of low-resource scenarios. Experimental results demonstrate that this method outperforms existing approaches in multiple low-resource scenarios, achieving significant improvements in ROUGE scores.
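As an illustration of the alignment idea described in the abstract, the sketch below shows one common way to couple two models' outputs during joint training: a mean-squared-error term on output features plus a symmetric KL term on output probabilities for the same parallel pair. This is a minimal stand-in under stated assumptions, not the authors' implementation; the function names, the exact loss form, and the weights `alpha`/`beta` are all assumptions.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions given as lists of probabilities."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def joint_alignment_loss(feat_trans, feat_sum, prob_trans, prob_sum,
                         alpha=1.0, beta=1.0):
    """Hypothetical joint-training alignment objective:
    - MSE between the translation model's and summarization model's
      output feature vectors on a parallel pair, and
    - symmetric KL between their output probability distributions.
    Minimizing both pushes the two models toward a shared semantic space."""
    mse = sum((a - b) ** 2 for a, b in zip(feat_trans, feat_sum)) / len(feat_trans)
    sym_kl = 0.5 * (kl_divergence(prob_trans, prob_sum)
                    + kl_divergence(prob_sum, prob_trans))
    return alpha * mse + beta * sym_kl
```

In practice such a term would be added to each model's usual cross-entropy loss; the loss is zero when the two models agree exactly and grows with disagreement.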
Keywords
cross-lingual summarization / joint training / low-resource scenarios / machine translation / self-training
Classification
Information Technology and Security Science
Citation
程绍欢, 唐煜佳, 刘峤, 陈文宇. Cross-Lingual Summarization Method Based on Joint Training and Self-Training in Low-Resource Scenarios [J]. 电子科技大学学报 (Journal of University of Electronic Science and Technology of China), 2024, 53(5): 762-770.
Funding
Key Project of the Enterprise Joint Fund of the National Natural Science Foundation of China (U22B2061)