Journal of Information Engineering University, 2024, Vol. 25, Issue (2): 139-147, 9. DOI: 10.3969/j.issn.1671-0673.2024.02.003
Cross-Lingual Summarization Technology Based on Multi-stage Training
Abstract
To address the weak semantic understanding, cross-lingual alignment, and text generation abilities of existing cross-lingual summarization (CLS) models, this paper proposes a CLS model based on multi-stage training. First, the model is trained on a multilingual denoising pre-training task, learning general linguistic knowledge of both Chinese and English. Next, it is trained on a multilingual machine translation task, which simultaneously teaches three abilities: semantic understanding of English, cross-lingual alignment from English to Chinese, and Chinese text generation. Finally, it is trained on the CLS task itself, further strengthening these three abilities and yielding a strong English-to-Chinese CLS model. Experimental results show that the proposed model significantly improves CLS performance, and that both the multilingual denoising pre-training task and the multilingual machine translation task contribute to this improvement. On an English-to-Chinese CLS benchmark dataset, compared with the best result among several baseline models, the proposed model improves ROUGE-1, ROUGE-2, and ROUGE-L by 45.70%, 60.53%, and 43.57%, respectively.
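The abstract describes three training stages applied in sequence to a single model. The Python sketch below illustrates that curriculum under several assumptions that the abstract does not confirm: the denoising objective follows the mBART recipe (sentence permutation plus span masking); tokenization is naive whitespace splitting, so Chinese text must be pre-segmented; the translation stage uses only English-to-Chinese pairs; and `add_noise`, `build_stages`, `train_multi_stage`, `train_step`, and the `<mask>` token are hypothetical names, not the authors' code.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Pair = Tuple[str, str]  # (model input, training target)
MASK = "<mask>"  # hypothetical mask token


def add_noise(text: str, mask_ratio: float = 0.35,
              rng: Optional[random.Random] = None) -> str:
    """mBART-style noise (an assumption, not confirmed by the paper):
    shuffle sentences, then replace short random token spans with a single
    mask token until roughly mask_ratio of the tokens have been masked.
    Tokens are whitespace-separated, so Chinese must be pre-segmented."""
    if rng is None:
        rng = random.Random(0)
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    rng.shuffle(sentences)
    tokens = " . ".join(sentences).split()
    budget = int(len(tokens) * mask_ratio)
    while budget > 0 and len(tokens) > 1:
        span = min(budget, rng.randint(1, 3), len(tokens))
        start = rng.randrange(len(tokens) - span + 1)
        tokens[start:start + span] = [MASK]
        budget -= span
    return " ".join(tokens)


def build_stages(mono_en: Sequence[str], mono_zh: Sequence[str],
                 mt_en_zh: Sequence[Pair], cls_en_zh: Sequence[Pair],
                 seed: int = 0) -> List[Tuple[str, List[Pair]]]:
    """Three stages in the order given in the abstract; every stage is a
    plain sequence-to-sequence objective on (input, target) pairs."""
    rng = random.Random(seed)
    denoise = [(add_noise(t, rng=rng), t) for t in [*mono_en, *mono_zh]]
    return [
        ("1-denoising", denoise),           # reconstruct original en/zh text
        ("2-translation", list(mt_en_zh)),  # English sentence -> Chinese sentence
        ("3-cls", list(cls_en_zh)),         # English document -> Chinese summary
    ]


def train_multi_stage(stages: Sequence[Tuple[str, Sequence[Pair]]],
                      train_step: Callable[[str, str], float],
                      epochs_per_stage: int = 1) -> List[Tuple[str, float]]:
    """Run the stages one after another on the SAME model (captured by
    train_step), so each stage starts from the previous stage's weights."""
    history: List[Tuple[str, float]] = []
    for name, pairs in stages:
        for _ in range(epochs_per_stage):
            for src, tgt in pairs:
                history.append((name, train_step(src, tgt)))
    return history


if __name__ == "__main__":
    seen: List[Pair] = []

    def fake_step(src: str, tgt: str) -> float:
        seen.append((src, tgt))  # stand-in for one optimizer update
        return 0.0

    stages = build_stages(
        mono_en=["The model reads English text. It learns structure."],
        mono_zh=["模型 阅读 中文 文本 。"],
        mt_en_zh=[("Hello world", "你好 世界")],
        cls_en_zh=[("A long English news article", "中文 摘要")],
    )
    history = train_multi_stage(stages, fake_step)
    print([name for name, _ in history])
    # ['1-denoising', '1-denoising', '2-translation', '3-cls']
```

If a real sequence-to-sequence update (for example, one optimizer step on an mBART-like encoder-decoder) were passed as `train_step`, the weights would carry over from stage to stage. That is the property the multi-stage design relies on: the translation stage builds on what denoising learned, and the CLS stage builds on both.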
Keywords
cross-lingual summarization / multi-stage training / multilingual denoising pre-training / multilingual machine translation
Classification
Information Technology and Security Science
Cite this article
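PAN Hangyu, XI Yaoyi, ZHOU Huijuan, CHEN Gang, GUO Zhigang. Cross-Lingual Summarization Technology Based on Multi-stage Training[J]. Journal of Information Engineering University, 2024, 25(2): 139-147, 9.
Funding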
Supported by the National Social Science Fund of China (19CXW027)