电子学报 (Acta Electronica Sinica), 2024, Vol. 52, Issue 12: 3914-3930. DOI: 10.12263/DZXB.20230429
Chinese Long Text Summarization with Guided Attention
Abstract
Current research on Chinese long text summarization based on deep learning faces two problems: (1) summarization models lack information guidance and fail to focus on key words and sentences, so critical information is lost over long-distance spans; (2) the vocabularies of existing Chinese long text summarization models are often character-based and do not contain common Chinese words and punctuation, which hinders the extraction of multi-grained semantic information. To address these problems, a Chinese long text summarization method with guided attention (CLSGA) is proposed in this paper. First, for the long text summarization task, an extraction model is presented that extracts the core words and sentences of the long text to construct a guiding text, which directs the generation model toward the more important information during encoding. Second, a Chinese long-text vocabulary is designed to shift the text representation from character-level statistics to word-level statistics, which helps extract richer multi-granularity features. Hierarchical position decomposition encoding is then introduced to efficiently extend positional encoding to long texts and accelerate network convergence. Finally, a local attention mechanism is combined with the guided attention mechanism to effectively capture important information across long text spans and improve summarization accuracy. Experimental results on four public Chinese summarization datasets of different lengths, LCSTS, CNewSum, NLPCC2017 and SFZY2020, show that the proposed method has significant advantages for long text summarization and effectively improves ROUGE-1, ROUGE-2 and ROUGE-L scores.
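The idea behind hierarchical position decomposition encoding can be sketched as follows: each position index is split into a coarse (block) index and a fine (within-block) index, and two small embedding tables are combined, so long sequences are covered without one huge positional table. This is a minimal illustrative sketch only; the function name, the block size, and the additive combination of the two tables are assumptions, not the paper's exact formulation.

```python
import numpy as np

def hierarchical_position_encoding(seq_len, d_model, block=64, seed=0):
    """Decompose each position p into a coarse index p // block and a fine
    index p % block, then sum the corresponding rows of two small embedding
    tables. Two tables of `block` rows each cover block**2 positions,
    instead of needing a single table with block**2 rows."""
    rng = np.random.default_rng(seed)
    coarse = rng.normal(size=(block, d_model)) * 0.02  # block-level table
    fine = rng.normal(size=(block, d_model)) * 0.02    # within-block table
    pos = np.arange(seq_len)
    if seq_len > block * block:
        raise ValueError("sequence longer than block**2 is not representable")
    return coarse[pos // block] + fine[pos % block]    # (seq_len, d_model)

enc = hierarchical_position_encoding(4096, 128)
print(enc.shape)  # (4096, 128)
```

With a block size of 64, sequences up to 4096 tokens are encoded using only 128 learned rows, which is one plausible reason such a decomposition would speed up convergence on long inputs.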
Key words
natural language processing / Chinese long text summarization / guided attention / hierarchical position decomposition encoding / local attention

Classification
Information Technology and Security Science

Cite this article
Guo Zhe, Zhang Zhibo, Zhou Weijie, Fan Yangyu, Zhang Yanning. Chinese long text summarization with guided attention [J]. Acta Electronica Sinica, 2024, 52(12): 3914-3930.

Funding
National Natural Science Foundation of China (No. 62071384)
Key Research and Development Project of Shaanxi Province (No. 2023-YBGY-239)