Computer and Digital Engineering, 2024, Vol. 52, Issue 12: 3690-3696, 7. DOI: 10.3969/j.issn.1672-9722.2024.12.037
Long Text Candidate Paragraph Extraction Based on Improved Transformer
Abstract
To improve the performance of extractive machine reading comprehension, a long-text candidate paragraph extraction model is built in the data preprocessing stage to improve the quality of candidate answers. For word embedding, position information is added to the n-gram stroke features to resolve the ambiguity of the cw2vec word vector model in learning Chinese stroke structure. For deep feature extraction, a sparse self-attention matrix in the Transformer is used to address the problems of high computational complexity and long feature extraction time. Experiments on the DuReader dataset show that the average precision and mean reciprocal rank of the proposed paragraph extraction model reach 0.664 2 and 0.669 4, respectively.
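The sparse self-attention idea mentioned in the abstract can be illustrated with a short sketch. The paper's exact sparsity pattern is not given here, so the code below assumes a simple local-window (banded) mask, which is one common way to sparsify the attention score matrix so that each token attends only to nearby tokens instead of all n positions:

```python
import numpy as np

def sparse_self_attention(Q, K, V, window=2):
    """Self-attention with a banded sparsity mask: each position attends
    only to positions within `window` of itself, so most of the n x n
    score matrix is masked out rather than computed densely."""
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)          # raw attention scores, shape (n, n)
    idx = np.arange(n)
    # Mask out pairs farther apart than the local window.
    mask = np.abs(idx[:, None] - idx[None, :]) > window
    scores[mask] = -np.inf
    # Row-wise softmax; masked entries become exactly 0.
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))            # 6 tokens, dimension 4
out = sparse_self_attention(X, X, X, window=1)
print(out.shape)                           # (6, 4)
```

Note that this sketch only masks a dense score matrix for clarity; the efficiency gain claimed in the paper comes from never materializing the masked entries at all, which specialized sparse-attention kernels do in practice.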
Keywords: candidate paragraph extraction; position information; Transformer; self-attention; sparse matrix
Classification: Information Technology and Security Science
Citation: 任伟建, 徐明明, 康朝海, 霍凤财, 任璐, 张永丰. Long Text Candidate Paragraph Extraction Based on Improved Transformer [J]. Computer and Digital Engineering, 2024, 52(12): 3690-3696, 7.
Funding
Supported by the National Natural Science Foundation of China (Nos. 61933007, 61873058).