
Long Text Candidate Paragraph Extraction Based on Improved Transformer

Ren Weijian, Xu Mingming, Kang Chaohai, Huo Fengcai, Ren Lu, Zhang Yongfeng

计算机与数字工程 (Computer & Digital Engineering), 2024, Vol. 52, Issue (12): 3690-3696, 7. DOI: 10.3969/j.issn.1672-9722.2024.12.037


Long Text Candidate Paragraph Extraction Based on Improved Transformer

Ren Weijian 1, Xu Mingming 2, Kang Chaohai 1, Huo Fengcai 1, Ren Lu 3, Zhang Yongfeng 4

Author Information

  • 1. School of Electrical and Information Engineering, Northeast Petroleum University, Daqing 163318 || Heilongjiang Provincial Key Laboratory of Networking and Intelligent Control, Daqing 163318
  • 2. School of Electrical and Information Engineering, Northeast Petroleum University, Daqing 163318
  • 3. Offshore Oil Engineering Co., Ltd., Tianjin 300450
  • 4. Planning and Design Research Institute, No. 2 Oil Production Plant, Daqing Oilfield Company Ltd., Daqing 163318

Abstract

To improve the performance of extractive machine reading comprehension, a long-text candidate paragraph extraction model is built in the data preprocessing stage to improve the quality of candidate answers. For word embedding, position information is added to the N-gram stroke features to eliminate the ambiguity of the cw2vec word vector model in learning Chinese stroke structure. For deep feature extraction, the self-attention matrix of the sparse Transformer is used to address the problems of high computational complexity and long feature extraction time. Experiments on the DuReader dataset show that the average accuracy and mean reciprocal rank of the constructed paragraph extraction model reach 0.6642 and 0.6694, respectively.
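The positional stroke-feature idea can be illustrated with a toy sketch. This is not the paper's implementation: the two-entry stroke table and the `stroke_ngrams` helper below are hypothetical, assuming cw2vec's stroke IDs (1-5) and showing only why tagging each stroke n-gram with its start offset keeps identical stroke patterns at different positions distinct.

```python
# Toy stroke table using cw2vec's stroke IDs (1: horizontal, 2: vertical,
# 3: left-falling, 4: right-falling, 5: turning). Entries are illustrative.
STROKES = {"大": [1, 3, 4], "人": [3, 4]}

def stroke_ngrams(word, n_range=(3, 5), with_position=False):
    """Enumerate stroke n-grams of a word; optionally tag each n-gram
    with its start offset so repeated patterns stay distinct."""
    seq = [s for ch in word for s in STROKES[ch]]
    feats = []
    for n in range(n_range[0], n_range[1] + 1):
        for i in range(len(seq) - n + 1):
            gram = tuple(seq[i:i + n])
            feats.append((gram, i) if with_position else gram)
    return feats
```

For "大人" the stroke sequence is [1, 3, 4, 3, 4], so the 2-gram (3, 4) occurs twice; without position tags the two occurrences collapse into one feature, with tags they stay separate.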
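The sparse self-attention component can be sketched minimally. The exact sparsity pattern used in the paper is not specified here, so this NumPy sketch assumes a simple local-window mask (each token attends only to neighbors within `window` positions), one common way a sparse attention matrix reduces the quadratic cost of full self-attention.

```python
import numpy as np

def sparse_self_attention(X, W_q, W_k, W_v, window=2):
    """Self-attention where each position attends only to tokens within
    a fixed local window (one common sparse pattern).
    X: (n, d) token embeddings; W_q, W_k, W_v: (d, d) projections."""
    n, d = X.shape
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(d)            # (n, n) raw attention scores
    # Sparse mask: forbid attention outside the local window.
    idx = np.arange(n)
    mask = np.abs(idx[:, None] - idx[None, :]) <= window
    scores = np.where(mask, scores, -np.inf)
    # Row-wise softmax over the allowed positions only.
    scores -= scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)                 # masked entries become 0
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V
```

With a fixed window each row of the score matrix has O(window) nonzero entries instead of n, which is the source of the complexity savings the abstract refers to.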


Key words

candidate paragraph extraction / location information / Transformer / self-attention / sparse matrix

Category

Information Technology and Security Science

Cite This Article

Ren Weijian, Xu Mingming, Kang Chaohai, Huo Fengcai, Ren Lu, Zhang Yongfeng. Long Text Candidate Paragraph Extraction Based on Improved Transformer [J]. Computer & Digital Engineering, 2024, 52(12): 3690-3696, 7.

Funding

Supported by the National Natural Science Foundation of China (Nos. 61933007, 61873058).

计算机与数字工程 (Computer & Digital Engineering) | OACSTPCD | ISSN 1672-9722
