
Research on Data Augmentation for Extractive Reading Comprehension


Software Guide, 2024, Vol. 23, Issue 6: 32-37, 6. DOI: 10.11907/rjdk.231137


Hu Xinrong (1), Xu Wei (1), Luo Ruiqi (1), Liu Junping (1), Zhu Qiang (1), Yang Jie (2), Li Lijun (3)

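Author information

  • 1. School of Computer Science and Artificial Intelligence, Wuhan Textile University; Engineering Research Center of Hubei Province for Clothing Information, Wuhan, Hubei 430200
  • 2. School of Computing and Information Technology, University of Wollongong, Wollongong 2522
  • 3. Ningbo Cixing Co., Ltd., Ningbo, Zhejiang 315000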

Abstract

In extractive reading comprehension, language models perform poorly when training data is scarce. Although the EDA method can effectively increase the amount of data, it can destroy semantic information in the data, which degrades model training. To address these problems, a data augmentation method for extractive reading comprehension with few samples is proposed that builds on EDA. The data is augmented at the word level and the sentence level while preserving the correct answers to the questions. In addition, augmentation targets the single word that has the least impact on sentence semantics: the data augmentation method based on semantic similarity (DASS) computes the semantic similarity of a sentence before and after a word is deleted to determine that word's impact on sentence semantics. The word with the least impact is selected for augmentation, which improves the quality of the training data and thereby the robustness of the language model. Experimental results on HotpotQA show that DASS can address the problem of insufficient semantic information when the number of samples is small. With 500 samples, the F1 score of the model's predictions increases by 23.94%. When the method is applied to the whole dataset, the F1 score increases by 2.54%.
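The word-selection step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say which sentence encoder or similarity measure DASS uses, so `toy_embed`, `TOY_WEIGHTS`, `cosine`, and `least_impact_word` are hypothetical names, and the weighted bag-of-words encoder is only a stand-in for a real sentence-embedding model. Answer tokens are passed as protected indices so that the answer to the question is never selected.

```python
import math
from typing import Callable, Dict, FrozenSet, Optional, Sequence, Tuple

Vector = Dict[str, float]

# Hypothetical per-word weights: function words contribute little meaning.
# A real system would use a trained sentence encoder instead (assumption).
TOY_WEIGHTS: Dict[str, float] = {"the": 0.1, "a": 0.1, "is": 0.2, "of": 0.3}


def toy_embed(tokens: Sequence[str]) -> Vector:
    """Stand-in sentence encoder: a weighted bag-of-words vector."""
    vec: Vector = {}
    for tok in tokens:
        key = tok.lower()
        vec[key] = vec.get(key, 0.0) + TOY_WEIGHTS.get(key, 1.0)
    return vec


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def least_impact_word(
    tokens: Sequence[str],
    protected: FrozenSet[int] = frozenset(),
    embed: Callable[[Sequence[str]], Vector] = toy_embed,
) -> Optional[Tuple[int, float]]:
    """Return (index, similarity) for the token whose deletion leaves the
    sentence most similar to the original, skipping protected indices
    (for example, the answer span). Returns None if nothing is eligible."""
    full = embed(tokens)
    best: Optional[Tuple[int, float]] = None
    for i in range(len(tokens)):
        if i in protected:
            continue
        reduced = list(tokens[:i]) + list(tokens[i + 1:])
        sim = cosine(full, embed(reduced))
        if best is None or sim > best[1]:
            best = (i, sim)
    return best


tokens = ["Paris", "is", "the", "capital", "of", "France"]
result = least_impact_word(tokens, protected=frozenset({0}))
if result is not None:
    index, similarity = result
    print(tokens[index])  # "the" under the toy weights above
```

The selected index is where a word-level augmentation operation would then be applied; which operation that is, and how sentence-level augmentation is combined with it, is not specified in the abstract and is left out of this sketch.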

Key words

extractive reading comprehension / EDA / data augmentation / semantic similarity / machine reading comprehension

Classification

Information Technology and Security Science

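Cite this article

Hu Xinrong, Xu Wei, Luo Ruiqi, Liu Junping, Zhu Qiang, Yang Jie, Li Lijun. Research on Data Augmentation for Extractive Reading Comprehension[J]. Software Guide, 2024, 23(6): 32-37, 6.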

Funding

Key Research and Development Program of Hubei Province (2020BAB116)

Ningbo Science and Technology Innovation Major Project (2021Z069)

Software Guide

ISSN 1672-7800
