Document-Level Person Attribute Extraction Based on Reading Comprehension

Liu Ziyun, Zhang Shiqi, Chen Wenliang

Journal of Shanxi University (Natural Science Edition), 2025, Vol. 48, Issue 3: 470-480, 11. DOI: 10.13451/j.sxu.ns.2024026

Machine Reading Comprehension for Document-level Person Aspect Term Extraction

Liu Ziyun¹, Zhang Shiqi¹, Chen Wenliang¹

Author information

  • 1. School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China

Abstract

Person aspect term extraction aims to extract attributes of individuals, such as gender and nationality, from their descriptions. Existing methods typically train sequence labeling models on distantly supervised data. However, this approach suffers from inaccurate annotations and from overlapping values of different attributes in the data, and the resulting models lack scalability and generalizability. To address these problems, this article recasts the task as a machine reading comprehension (MRC) problem: filling in a person attribute-value table by reading the person's profile. The paper constructs a person attribute recognition dataset from a person encyclopedia within the reading comprehension framework, and builds two baseline models: BERT-MRC (bidirectional encoder representations from transformers with machine reading comprehension) and BERT-CRF-MRC (the same model with a conditional random field). BERT-CRF-MRC scores on average three percentage points higher in F1 than BERT-MRC, reaching an average F1 of about 92% on short-text person profiles and about 75% on long-text person profiles. The constructed data and code are publicly available on GitHub.

Keywords

attribute extraction (aspect term extraction) / machine reading comprehension (MRC) / annotated data

Classification

Computer science and automation

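Cite this article

Liu Ziyun, Zhang Shiqi, Chen Wenliang. Machine Reading Comprehension for Document-level Person Aspect Term Extraction [J]. Journal of Shanxi University (Natural Science Edition), 2025, 48(3): 470-480, 11.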

Funding

National Natural Science Foundation of China (62376177)

Journal of Shanxi University (Natural Science Edition)

Open access; Peking University Core Journal

ISSN 0253-2395
