Journal of Shanxi University (Natural Science Edition), 2025, Vol. 48, Issue 3: 470-480, 11. DOI: 10.13451/j.sxu.ns.2024026
Machine Reading Comprehension for Document-level Person Aspect Term Extraction
(基于阅读理解的文档级人物属性抽取研究)
Abstract
Person aspect term extraction aims to extract attributes of individuals, such as gender and nationality, from their descriptions. Existing methods typically train sequence labeling models on distantly supervised data. However, this approach suffers from inaccurate annotations and overlapping values of different attributes in the data, and the resulting models lack scalability and generalizability. To address these problems, this article recasts the task as a machine reading comprehension (MRC) problem: filling in a person attribute-value table by reading the person's profile. The paper constructs a person attribute recognition dataset from a person encyclopedia under the reading comprehension framework, and builds two baseline models: bidirectional encoder representations from transformers with machine reading comprehension (BERT-MRC), and the same model with a conditional random field (BERT-CRF-MRC). BERT-CRF-MRC scores on average three percentage points higher in F1 than BERT-MRC, reaching about 92% average F1 on short-text person profiles and about 75% on long-text person profiles. The constructed data and code are publicly available on GitHub.
Key words
aspect term extraction (属性抽取) / machine reading comprehension, MRC (机器阅读理解) / annotated data (标注数据)
Classification
Computer Science and Automation
Cite this article
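Liu Ziyun, Zhang Shiqi, Chen Wenliang. Machine Reading Comprehension for Document-level Person Aspect Term Extraction [J]. Journal of Shanxi University (Natural Science Edition), 2025, 48(3): 470-480, 11.
Funding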
National Natural Science Foundation of China (62376177)
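
Illustrative sketch

The abstract describes recasting person attribute extraction as reading comprehension: each attribute becomes a question, the person profile becomes the context, and the answer span fills one cell of the attribute-value table. The Python sketch below shows one plausible way to prepare such examples. It is not the authors' code: the question templates, the function names `build_mrc_examples` and `bio_tags`, the whitespace tokenization, and the sample profile are all assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical question templates; the abstract does not give the paper's queries.
QUESTION_TEMPLATES = {
    "gender": "What is the gender of the person?",
    "nationality": "What is the nationality of the person?",
}


def build_mrc_examples(profile: str,
                       attributes: Dict[str, Optional[str]]) -> List[dict]:
    """Turn one attribute-value table into (question, context, span) examples."""
    examples = []
    for name, value in attributes.items():
        question = QUESTION_TEMPLATES.get(
            name, f"What is the {name} of the person?")
        # Locate the first occurrence of the value; -1 marks "no answer".
        start = profile.find(value) if value else -1
        end = start + len(value) if start >= 0 else -1
        examples.append({
            "attribute": name,
            "question": question,
            "context": profile,
            "start": start,  # character offset, inclusive
            "end": end,      # character offset, exclusive
        })
    return examples


def bio_tags(tokens: List[str], answer_tokens: List[str]) -> List[str]:
    """Tag the first match of the answer with B/I and everything else with O.

    Tag sequences of this kind are what a CRF layer would score in a
    BERT-CRF-style tagger; here they are only built, not modelled.
    """
    tags = ["O"] * len(tokens)
    n = len(answer_tokens)
    if n == 0:
        return tags
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == answer_tokens:
            tags[i] = "B"
            for j in range(i + 1, i + n):
                tags[j] = "I"
            break
    return tags


# Made-up profile used only to exercise the functions.
profile = "Jane Doe is a Canadian engineer born in 1980 ."
table = {"nationality": "Canadian", "gender": None}
examples = build_mrc_examples(profile, table)
tags = bio_tags(profile.split(), ["Canadian"])
```

An attribute with no value in the profile is kept as an unanswerable question (offsets of -1) instead of being dropped, so a model trained on such data can learn to leave a table cell empty.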