数字图书馆论坛 (Digital Library Forum), 2024, Vol. 20, Issue 12: 56-65, 10. DOI: 10.3772/j.issn.1673-2286.2024.12.007
结合半监督学习和规则校正的中文学术论文问题实体识别研究
Problem Entity Recognition of Chinese Academic Paper Combining Semi-Supervised Learning and Rule Correction
Abstract
To quickly locate and identify research problems in academic papers, this paper proposes a problem entity recognition method for Chinese academic papers that combines semi-supervised learning with rule correction. First, within the framework of a conditional random field model, supervised features such as part of speech and deixis and unsupervised features such as similarity and importance are constructed. Then, the recognition performance of the model under different feature combinations is compared, and the recognition results are corrected according to domain-specific linguistic rules. Finally, the subject areas of "sharing economy" and "ship construction" are taken as examples for empirical research. The proposed method outperforms mainstream deep learning models and pre-trained models such as large language models in entity recognition. Its F1 scores on the two domain datasets reach 85.82% and 86.38%, respectively, and its performance advantage widens further on the 1/2 and 1/4 datasets. This shows that the proposed method can identify problem entities in Chinese academic papers well on small-scale labeled datasets from different fields, with good validity and robustness.
Key words
Problem Entity Recognition / Conditional Random Field / Semi-Supervised Learning / Rule Correction / Small-Scale Data
Classification
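Information Technology and Security Science
Cite this article
傅柱, 邱畅唱, 刘鹏. 结合半监督学习和规则校正的中文学术论文问题实体识别研究[J]. 数字图书馆论坛, 2024, 20(12): 56-65, 10.
Funding
This research was supported by the National Social Science Fund of China project "Research on a Scenario-Based Intelligent Knowledge Service Framework for AI4S" (No. 24CTQ029).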
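Illustrative sketch
The two steps named in the abstract, per-token feature construction for a conditional random field and rule-based correction of the predicted tags, can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the cue-word list, the `PROB` label, the POS tags, the feature names, the use of relative frequency as a stand-in for "importance", and the BIO repair rule are all hypothetical choices made for illustration.

```python
from collections import Counter

# Hypothetical cue words that may signal a research problem (not taken from the paper).
CUE_WORDS = {"问题", "挑战", "难题"}


def token_features(tokens, pos_tags, i):
    """Build a CRF-style feature dict for token i (feature names are illustrative)."""
    counts = Counter(tokens)
    feats = {
        "word": tokens[i],
        "pos": pos_tags[i],                # supervised feature: part of speech
        "is_cue": tokens[i] in CUE_WORDS,  # supervised feature: cue-word indicator
        # Unsupervised stand-in for "importance": relative frequency in the sentence.
        "importance": counts[tokens[i]] / len(tokens),
    }
    # Context features from neighbouring tokens, with sentence-boundary markers.
    if i > 0:
        feats["prev_pos"] = pos_tags[i - 1]
    else:
        feats["BOS"] = True
    if i < len(tokens) - 1:
        feats["next_pos"] = pos_tags[i + 1]
    else:
        feats["EOS"] = True
    return feats


def correct_bio(tags):
    """Assumed correction rule: an I- tag that does not continue an entity becomes B-."""
    fixed = []
    prev = "O"
    for tag in tags:
        if tag.startswith("I-"):
            label = tag[2:]
            if prev not in (f"B-{label}", f"I-{label}"):
                tag = f"B-{label}"
        fixed.append(tag)
        prev = tag
    return fixed


# Toy sentence with hypothetical POS tags and a hypothetical model prediction.
tokens = ["共享", "经济", "监管", "问题", "研究"]
pos_tags = ["v", "n", "vn", "n", "vn"]
predicted = ["O", "I-PROB", "I-PROB", "I-PROB", "O"]

features = [token_features(tokens, pos_tags, i) for i in range(len(tokens))]
corrected = correct_bio(predicted)
```

In a real pipeline the feature dicts would be passed to a CRF trainer, and the correction step would apply the domain-specific linguistic rules, which the abstract does not spell out.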