Computer Engineering, 2023, Vol. 49, Issue 12: 71-77, 7. DOI: 10.19678/j.issn.1000-3428.0066379
Chinese Nested Named Entity Recognition Based on Location Embedding and Multilevel Prediction
Abstract
Traditional Chinese nested Named Entity Recognition (NER) models often face problems such as difficulty in accurately locating entity boundaries and blurred boundaries between Chinese characters and vocabulary. To address these problems, a nested NER model based on position embedding and multilevel result boundary prediction is proposed. In the embedding layer, the position information of nested entities is encoded together with the position information of the text. An absolute position sequence is then generated, which further examines the relationship between the nested entities and characters and, by focusing on the position information in the Chinese text, strengthens the connection between the nested entities and the original text. At the encoding layer, the nested entities are initially identified with multilevel prediction, using a hidden matrix that excludes the best path. At the decoding layer, the offset of entity boundaries is calculated at the multilevel prediction layer to redefine the entity boundaries and improve the accuracy of Chinese entity prediction. Experimental results show that, compared with the highest values among the baseline models, the proposed model improves precision, recall, and F1 score by 0.34, 1.06, and 0.80 percentage points, respectively, on the medical domain dataset, and by 11.90, 0.78, and 6.23 percentage points, respectively, on the daily domain dataset. These results indicate that the proposed model performs well in recognizing Chinese nested named entities.
Keywords
nested Named Entity Recognition (NER) / location embedding / Boundary Prediction Unit (BPU) / Conditional Random Field (CRF) / multilevel prediction
Classification
Information Technology and Security Science
Cite this article
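DUAN Jianyong, ZHU Yifei, WANG Hao, HE Li, LI Xin. Chinese Nested Named Entity Recognition Based on Location Embedding and Multilevel Prediction[J]. Computer Engineering, 2023, 49(12): 71-77, 7.
Funding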
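National Natural Science Foundation of China (61972003)
Humanities and Social Sciences Fund of the Ministry of Education (21YJA740052)
Scientific Research Program of the Beijing Municipal Education Commission (KM202210009002)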