Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2024, Vol. 18, Issue 6: 1613-1626 (14 pages). DOI: 10.3778/j.issn.1673-9418.2302029
联合多模态与多跨度特征的嵌套命名实体识别
Nested Named Entity Recognition Combining Multi-modal and Multi-span Features
Abstract
Nested named entity recognition (NNER) has become a research hotspot in information extraction because of its growing practical significance. However, owing to scarce corpus resources, limited exhaustive enumeration windows, and missing span features, NNER research in vertical domains has progressed slowly and still suffers from incorrect or omitted entities. To address these problems, a vertical-domain NNER model based on mineralogy and a corpus-aware dictionary is proposed. Firstly, pointwise mutual information (PMI), term frequency-inverse document frequency (TF-IDF), and an attention mechanism are combined to integrate the corpus-aware dictionary automatically, and anchor text knowledge is used to improve the training accuracy of the model. Secondly, from the shared perspective, three multi-modal information fusion strategies are designed to train the encoder to learn extended vector representations of characters, glyphs, and vocabulary. Through a triple-product operation and a slicing attention mechanism, the private representations captured by the multi-layer perceptron are screened and integrated to narrow the spatial gap between heterogeneous features. Thirdly, the contextual association between spans is determined by a bottom-up hierarchical architecture, and a set of span proposals is generated. The features of the target span and its adjacent spans, the internal representation of the target span, the target span boundaries, and so on are obtained with a biaffine mechanism and a linear classifier. Finally, the corresponding entity type label is assigned to the target span. Experimental results on six datasets show that, compared with the baseline model, the proposed method achieves a significant performance improvement and effectively improves NNER performance in low-resource scenarios.
Keywords
nested named entity recognition / multi-modal / multi-task / distant supervision / mineralogy
Classification
Information Technology and Security Science
Cite this article
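QIU Yunfei (邱云飞), XING Haoran (邢浩然), YU Zhilong (于智龙), ZHANG Wenwen (张文文). Nested Named Entity Recognition Combining Multi-modal and Multi-span Features[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(6): 1613-1626.
Funding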
This work was supported by the National Natural Science Foundation of China (62173171), the Natural Science Foundation of Liaoning Province (2015020095), the Science and Technology Research Project of Liaoning Education Department (LJYL051), and the Fuxin City Mineral Resources Compilation Project (1920411).
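Illustrative note: the abstract names pointwise mutual information as one signal for building the dictionary from the corpus. The sketch below shows how PMI can score adjacent character pairs as dictionary candidates. It is a minimal illustration, not the authors' implementation: the function name `pmi_scores`, the restriction to adjacent character pairs, and the toy corpus are all assumptions, and the TF-IDF and attention components mentioned in the abstract are not modeled here.

```python
import math
from collections import Counter


def pmi_scores(sentences):
    """Score each adjacent character pair by pointwise mutual information.

    PMI(a, b) = log2( P(ab) / (P(a) * P(b)) ), with probabilities estimated
    from raw unigram and bigram counts over the given sentences.
    (Hypothetical helper for illustration only.)
    """
    unigrams = Counter()
    bigrams = Counter()
    for sentence in sentences:
        unigrams.update(sentence)                      # count single characters
        bigrams.update(zip(sentence, sentence[1:]))    # count adjacent pairs

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (a, b), count in bigrams.items():
        p_ab = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        scores[a + b] = math.log2(p_ab / (p_a * p_b))
    return scores


if __name__ == "__main__":
    # Toy corpus (assumption): pairs that co-occur more often than chance
    # receive a higher score and would rank higher as dictionary candidates.
    corpus = ["abab", "abcc"]
    for pair, score in sorted(pmi_scores(corpus).items(), key=lambda kv: -kv[1]):
        print(pair, round(score, 3))
```

In a pipeline of the kind the abstract describes, scores like these would be one of several signals; how they are combined with TF-IDF weights and attention is specified in the paper itself, not in this sketch.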