Journal of Sichuan University (Natural Science Edition), 2024, Vol. 61, Issue 4: 104-112 (9 pages). DOI: 10.19907/j.0490-6756.2024.042001
Chinese named entity recognition based on enhancing lexicon knowledge integration utilizing character context information
Abstract
Chinese named entity recognition (NER) is a challenging task because Chinese text has no explicit delimiters between words, so word boundary information is absent. Existing mainstream models address this issue by introducing a lexicon, which supplies word boundary information. However, these models fuse the word information contained in the lexicon into character representations purely according to the matching relations between characters and words, without considering how the sentence context should influence which words are selected. As a result, words unrelated to the semantics of the sentence are introduced, which leads the model to perceive word boundaries incorrectly. To reduce the impact of such irrelevant words on entity recognition, this paper proposes a novel Chinese NER method, called ELKI, which integrates lexicon knowledge with character-context representations that capture sentence-level semantic information, thereby improving the accuracy of word boundary perception. Specifically, a novel relation-aware character-word cross-attention network is designed to mine, from the lexicon, word representations that are related to the semantics of the sentence. A gated fusion network is then constructed to dynamically fuse each character's lexicon knowledge representation with its context representation. The proposed model is evaluated on three benchmark datasets, Resume, MSRA and OntoNotes, and it outperforms the baseline models it is compared with.
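Lexicon-based Chinese NER models typically start by matching every lexicon word that occurs in the sentence and recording how each character relates to the words it falls inside. The sketch below is a minimal illustration under stated assumptions: a toy in-memory lexicon, a maximum word length of 4, and a BMES-style relation label (Begin, Middle, End, Single). The function name `match_lexicon` and these choices are hypothetical; the abstract does not specify how ELKI performs matching or defines its character-word relations.

```python
def match_lexicon(sentence, lexicon, max_len=4):
    """Return, for each character index, a list of (word, relation) pairs.

    Hypothetical helper: `lexicon` is any container supporting `in`, and the
    relation labels follow an assumed BMES scheme:
      B = character begins the word, M = inside it, E = ends it,
      S = the word is this single character.
    """
    matches = [[] for _ in sentence]
    n = len(sentence)
    for start in range(n):
        # Try every candidate word that starts here, up to max_len characters.
        for end in range(start + 1, min(n, start + max_len) + 1):
            word = sentence[start:end]
            if word not in lexicon:
                continue
            if len(word) == 1:
                matches[start].append((word, "S"))
                continue
            # Attach the word to every character it covers, with its position.
            for i in range(start, end):
                if i == start:
                    rel = "B"
                elif i == end - 1:
                    rel = "E"
                else:
                    rel = "M"
                matches[i].append((word, rel))
    return matches
```

For the sentence 南京市长江大桥 (Nanjing Yangtze River Bridge) with the toy lexicon {南京, 南京市, 市长, 长江, 长江大桥, 大桥}, the character 长 at index 3 is matched to 市长 ("mayor", relation E) as well as 长江 and 长江大桥 (relation B). 市长 is exactly the kind of word that matches the characters but not the meaning of the sentence.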
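The cross-attention and gated fusion steps described in the abstract can be sketched for a single character in plain Python. This is a hedged illustration, not ELKI's implementation. The function names `relation_aware_cross_attention` and `gated_fusion`, the additive relation embedding, the scaled dot-product scoring and the per-dimension sigmoid gate are all assumptions chosen for clarity, and learned query/key/value projections are omitted.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def relation_aware_cross_attention(char_vec, word_vecs, rel_vecs):
    """Attend from one character (query) to the lexicon words it matched.

    Assumption: each word vector is shifted by the embedding of its relation
    to the character (e.g. B/M/E) and then used as both key and value, so the
    same word can score differently depending on where the character sits.
    """
    keys = [[w + r for w, r in zip(wv, rv)] for wv, rv in zip(word_vecs, rel_vecs)]
    scale = math.sqrt(len(char_vec))
    weights = softmax([dot(char_vec, k) / scale for k in keys])
    # Weighted sum of the relation-shifted word vectors, one dimension at a time.
    lex_vec = [sum(a * k[d] for a, k in zip(weights, keys)) for d in range(len(char_vec))]
    return lex_vec, weights


def gated_fusion(char_vec, lex_vec, gate_w, gate_b):
    """Fuse context and lexicon vectors with a per-dimension gate.

    g_d   = sigmoid(gate_w[d] . [char_vec; lex_vec] + gate_b[d])
    out_d = g_d * char_vec[d] + (1 - g_d) * lex_vec[d]
    """
    concat = char_vec + lex_vec  # list concatenation, length 2 * dim
    fused = []
    for d in range(len(char_vec)):
        g = sigmoid(dot(gate_w[d], concat) + gate_b[d])
        fused.append(g * char_vec[d] + (1.0 - g) * lex_vec[d])
    return fused
```

For example, with a context vector `[1.0, 0.0]`, two matched words `[[1.0, 0.0], [0.0, 1.0]]` and zero relation embeddings, the first word receives the larger attention weight; adding a relation embedding `[2.0, 0.0]` to the second word reverses that preference. In this sketch the gate lets each dimension decide how much lexicon evidence to keep: a gate value near 1 falls back on the context representation, which is one way a trained model could down-weight irrelevant matched words.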
Key words
Chinese named entity recognition/Cross-attention network/Gated fusion network/Information extraction
Classification
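Computer and Automation
Cite this article
Zhao Zhenyu, Zhu Jingjing, Zhang Yuxin, Liu Mengzhu, Chen Li, Ju Shenggen. Chinese named entity recognition based on enhancing lexicon knowledge integration utilizing character context information[J]. Journal of Sichuan University (Natural Science Edition), 2024, 61(4): 104-112.
Funding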
Key Program of the National Natural Science Foundation of China (62137001)
Key Research and Development Program of Sichuan Province (2023YFG0265)