
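Chinese named entity recognition based on enhancing lexicon knowledge integration utilizing character context information (基于汉字上下文信息增强词典知识融入的中文命名实体识别)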

Zhao Zhenyu (赵振宇), Zhu Jingjing (朱静静), Zhang Yuxin (张宇馨), Liu Mengzhu (刘梦珠), Chen Li (陈黎), Ju Shenggen (琚生根)

Journal of Sichuan University (Natural Science Edition), 2024, Vol. 61, Issue 4, pp. 104-112 (9 pages). DOI: 10.19907/j.0490-6756.2024.042001

Chinese named entity recognition based on enhancing lexicon knowledge integration utilizing character context information

Zhao Zhenyu¹, Zhu Jingjing¹, Zhang Yuxin¹, Liu Mengzhu², Chen Li¹, Ju Shenggen¹

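Author information

  • 1. College of Computer Science, Sichuan University, Chengdu 610065
  • 2. School of Computer and Information Engineering, Guizhou University of Commerce, Guiyang 550014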

Abstract

Chinese named entity recognition (NER) is a challenging task because Chinese text lacks explicit delimiters, so word boundary information is absent. Existing mainstream models address this issue by introducing a lexicon, which provides word boundary information. However, the word information in the lexicon is fused into the character representations according to the matching relation between characters and words alone, without considering the impact of sentence information on word selection. This introduces irrelevant words that are unrelated to the sentence semantics and leads the model to perceive word boundaries incorrectly. To reduce the impact of irrelevant words on entity recognition, this paper proposes a novel Chinese NER method, called ELKI, which integrates lexicon knowledge with character-context representations that capture sentence semantic information, thereby improving the accuracy of word boundary perception. Specifically, a novel relation-aware character-word cross-attention network is designed to mine, from the lexicon, word representations that are related to the sentence semantics. Then, a gated fusion network is constructed to dynamically fuse the lexicon knowledge representation of each character with its context representation. The proposed model is evaluated on three benchmark datasets, Resume, MSRA and OntoNotes, and it outperforms the baseline models.
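The two components named in the abstract can be illustrated with a minimal, generic sketch. It is not the ELKI formulation: the abstract gives no equations, so the scaled dot-product scoring, the sigmoid gate, and every name below (`cross_attend`, `gated_fuse`, `match_mask`, `gate_w`, `gate_b`) are assumptions made for illustration, and the relation-aware part of the attention is omitted. Each character's context vector attends only over the lexicon words it matches, and a learned gate then mixes the pooled word vector with the context vector.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(scores, mask):
    """Masked softmax: entries whose mask is False get weight 0."""
    kept = [s for s, m in zip(scores, mask) if m]
    if not kept:  # character matches no lexicon word
        return [0.0] * len(scores)
    top = max(kept)  # subtract the max for numerical stability
    exps = [math.exp(s - top) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(char_ctx, word_embs, match_mask):
    """For each character, attention-pool the lexicon words it matches.

    Assumption: plain scaled dot-product scores; no relation features.
    """
    dim = len(char_ctx[0])
    pooled = []
    for h, mask in zip(char_ctx, match_mask):
        scores = [dot(h, w) / math.sqrt(dim) for w in word_embs]
        weights = softmax(scores, mask)
        pooled.append(
            [sum(a * w[k] for a, w in zip(weights, word_embs)) for k in range(dim)]
        )
    return pooled


def gated_fuse(char_ctx, lex_repr, gate_w, gate_b):
    """g = sigmoid(W [h; l] + b); output = g * h + (1 - g) * l, element-wise."""
    fused = []
    for h, l in zip(char_ctx, lex_repr):
        x = h + l  # list concatenation [h; l]
        g = [
            1.0 / (1.0 + math.exp(-(dot(row, x) + b)))
            for row, b in zip(gate_w, gate_b)
        ]
        fused.append([gi * hi + (1.0 - gi) * li for gi, hi, li in zip(g, h, l)])
    return fused


# Toy data (hypothetical): 3 characters, 2 candidate words, dimension 4.
random.seed(0)
DIM = 4
char_ctx = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(3)]
word_embs = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(2)]
# match_mask[i][j] is True when character i is part of lexicon word j.
match_mask = [[True, False], [True, True], [False, False]]
gate_w = [[random.uniform(-1, 1) for _ in range(2 * DIM)] for _ in range(DIM)]
gate_b = [0.0] * DIM

lex_repr = cross_attend(char_ctx, word_embs, match_mask)
fused = gated_fuse(char_ctx, lex_repr, gate_w, gate_b)
print(fused)
```

In this sketch the gate produces a per-dimension convex combination, so a character whose matched words score poorly against its context can fall back on the context vector. In a trained model the gate parameters would be learned rather than drawn at random.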

Keywords

Chinese named entity recognition; cross-attention network; gated fusion network; information extraction

Classification

Computer science and automation

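Cite this article

Zhao Zhenyu, Zhu Jingjing, Zhang Yuxin, Liu Mengzhu, Chen Li, Ju Shenggen. Chinese named entity recognition based on enhancing lexicon knowledge integration utilizing character context information[J]. Journal of Sichuan University (Natural Science Edition), 2024, 61(4): 104-112.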

Funding

Key Program of the National Natural Science Foundation of China (62137001)

Key Research and Development Program of Sichuan Province (2023YFG0265)

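Journal of Sichuan University (Natural Science Edition)

Open access; indexed in the Peking University Core Journals list and CSTPCD

ISSN 0490-6756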
