Chinese Medical Named Entity Recognition Based on Multi-Granularity Glyph Enhancement

Liu Wei, Ma Lei, Li Kai, Li Rong

Computer Engineering, 2024, Vol. 50, Issue 2: 337-344 (8 pages). DOI: 10.19678/j.issn.1000-3428.0067285

Liu Wei (1), Ma Lei (1), Li Kai (2), Li Rong (3)

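Author information

  • 1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan, China
  • 2. Information Department, The First People's Hospital of Yunnan Province, Kunming 650500, Yunnan, China
  • 3. Scientific Research Department, The First People's Hospital of Yunnan Province, Kunming 650500, Yunnan, China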

Abstract

Chinese Medical Named Entity Recognition (CMNER) focuses on extracting entities from unstructured Chinese medical texts. Current character-based CMNER models do not adequately exploit the distinctive features of Chinese characters from multiple perspectives, which limits their effectiveness in CMNER applications. To address this, a model that uses multi-granularity glyph information enhancement for Chinese medical named entity recognition is introduced. The model integrates the glyph spatial structure and radical representation of Chinese characters and aligns them with word information drawn from a domain-specific lexicon, enriching the semantic and boundary information carried by each character. Through a gating mechanism, the model combines domain-specific terms with the multi-granularity glyph features of Chinese characters, so that both domain relevance and intrinsic character details are taken into account, thereby enhancing its capacity for medical entity recognition. The multi-granularity glyph-enhanced character representations are fed to a Bidirectional Long Short-Term Memory (BiLSTM) layer for contextual encoding and a Conditional Random Field (CRF) layer for label decoding. Experimental results show that the proposed model surpasses the best baseline model, improving F1 score by 1.04% and 0.62% on the IMCS21 and CMeEE datasets, respectively. Ablation studies further confirm the contribution of each component to recognizing Chinese medical named entities.
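
The abstract does not give the form of the gating mechanism. As an illustration only, the sketch below shows one common way such a gate can fuse a lexicon-based word vector with a glyph feature vector for a single character: a sigmoid gate computed from the concatenated inputs, followed by an element-wise convex combination. The function name `gated_fusion`, its parameters, and the specific formula are assumptions made for this example and are not taken from the paper.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(lexicon_vec, glyph_vec, weights, bias):
    """Fuse two equal-length feature vectors with an element-wise gate.

    Hypothetical formulation (an assumption, not the paper's equations):
        gate  = sigmoid(W . [lexicon_vec ; glyph_vec] + bias)
        fused = gate * lexicon_vec + (1 - gate) * glyph_vec

    weights has one row per output dimension; each row has
    len(lexicon_vec) + len(glyph_vec) entries.
    """
    if len(lexicon_vec) != len(glyph_vec):
        raise ValueError("lexicon_vec and glyph_vec must have the same length")
    if len(weights) != len(lexicon_vec) or len(bias) != len(lexicon_vec):
        raise ValueError("weights and bias need one entry per dimension")

    concat = list(lexicon_vec) + list(glyph_vec)
    fused = []
    for row, b, lex, gly in zip(weights, bias, lexicon_vec, glyph_vec):
        # Gate value in (0, 1) for this dimension.
        gate = sigmoid(sum(w * x for w, x in zip(row, concat)) + b)
        # Convex combination of the two feature sources.
        fused.append(gate * lex + (1.0 - gate) * gly)
    return fused


# With all-zero weights and bias the gate is 0.5 everywhere,
# so the result is the plain average of the two inputs.
zero_w = [[0.0] * 4 for _ in range(2)]
print(gated_fusion([1.0, 0.0], [0.0, 1.0], zero_w, [0.0, 0.0]))  # [0.5, 0.5]
```

In a trained model the weights and bias would be learned, letting each dimension lean toward lexicon or glyph information depending on the character; the fused vectors would then be passed to the BiLSTM and CRF layers.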

Keywords

named entity recognition / medical domain / glyph structure / gating mechanism / domain lexicon

Classification

Information Technology and Security Science

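Cite this article

Liu Wei, Ma Lei, Li Kai, Li Rong. Chinese Medical Named Entity Recognition Based on Multi-Granularity Glyph Enhancement[J]. Computer Engineering, 2024, 50(2): 337-344.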

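Funding

National Natural Science Foundation of China (62266025)

Major Science and Technology Special Project of Yunnan Province (202202AD080004, 202202AE090008)

Yunnan Fundamental Research Program, Kunming Medical University Joint Special Project (202201AY070001-258)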

Computer Engineering

Open access; Peking University Core Journal; CSTPCD

ISSN 1000-3428
