Journal of Zhengzhou University (Natural Science Edition), 2026, Vol. 58, Issue 1: 43-50, 8. DOI: 10.13705/j.issn.1671-6841.2024116
EALMDA: A Data Augmentation Method for Medical Named Entity Recognition
Abstract
Named entity recognition in the medical domain identifies named entities in unstructured medical texts and plays a crucial role in many downstream tasks. Because medical named entities are complex, expert annotation requires domain-specific knowledge, which has led to a severe scarcity of annotated data in the medical field. To address this issue, an entity-aware mask local mixup data augmentation method (EALMDA) was proposed. The method first extracted key elements through an entity-aware masking channel, masking non-entity parts so as to retain the core semantics. Masked sentences were then fused through a linear combination of two sampling strategies, contextual entity similarity and k-nearest neighbors, which preserved the core semantics while increasing sample diversity. Finally, the sentences were linearized into sequences and fed into a generative model to obtain augmented samples. Comparative experiments against mainstream data augmentation baselines were conducted on five mainstream medical named entity recognition datasets, including NCBI-disease, under simulated low-resource scenarios, and significant improvements over the baseline methods were observed.
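The abstract describes the pipeline only at a high level, so the Python sketch below illustrates one possible reading of its three steps; it is not the authors' implementation. Everything concrete here is an assumption: BIO-tagged input, the `[MASK]` token, Jaccard overlap of entity types as a stand-in for contextual entity similarity, bag-of-words cosine as a stand-in for the k-nearest-neighbor embedding distance, the weight `alpha` as the linear-combination coefficient, and tag-before-token insertion as the sequence linearization. All function names are hypothetical, and the generative model itself is omitted.

```python
import math
import random
from collections import Counter

MASK = "[MASK]"  # hypothetical mask token


def entity_aware_mask(tokens, labels, mask_ratio=1.0, seed=0):
    """Keep entity tokens (non-'O' BIO labels) and replace each non-entity
    token with MASK with probability mask_ratio (assumed masking policy)."""
    rng = random.Random(seed)
    return [MASK if lab == "O" and rng.random() < mask_ratio else tok
            for tok, lab in zip(tokens, labels)]


def entity_similarity(labels_a, labels_b):
    """Jaccard overlap of entity types: a stand-in for contextual entity similarity."""
    a = {lab[2:] for lab in labels_a if lab != "O"}
    b = {lab[2:] for lab in labels_b if lab != "O"}
    return len(a & b) / len(a | b) if a | b else 0.0


def context_similarity(tokens_a, tokens_b):
    """Bag-of-words cosine: a stand-in for the embedding distance used by k-NN."""
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


def select_partners(i, corpus, k=1, alpha=0.5):
    """Rank the other sentences by alpha * entity score + (1 - alpha) * context
    score and keep the top k as local mixup partners (hypothetical weighting)."""
    tokens_i, labels_i = corpus[i]
    scored = [
        (alpha * entity_similarity(labels_i, labels_j)
         + (1 - alpha) * context_similarity(tokens_i, tokens_j), j)
        for j, (tokens_j, labels_j) in enumerate(corpus) if j != i
    ]
    scored.sort(key=lambda s: (-s[0], s[1]))  # highest score first, ties by index
    return [j for _, j in scored[:k]]


def local_mixup(vec_a, vec_b, lam):
    """Standard mixup interpolation lam * a + (1 - lam) * b between two
    sentence representations (toy vectors here)."""
    return [lam * a + (1 - lam) * b for a, b in zip(vec_a, vec_b)]


def linearize(tokens, labels):
    """Insert each entity tag in front of its token so a language model
    can generate tokens and labels as a single sequence."""
    out = []
    for tok, lab in zip(tokens, labels):
        if lab != "O":
            out.append(f"<{lab}>")
        out.append(tok)
    return " ".join(out)


def build_generator_inputs(i, corpus, k=1, alpha=0.5):
    """Masked, linearized sentence i plus its k partners; these would be
    passed to a generative model (not shown)."""
    ids = [i] + select_partners(i, corpus, k, alpha)
    return [linearize(entity_aware_mask(*corpus[j]), corpus[j][1]) for j in ids]


# Toy BIO-tagged corpus (invented for illustration).
corpus = [
    (["Patients", "with", "breast", "cancer", "received", "tamoxifen"],
     ["O", "O", "B-Disease", "I-Disease", "O", "B-Chemical"]),
    (["Mutations", "cause", "colon", "cancer"],
     ["O", "O", "B-Disease", "I-Disease"]),
    (["The", "dose", "was", "increased"],
     ["O", "O", "O", "O"]),
]

if __name__ == "__main__":
    for line in build_generator_inputs(0, corpus, k=1):
        print(line)
```

Here `build_generator_inputs(0, corpus, k=1)` returns the masked, linearized sentence 0 together with sentence 1, which shares the Disease type and the word "cancer" and therefore scores highest under the combined criterion. `local_mixup` shows only the interpolation itself; restricting it to a sentence and one of its selected partners is what makes the mixup "local" in this sketch. Lowering `mask_ratio` keeps some context words unmasked.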
Key words
data augmentation/named entity recognition/natural language processing/generative model/Mixup
Classification
Information Technology and Security Science
Cite this article
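DAO Lu, LIU Na, ZHENG Guofeng, LI Chen, YANG Jie. EALMDA: a data augmentation method for medical named entity recognition[J]. Journal of Zhengzhou University (Natural Science Edition), 2026, 58(1): 43-50, 8.
Funding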
National Natural Science Foundation of China (62162001)
Natural Science Foundation of Ningxia (2021AAC03224)
North Minzu University 2024 University-Level General Research Project (2024XYZJK01)