Journal of Shanxi University (Natural Science Edition), 2024, Vol. 47, Issue 2: 260-268, 9. DOI: 10.13451/j.sxu.ns.2023165
Chinese Medical Named Entity Recognition Based on Soft-Lexicon and Adversarial Training
Abstract
Most existing medical entity recognition models cannot fully extract and utilize the lexical information in a text sequence, and their structures are complex. As a result, these models suffer from inaccurate entity boundary recognition and poor robustness when processing medical texts. In addition, most word-granularity-based named entity recognition (NER) methods do not adequately address the problem of information omission. To address these problems, this paper proposes a named entity recognition model based on word fusion and adversarial training. The model uses the pre-trained model BERT to obtain word vectors of text sequences. SoftLexicon is then used to introduce lexical information, and perturbation samples generated by adversarial training are added to the word vectors. Finally, BiLSTM-CRF is used to extract features and produce the sequence annotation results. The proposed model is evaluated on the CCKS2019 and CCKS2020 datasets, where its F1 values reach 85.07% and 90.39%, respectively. The experimental results show that, compared with the baseline model, the F1 value of this model increases by 2.31% and 2.88%, respectively, indicating that combining the word fusion method with adversarial training can effectively identify medical entities.
Keywords
named entity recognition/word and character fusion/adversarial training/Projected Gradient Descent (PGD)
Classification
Computer Science and Automation
Cite this article
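PAN Shipeng, Turdi Tohti, LIANG Yi, Askar Hamdulla. Chinese Medical Named Entity Recognition Based on Soft-Lexicon and Adversarial Training [J]. Journal of Shanxi University (Natural Science Edition), 2024, 47(2): 260-268, 9.
Funding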
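National Natural Science Foundation of China (62166042, U2003207)
Natural Science Foundation of Xinjiang Uygur Autonomous Region (2021D01C076)
National Defense Science and Technology Foundation Strengthening Plan (2021-JCJQ-JJ-0059)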