Research on Named Entity Recognition in Medical Domain with Global Information Augmentation
OA; PKU Core Journal; CSTPCD
In Chinese medical consultation texts, colloquial, irregular expressions and frequent technical terms make entities such as drug names difficult to recognize accurately. To exploit the relations between words in Chinese sentences, this paper proposes a medical named entity recognition model that augments global information. The model strengthens word embedding representations with an attention mechanism and uses a bidirectional long short-term memory (BiLSTM) network to capture contextual information. On top of this, it enriches the global representation of the sentence in two ways: first, a graph convolutional network layer is built over additional inter-word dependencies derived from syntactic relations; second, an auxiliary task predicts the type of syntactic dependency between words. Experiments on a Chinese medical consultation dataset show that the model is highly competitive, reaching an F1 score of 94.54%, with clear improvements over other models on entity categories such as drugs and symptoms. Experiments on the public Weibo dataset further indicate that the model is also applicable in the general domain.
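The graph convolutional step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a generic GCN layer with symmetric normalization and self-loops, applied to an undirected graph built from dependency arcs; the abstract does not give the authors' exact formulation, so the function names, the toy arcs, and the identity weight matrix below are all hypothetical.

```python
import math


def normalized_adjacency(num_tokens, arcs):
    """Build D^-1/2 (A + I) D^-1/2 from (head, dependent) token index pairs.

    Assumption: arcs are treated as undirected and every token gets a
    self-loop, as in the common GCN formulation (not confirmed by the paper).
    """
    adj = [[0.0] * num_tokens for _ in range(num_tokens)]
    for i in range(num_tokens):
        adj[i][i] = 1.0  # self-loop keeps each token's own features
    for head, dep in arcs:
        adj[head][dep] = 1.0
        adj[dep][head] = 1.0
    degree = [sum(row) for row in adj]
    return [
        [adj[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(num_tokens)]
        for i in range(num_tokens)
    ]


def matmul(a, b):
    """Plain-Python matrix product of two row-major nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adj_norm, features, weight):
    """One propagation step: ReLU(adj_norm @ features @ weight)."""
    hidden = matmul(matmul(adj_norm, features), weight)
    return [[max(0.0, value) for value in row] for row in hidden]


# Toy sentence with 3 tokens; token 1 is the syntactic head of tokens 0 and 2.
ARCS = [(1, 0), (1, 2)]
# Stand-in for BiLSTM outputs (one 2-dimensional vector per token).
FEATURES = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Identity weights so the effect of neighbor mixing is easy to inspect.
WEIGHT = [[1.0, 0.0], [0.0, 1.0]]

ADJ = normalized_adjacency(3, ARCS)
OUTPUT = gcn_layer(ADJ, FEATURES, WEIGHT)
```

In this toy case, tokens 0 and 2 are not directly linked, so after one layer each of them mixes only its own features with those of the head token. In a full model the features would come from the BiLSTM and the weights would be learned; the auxiliary dependency-type prediction task is not covered by this sketch.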
要媛媛 (Yao Yuanyuan); 付潇 (Fu Xiao); 杨东瑛 (Yang Dongying); 王洁宁 (Wang Jiening); 郑文 (Zheng Wen)
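College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Jinzhong 030600; China Institute of Marine Technology and Economy, China State Shipbuilding Corporation Limited, Beijing 100081; College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Jinzhong 030600 and Shanxi Engineering Research Center for Intelligent Data-Assisted Diagnosis and Treatment, Changzhi Medical College, Changzhi 046000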
Computers and Automation
attention mechanism; bidirectional long short-term memory network; graph convolutional network; medical consultation; named entity recognition
Journal of University of Electronic Science and Technology of China, 2024 (003)
pp. 431-439 (9 pages)
National Natural Science Foundation of China (11702289); Shanxi Province Key Core Technology and Generic Technology R&D Special Project (2020XXX013)