Application Research of Computers, 2017, Vol. 34, Issue (1): 118-122, 5. DOI: 10.3969/j.issn.1001-3695.2017.01.025
Multi-label CRF based method for disease extraction
Abstract
Named entity recognition in medical text is of great significance for building and mining large clinical databases that serve clinical decision-making, and one important foundational task is the accurate identification of disease names. Medical texts contain a large number of compound disease names. To address this problem, this paper proposes a multi-label CRF algorithm. First, multiple layers of labels are assigned to the data, with the labels on each layer corresponding to a different disease. The layers are then merged into a single final label that is used to train the model. Finally, each layer's label is separated from the model's predictions and the diseases are identified. This method can recognize compound disease names that the traditional CRF algorithm cannot identify. The experimental results verify the effectiveness of the proposed algorithm.
Key words
named entity recognition / conditional random fields / multi-label / medical text / composite entity
Classification
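Information Technology and Security Science
Cite this article
Wang Pengyuan, Ji Donghong. Multi-label CRF based method for disease extraction [J]. Application Research of Computers, 2017, 34(1): 118-122, 5.
Funding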
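Key Program of the National Natural Science Foundation of China (61133012); Major Bidding Program of the National Philosophy and Social Science Foundation of China (11&ZD189); National Natural Science Foundation of China ()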
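
Illustrative sketch
The label handling outlined in the abstract (tag each disease on its own layer, merge the layers into one label for training, then separate the layers again after prediction) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the `|` separator, the B/I/O tag set, the function names, and the English toy sentence are all assumptions made here, and the CRF training step itself is omitted.

```python
from typing import List

SEP = "|"  # hypothetical separator; the abstract does not say how layers are merged


def merge_layers(layers: List[List[str]]) -> List[str]:
    """Combine the per-layer tags of each token into one composite tag."""
    if len({len(layer) for layer in layers}) != 1:
        raise ValueError("all label layers must have the same length")
    return [SEP.join(tags) for tags in zip(*layers)]


def split_layers(merged: List[str]) -> List[List[str]]:
    """Recover the per-layer tag sequences from composite tags."""
    return [list(layer) for layer in zip(*(tag.split(SEP) for tag in merged))]


def layer_entities(tokens: List[str], layers: List[List[str]]) -> List[str]:
    """Read one disease name per layer: the tokens whose tag is not 'O'.

    Assumes (for illustration only) that each layer marks at most one disease.
    """
    names = []
    for layer in layers:
        picked = [tok for tok, tag in zip(tokens, layer) if tag != "O"]
        if picked:
            names.append(" ".join(picked))
    return names


# Toy compound name: two diseases share the head word "gastritis".
tokens = ["chronic", "and", "acute", "gastritis"]
layers = [
    ["B", "O", "O", "I"],  # layer 1: "chronic ... gastritis"
    ["O", "O", "B", "I"],  # layer 2: "acute gastritis"
]

merged = merge_layers(layers)        # composite tags a single CRF would be trained on
recovered = split_layers(merged)     # applied to the model's predicted tags in practice
diseases = layer_entities(tokens, recovered)
```

In this toy case a single-layer tagging could give "gastritis" only one tag, so only one of the two disease names would be fully recoverable; the composite tag `I|I` lets the shared token belong to both layers.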