中国电子科技 (Journal of Electronic Science and Technology of China), 2009, Vol. 7, Issue 3: 227-231 (5 pages).
Enhanced Identifying Gene Names from Biomedical Literature with Conditional Random Fields
Abstract
Identifying gene names is an attractive research area in computational biology. However, extracting gene names accurately is challenging because there are no established conventions for naming genes. We devise a systematic architecture and apply a model based on conditional random fields (CRFs) to extract gene names from Medline. To improve performance, biomedical ontology features are added to the model, and a post-processing stage consisting of boundary adjustment and word filtering is introduced to resolve overlapping names and to remove false-positive single words. A pure string-matching method, baseline CRFs, and CRFs with our methods are applied separately to the extraction of human gene names and of HIV gene names in 1100 Medline abstracts, and their performances are compared. The results show that CRFs are robust to unseen gene names. Furthermore, CRFs with our methods outperform the other methods, achieving a precision of 0.818 and a recall of 0.812.
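The abstract does not specify how the boundary-adjustment and word-filter steps work. The following is a hypothetical illustration only, not the authors' method. It assumes the CRF output has already been decoded into token spans `(start, end_exclusive, text)`, resolves overlapping candidates by keeping the longest span, and drops single-token candidates that appear in an assumed set of common words. The names `resolve_overlaps`, `filter_single_words`, `post_process`, and `common_words` are all invented for this sketch.

```python
from typing import List, Set, Tuple

# Hypothetical representation of a predicted mention: (start_token, end_token_exclusive, text)
Span = Tuple[int, int, str]


def resolve_overlaps(spans: List[Span]) -> List[Span]:
    """Assumed boundary rule: among overlapping candidates, keep the longest span."""
    # Longer spans first; ties broken by earlier start position.
    ordered = sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0]))
    kept: List[Span] = []
    for span in ordered:
        # Accept the span only if it does not overlap any span already kept.
        if all(span[1] <= k[0] or span[0] >= k[1] for k in kept):
            kept.append(span)
    # Return the surviving spans in document order.
    return sorted(kept, key=lambda s: s[0])


def filter_single_words(spans: List[Span], common_words: Set[str]) -> List[Span]:
    """Assumed word filter: drop one-token mentions that are ordinary words."""
    return [
        s for s in spans
        if (s[1] - s[0]) > 1 or s[2].lower() not in common_words
    ]


def post_process(spans: List[Span], common_words: Set[str]) -> List[Span]:
    """Apply the hypothetical overlap resolution, then the single-word filter."""
    return filter_single_words(resolve_overlaps(spans), common_words)
```

For example, with candidates `[(0, 2, "tumor necrosis"), (0, 3, "tumor necrosis factor"), (5, 6, "p53"), (8, 9, "protein")]` and `common_words = {"protein", "cell", "gene"}`, `post_process` keeps `(0, 3, "tumor necrosis factor")` and `(5, 6, "p53")`. Preferring the longest span is only one possible policy; a real system might instead weigh CRF confidence or dictionary evidence when choosing boundaries.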
Keywords
Conditional random fields; gene name extraction; information extraction; named entity recognition.
Cite this article
Wei-Zhong Qian, Chong Fu, Hong-Rong Cheng, Qiao Liu, Zhi-Guang Qin. Enhanced Identifying Gene Names from Biomedical Literature with Conditional Random Fields [J]. 中国电子科技 (Journal of Electronic Science and Technology of China), 2009, 7(3): 227-231.
Funding
This work was supported by China Scholarship Council under Grant No. 2007104897 and UESTC Youth Foundation under Grant No. JX05007.