Computer Technology and Development, 2018, Vol. 28, Issue (2): 19-23, 5. DOI: 10.3969/j.issn.1673-629X.2018.02.005
Protein-Protein Interaction Identification Based on Weak Supervision
Abstract
Protein-protein interaction information is key to solving many medical problems, and this information is recorded in the medical literature. As the biomedical literature grows, collecting it manually can no longer meet practical needs. We therefore propose a weakly supervised method for identifying protein-protein interactions in text. First, the method generates vector representations of protein interactions from the text library. Next, it clusters the instances containing proteins according to their similarity and generates extraction patterns. Then, using these extraction patterns, it finds new instances in the text library that meet the conditions and treats them as candidate instances. Finally, it evaluates the protein pairs corresponding to the candidate instances and adds those that meet the conditions to the seed set. The method needs only a small number of protein pairs as seeds and extends the seed set through an iterative algorithm, which minimizes supervision and greatly reduces manual intervention. Experiments show that the method achieves high precision and recall.
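Key words
protein-protein interaction/weak supervision/clustering/pattern
Classification
Information Technology and Security Science
Citation
Peng Yunlei (彭昀磊), Niu Yun (牛耘). Protein-protein interaction identification based on weak supervision [J]. Computer Technology and Development, 2018, 28(2): 19-23, 5.
Funding
National Natural Science Foundation of China (61202132)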
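The iterative procedure summarized in the abstract (vectorize, cluster into patterns, match candidates, evaluate, extend the seed set) can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words vectors, cosine similarity, single-pass clustering, the two thresholds, and all function names and placeholder protein names (P1, P2, ...) are assumptions chosen only to make the loop concrete.

```python
from collections import Counter
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two bag-of-words vectors (Counters)."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def cluster(vectors, threshold):
    """Single-pass clustering (an assumption); each summed cluster vector acts as a pattern."""
    patterns = []
    for vec in vectors:
        best = max(patterns, key=lambda p: cosine(p, vec), default=None)
        if best is not None and cosine(best, vec) >= threshold:
            best.update(vec)  # Counter.update adds counts, merging vec into the cluster
        else:
            patterns.append(Counter(vec))
    return patterns


def bootstrap(instances, seeds, match_threshold=0.5, accept_threshold=0.7, max_iterations=5):
    """Extend a seed set of protein pairs; thresholds are hypothetical values."""
    seeds = {frozenset(pair) for pair in seeds}
    for _ in range(max_iterations):
        # Steps 1-2: vectorize instances of known pairs and cluster them into patterns.
        seed_vectors = [Counter(ctx) for a, b, ctx in instances if frozenset((a, b)) in seeds]
        patterns = cluster(seed_vectors, match_threshold)
        # Step 3: find candidate instances that match some pattern.
        candidates = {}
        for a, b, ctx in instances:
            pair = frozenset((a, b))
            if pair in seeds:
                continue
            score = max((cosine(p, Counter(ctx)) for p in patterns), default=0.0)
            if score >= match_threshold:
                candidates[pair] = max(candidates.get(pair, 0.0), score)
        # Step 4: evaluate candidate pairs and keep the confident ones.
        accepted = {pair for pair, score in candidates.items() if score >= accept_threshold}
        if not accepted:
            break
        seeds |= accepted
    return seeds


# Toy data: (protein_a, protein_b, context words between the two mentions).
instances = [
    ("P1", "P2", ("interacts", "with")),
    ("P3", "P4", ("interacts", "directly", "with")),
    ("P5", "P6", ("binds", "directly", "to")),
    ("P7", "P8", ("was", "measured", "alongside")),
]
result = bootstrap(instances, seeds=[("P1", "P2")])
```

On this toy input, the single seed pair P1/P2 yields a pattern that also matches the P3/P4 instance, so that pair joins the seed set, while the other two instances stay below the thresholds. A real system would use richer vector representations and a more careful confidence estimate for patterns and pairs than the maximum similarity used here.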