Computer Technology and Development, 2019, Vol. 29, Issue 3: 18-22, 5. DOI: 10.3969/j.issn.1673-629X.2019.03.004
基于关键词的蛋白质交互关系识别
Protein-protein Interaction Identification Based on Keywords
Abstract
Protein-protein interaction (PPI) is an important research area in biomedicine. The PPI information obtained through biomedical experiments is stored mainly in the text of the relevant literature. With the rapid growth of biomedical literature, manual identification of PPI can no longer meet the needs of practical applications. In this paper, we adopt a weakly supervised PPI recognition framework. Starting from a small number of interacting protein pairs as a seed set, PPI is identified by iteratively expanding the seed set. Compared with existing methods, this method needs only a small amount of labeled data to achieve good recognition results, which saves considerable manpower and resources. On this basis, we use word embeddings to expand the existing keywords that express PPI and score the reliability of each keyword. Using the expanded keyword set, we improve the clustering process of the basic framework: the lexical patterns input to clustering are sorted in descending order of the scores of the keywords they contain. Experiments show that the basic PPI recognition framework achieves good results with only a small amount of labeled data, and that the keyword expansion algorithm further improves them. The highest F-score after the first iteration is 67.20%, 1.54% higher than before the improvement, and the F-score after three iterations is 69.05%.
Keywords
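protein-protein interaction/weak supervision/distributional hypothesis/word embedding/keywords
Classification
Information Technology and Security Science
Cite this article
Mao Yuwei (毛宇薇), Niu Yun (牛耘). Protein-protein Interaction Identification Based on Keywords [J]. Computer Technology and Development, 2019, 29(3): 18-22, 5.
Funding
National Natural Science Foundation of China (61202132)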
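
The keyword expansion and pattern ordering described in the abstract can be illustrated with a minimal sketch. The abstract does not give the scoring formula, so everything here is an assumption made for illustration: a candidate word's reliability score is taken to be its highest cosine similarity to any seed keyword, a pattern's score is the highest score among the keywords it contains, and the two-dimensional vectors, the 0.8 threshold, and the names `expand_keywords` and `sort_patterns` are hypothetical rather than the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_keywords(seeds, embeddings, threshold=0.8):
    """Return {keyword: score}.

    Assumption: seeds score 1.0; any other word is added when its best
    cosine similarity to a seed reaches the threshold, and that
    similarity becomes its reliability score.
    """
    known = [s for s in seeds if s in embeddings]
    scores = {s: 1.0 for s in seeds}
    for word, vec in embeddings.items():
        if word in scores:
            continue
        best = max((cosine(vec, embeddings[s]) for s in known), default=0.0)
        if best >= threshold:
            scores[word] = best
    return scores


def pattern_score(pattern, scores):
    """Assumption: a pattern is scored by its best-scoring keyword."""
    return max((scores.get(tok, 0.0) for tok in pattern.split()), default=0.0)


def sort_patterns(patterns, scores):
    """Order lexical patterns by descending keyword score before clustering."""
    return sorted(patterns, key=lambda p: pattern_score(p, scores), reverse=True)


# Toy embeddings, invented purely for this example.
TOY_EMBEDDINGS = {
    "interacts": [1.0, 0.0],
    "binds": [0.9, 0.1],
    "located": [0.0, 1.0],
}

keyword_scores = expand_keywords(["interacts"], TOY_EMBEDDINGS)
ranked = sort_patterns(
    [
        "PROT1 located near PROT2",
        "PROT1 binds PROT2",
        "PROT1 interacts with PROT2",
    ],
    keyword_scores,
)
```

In this toy setting "binds" is close enough to the seed "interacts" to join the keyword set, while "located" is not, so the pattern without any keyword is placed last in the clustering input.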