Computer Technology and Development, 2018, Vol. 28, Issue 6: 7-11, 5. DOI: 10.3969/j.issn.1673-629X.2018.06.002
Feature Words Selection Based on Word Embedding
Abstract
Protein-protein interaction information can help solve many medical problems and is recorded in the medical literature. However, the biomedical literature grows dramatically each year, and collecting this information manually can no longer meet practical needs. In this paper, building on weakly supervised protein-protein interaction recognition, we propose a new feature word selection method based on word embedding. The method represents each word in the feature word set as a vector via word embedding, so that comparing the similarity of two words becomes comparing the similarity of their corresponding vectors. The words are then clustered, and the words most likely to express interactions are selected from the clustering results to form a new feature word set. This makes protein-protein interaction recognition more efficient and more precise. Clustering on word embeddings groups similar words into one category without requiring the words to be identical, which improves the clustering result. Experiments show that the method, using only one fifth of the feature words, achieves better results than recognition without feature word selection.
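The pipeline outlined in the abstract (embed each feature word, compare words through their vectors, cluster, then keep the clusters likely to express interactions) can be illustrated with a minimal sketch. The abstract does not name the similarity measure, the clustering algorithm, or the selection rule, so everything here is an assumption made for illustration: cosine similarity, a greedy centroid-based clustering with a hypothetical `threshold`, a seed-word selection rule, and hand-made three-dimensional toy vectors standing in for trained embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def centroid(words, vectors):
    """Component-wise mean of the vectors of the given words."""
    dims = len(next(iter(vectors.values())))
    return [sum(vectors[w][i] for w in words) / len(words) for i in range(dims)]

def cluster_words(vectors, threshold):
    """Greedy clustering (an assumed stand-in, not the paper's algorithm):
    a word joins the first cluster whose centroid is at least `threshold`
    similar to it; otherwise it starts a new cluster."""
    clusters = []
    for word, vec in vectors.items():
        for members in clusters:
            if cosine(vec, centroid(members, vectors)) >= threshold:
                members.append(word)
                break
        else:
            clusters.append([word])
    return clusters

def select_feature_words(clusters, seed_words):
    """Assumed selection rule: keep every cluster that contains a seed word
    known to express an interaction."""
    selected = []
    for members in clusters:
        if any(w in seed_words for w in members):
            selected.extend(members)
    return selected

# Toy vectors, invented for this example; real ones would come from a
# trained word embedding model.
toy_vectors = {
    "bind": [0.9, 0.1, 0.0],
    "interact": [0.8, 0.2, 0.1],
    "associate": [0.85, 0.15, 0.05],
    "patient": [0.1, 0.9, 0.1],
    "clinical": [0.05, 0.95, 0.2],
    "figure": [0.0, 0.1, 0.9],
}

clusters = cluster_words(toy_vectors, threshold=0.9)
selected = select_feature_words(clusters, seed_words={"interact"})
print(clusters)
print(selected)
```

Because similarity is measured between vectors rather than between strings, "bind" and "associate" end up grouped with the seed word "interact" even though the words themselves differ, which is the property the abstract attributes to clustering on word embeddings.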
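Key words
protein-protein interaction / word embedding / clustering / feature words
Classification
Information Technology and Security Science
Cite this article
PENG Yunlei, NIU Yun. Feature Words Selection Based on Word Embedding [J]. Computer Technology and Development, 2018, 28(6): 7-11, 5.
Funding
National Natural Science Foundation of China (61202132)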