
Feature Words Selection Based on Word Embedding (基于词向量的特征词选择)

Peng Yunlei (彭昀磊), Niu Yun (牛耘)

Computer Technology and Development (计算机技术与发展), 2018, Vol. 28, Issue 6: 7-11, 5. DOI: 10.3969/j.issn.1673-629X.2018.06.002

Feature Words Selection Based on Word Embedding

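Peng Yunlei 1, Niu Yun 1

Author information

  • 1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu 210016, China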

Abstract

Protein-protein interaction information can help solve many medical problems and is recorded in the medical literature. However, the volume of biomedical literature grows dramatically each year, and manual information collection can no longer meet practical needs. In this paper, building on weakly supervised protein interaction recognition, we propose a new feature word selection method based on word embedding. The method uses word embedding to produce a vector for each word in the feature word set, turning the comparison of similarity between words into a comparison of similarity between their corresponding vectors. The words are then clustered, and the words more likely to express interactions are selected from the clustering results to form a new feature word set, which makes protein-protein interaction recognition more efficient and precise. Clustering word embeddings groups similar words into one category without requiring the words to be exactly identical, which improves the clustering result. Experiments show that, using only one fifth of the feature words, this method achieves better results than recognition without feature word selection.
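
The pipeline described in the abstract (a vector per word, vector similarity in place of word similarity, clustering, then keeping the clusters likely to express interactions) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the three-dimensional toy vectors in `VECTORS`, the greedy threshold clustering in `cluster`, the 0.9 cosine threshold, and the seed-word rule in `select_feature_words`. The abstract does not state which clustering algorithm or selection criterion the authors used.

```python
import math

# Hypothetical toy embeddings (assumption); a real system would load
# vectors trained on biomedical text.
VECTORS = {
    "bind":      [0.90, 0.10, 0.00],
    "interact":  [0.80, 0.20, 0.10],
    "associate": [0.85, 0.15, 0.05],
    "patient":   [0.10, 0.90, 0.10],
    "clinical":  [0.00, 0.80, 0.20],
    "figure":    [0.10, 0.10, 0.90],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cluster(vectors, threshold=0.9):
    """Greedy single-pass clustering (an illustrative stand-in).

    Each word joins the first cluster whose first member is at least
    `threshold` similar to it; otherwise it starts a new cluster.
    """
    clusters = []
    for word, vec in vectors.items():
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(word)
                break
        else:
            clusters.append([word])
    return clusters


def select_feature_words(vectors, seeds, threshold=0.9):
    """Keep every cluster that contains at least one seed word.

    The seed-word rule is an assumption standing in for "words more
    likely to express interactions".
    """
    selected = []
    for members in cluster(vectors, threshold):
        if any(word in seeds for word in members):
            selected.extend(members)
    return selected


clusters = cluster(VECTORS)
selected = select_feature_words(VECTORS, seeds={"interact"})
print(clusters)
print(selected)
```

In this toy setting, "bind" and "associate" end up in the same cluster as the seed word "interact" even though the strings differ, which mirrors the abstract's point that clustering on vectors does not require exactly identical words.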

Key words

protein-protein interaction/word embedding/clustering/feature words

Classification

Information Technology and Security Science

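Cite this article

Peng Yunlei, Niu Yun. Feature Words Selection Based on Word Embedding [J]. Computer Technology and Development (计算机技术与发展), 2018, 28(6): 7-11, 5.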

Funding

National Natural Science Foundation of China (61202132)

Computer Technology and Development (计算机技术与发展)

OA, CSTPCD

ISSN 1673-629X
