
基于余弦距离选取初始簇中心的文本聚类研究

王彬宇 刘文芬 胡学先 魏江宏

计算机工程与应用 (Computer Engineering and Applications), 2018, Vol. 54, Issue 10: 11-18 (8 pages). DOI: 10.3778/j.issn.1002-8331.1802-0108


Research on text clustering for selecting initial cluster center based on Cosine distance

王彬宇¹, 刘文芬², 胡学先¹, 魏江宏¹

Author information

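  • 1. State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450000, China
  • 2. Guangxi Key Laboratory of Cryptography and Information Security, Guilin University of Electronic Technology, Guilin, Guangxi 541000, China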


Abstract

Text clustering is an important means of organizing, summarizing, and navigating text information effectively, and the K-means algorithm based on cosine similarity is one of the most widely used clustering algorithms. However, cosine-similarity K-means is difficult to improve, and many effective improvements to K-means that are based on Euclidean distance cannot be applied to it. To address this problem, this paper discusses the relationship between cosine similarity and Euclidean distance and derives a conversion formula between the two for normalized (unit-length) vectors. From this, it defines a cosine distance that closely corresponds to the Euclidean distance, so that existing Euclidean-distance-based K-means improvements can be transferred to cosine-similarity K-means through the cosine distance. On this basis, the paper derives how cluster centers are computed in cosine K-means and further improves the initial-center selection scheme, yielding a new text clustering algorithm, MCSKM++. Experimental results show that the algorithm improves clustering accuracy while reducing the number of iterations and shortening the running time.
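The abstract rests on the identity that, for unit vectors, ‖â − b̂‖² = 2(1 − cos(a, b)), which is what lets Euclidean K-means techniques carry over to cosine similarity. The sketch below illustrates that idea under stated assumptions; it is not the authors' MCSKM++. It assumes the cosine distance is √(2(1 − cos)) (the abstract does not give the paper's exact definition), assumes cluster centers are the re-normalized mean of member vectors (the usual spherical k-means update; the paper derives its own center formula, which may differ), and seeds with standard k-means++ using this distance instead of the paper's improved selection scheme. All function names are hypothetical.

```python
import math
import random


def normalize(v):
    """Scale v to unit Euclidean length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_similarity(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def cosine_distance(a, b):
    # ASSUMED definition: sqrt(2 * (1 - cos)). For unit vectors
    # ||a - b||^2 = 2 - 2 * a.b, so this equals the Euclidean distance
    # between the normalized vectors. max() guards against rounding below 0.
    return math.sqrt(max(0.0, 2.0 * (1.0 - cosine_similarity(a, b))))


def kmeanspp_seeds(points, k, rng):
    """Standard k-means++ seeding, with cosine distance replacing Euclidean distance."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Weight each point by its squared distance to the nearest chosen center.
        weights = [min(cosine_distance(p, c) for c in centers) ** 2 for p in points]
        if sum(weights) == 0:
            break  # every point already coincides with a center
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


def cosine_kmeans(points, k, max_iter=100, seed=0):
    """Lloyd iterations on unit vectors; centers are re-normalized means (ASSUMED update)."""
    rng = random.Random(seed)
    pts = [normalize(p) for p in points]
    centers = kmeanspp_seeds(pts, k, rng)
    labels = None
    for _ in range(max_iter):
        # Assign each point to the center with the smallest cosine distance.
        new_labels = [
            min(range(len(centers)), key=lambda j: cosine_distance(p, centers[j]))
            for p in pts
        ]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(len(centers)):
            members = [p for p, lab in zip(pts, labels) if lab == j]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                centers[j] = normalize(mean)
    return labels, centers
```

For example, with six toy 3-dimensional vectors, three pointing roughly along the x-axis and three roughly along the y-axis, `cosine_kmeans(docs, 2)` places the two groups in separate clusters. Because every point is normalized first, the same Lloyd loop and k-means++ seeding used with Euclidean distance apply unchanged. This is the kind of transfer the abstract describes.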


Key words

text clustering/K-means algorithm/cosine similarity/cosine distance/initial point selection

Classification

Information Technology and Security Science

Cite this article

王彬宇,刘文芬,胡学先,魏江宏.基于余弦距离选取初始簇中心的文本聚类研究[J].计算机工程与应用,2018,54(10):11-18.

Funding

National Natural Science Foundation of China (No. 61502527, No. 61702549).

计算机工程与应用 (Computer Engineering and Applications)

Open access; indexed in PKU Core Journals, CSCD, and CSTPCD

ISSN 1002-8331
