Computer and Modernization, Issue (4): 17-21, 5. DOI: 10.3969/j.issn.1006-2475.2018.04.004
Text Clustering Based on Improved k-means Algorithm
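Jiang Li (蒋丽) 1, Xue Shanliang (薛善良) 1
Author information
- 1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu 211106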
Abstract
To address the sensitivity of the original k-means algorithm to the number of clusters k, an improved k-means algorithm is proposed. The algorithm first computes the similarity between word vectors based on the principle of co-occurrence words and divides the data into k+x clusters according to a similarity threshold; it then applies the k-means algorithm to these k+x clusters. The proposed algorithm is applied to text clustering. The experimental results show that it is more accurate than the original algorithm.
Keywords
k-means algorithm/co-occurrence word/word vector/similarity
Classification
Information technology and security science
Cite this article
Jiang Li, Xue Shanliang. Text Clustering Based on Improved k-means Algorithm [J]. Computer and Modernization, 2018, (4): 17-21, 5.
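
The two-stage procedure summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact co-occurrence similarity or say how k-means is applied to the k+x clusters, so the code assumes (1) cosine similarity over bag-of-words counts as a stand-in for the co-occurrence-word similarity, (2) a greedy threshold pass that produces the initial groups, and (3) that the group centroids seed a standard k-means refinement. All function names (`cosine`, `threshold_groups`, `kmeans_from`) and the threshold value are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def sq_dist(a, b):
    """Squared Euclidean distance between two sparse vectors."""
    return sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in set(a) | set(b))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = {}
    for v in vectors:
        for t, w in v.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(vectors) for t, w in total.items()}


def threshold_groups(vectors, threshold):
    """Stage 1 (assumed): put each vector into the most similar existing
    group if that similarity reaches `threshold`, else open a new group.
    Returns the group centroids; their count plays the role of k+x."""
    groups, centroids = [], []
    for i, v in enumerate(vectors):
        best, best_sim = None, threshold
        for g, c in enumerate(centroids):
            s = cosine(v, c)
            if s >= best_sim:
                best, best_sim = g, s
        if best is None:
            groups.append([i])
            centroids.append(dict(v))
        else:
            groups[best].append(i)
            centroids[best] = mean_vector([vectors[j] for j in groups[best]])
    return centroids


def kmeans_from(vectors, seeds, max_iter=20):
    """Stage 2 (assumed): standard k-means iterations seeded with the
    stage-1 centroids instead of random initial centers."""
    centroids = [dict(c) for c in seeds]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda g: sq_dist(v, centroids[g]))
            for v in vectors
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for g in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == g]
            if members:  # keep the old centroid if a cluster empties
                centroids[g] = mean_vector(members)
    return labels


# Toy corpus: two documents about clustering, two about football.
docs = [
    ["cluster", "text", "kmeans"],
    ["text", "cluster", "vector"],
    ["football", "match", "goal"],
    ["goal", "match", "team"],
]
vectors = [dict(Counter(d)) for d in docs]
seeds = threshold_groups(vectors, threshold=0.5)  # threshold is illustrative
labels = kmeans_from(vectors, seeds)
```

In this sketch the number of clusters is never passed in directly; it falls out of the similarity threshold, which is one way to read the abstract's claim of reduced sensitivity to k. The greedy pass depends on document order, so a real implementation would likely need a more careful grouping step.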