Computer and Modernization (计算机与现代化), Issue (8): 78-80, 84, 4. DOI: 10.3969/j.issn.1006-2475.2013.08.020
基于主题模型的K-均值文本聚类 (K-Means Text Clustering Based on a Topic Model)
Texts Clustering of K-means Based on LDA
ZHENG Cheng (郑诚) 1, LI Hong (李鸿) 1
Author information
- 1. School of Computer Science and Technology, Anhui University, Hefei, Anhui 230601
Abstract
The shortcoming of the traditional vector space model for representing texts is its very high dimensionality. Each text is typically a huge sparse vector, so computing the distance or similarity between texts is inefficient and the clustering results are unsatisfactory. In latent Dirichlet allocation (LDA), each text is represented by a probability distribution over topics, and each topic is represented by a probability distribution over words. When the number of topics in LDA is set to T, every text to be clustered is represented by a T-dimensional vector. The K-means algorithm is then used as the text clustering algorithm, and experiments verify that LDA-based clustering results are better than those based on the vector space model.
Keywords (translated from Chinese)
topic model (主题模型) / vector space model (向量空间模型) / text clustering (文本聚类) / K-means algorithm (K-均值算法)
Key words
LDA / Vector Space Model (VSM) / text clustering / K-means algorithm
Classification
Information Technology and Safety Science (信息技术与安全科学)
Cite this article
郑诚, 李鸿. 基于主题模型的K-均值文本聚类[J]. 计算机与现代化, 2013, (8): 78-80, 84, 4.
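
The pipeline summarized in the abstract (represent each text as a T-dimensional topic distribution, then cluster those vectors with K-means) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how LDA was inferred, so collapsed Gibbs sampling is swapped in here, and the function names `lda_gibbs` and `kmeans`, the hyperparameters `alpha`, `beta`, and `iters`, and the toy corpus are all hypothetical.

```python
import random


def lda_gibbs(docs, T, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (an assumed inference method).

    Returns one T-dimensional topic distribution per document.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    wid = {w: i for i, w in enumerate(vocab)}
    ndt = [[0] * T for _ in docs]      # document-topic counts
    ntw = [[0] * V for _ in range(T)]  # topic-word counts
    nt = [0] * T                       # total tokens assigned to each topic
    z = []                             # topic assignment of every token
    for d, doc in enumerate(docs):     # random initialization
        zd = []
        for w in doc:
            t = rng.randrange(T)
            zd.append(t)
            ndt[d][t] += 1
            ntw[t][wid[w]] += 1
            nt[t] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, t = wid[w], z[d][i]
                # Remove the token, then resample its topic from the conditional.
                ndt[d][t] -= 1
                ntw[t][v] -= 1
                nt[t] -= 1
                weights = [
                    (ndt[d][k] + alpha) * (ntw[k][v] + beta) / (nt[k] + V * beta)
                    for k in range(T)
                ]
                t = rng.choices(range(T), weights=weights)[0]
                z[d][i] = t
                ndt[d][t] += 1
                ntw[t][v] += 1
                nt[t] += 1
    # Smoothed document-topic proportions; each row sums to 1.
    return [
        [(ndt[d][k] + alpha) / (len(doc) + T * alpha) for k in range(T)]
        for d, doc in enumerate(docs)
    ]


def kmeans(points, K, iters=50, seed=0):
    """Plain K-means with squared Euclidean distance; returns cluster labels."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, K)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [
            min(range(K),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centers[k])))
            for p in points
        ]
        for k in range(K):
            members = [p for p, l in zip(points, labels) if l == k]
            if members:  # keep the old center if a cluster is empty
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy corpus, already tokenized.
docs = [
    "stock market trading price stock".split(),
    "market price investor stock".split(),
    "football match goal team football".split(),
    "team goal player match".split(),
]
theta = lda_gibbs(docs, T=2)   # each text becomes a T-dimensional vector
labels = kmeans(theta, K=2)    # cluster the texts in topic space
print(labels)
```

The point of the design is that K-means operates on vectors of length T rather than on vocabulary-sized sparse vectors, which is the efficiency argument the abstract makes. The number of clusters K and the number of topics T are independent choices.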