
K-means Text Clustering Based on a Topic Model

Zheng Cheng (郑诚), Li Hong (李鸿)

Computer and Modernization (计算机与现代化), 2013, Issue 8, pp. 78-80, 84 (4 pages). DOI: 10.3969/j.issn.1006-2475.2013.08.020

Texts Clustering of K-means Based on LDA

Author information

  • 1. School of Computer Science and Technology, Anhui University, Hefei, Anhui 230601, China

Abstract

The main shortcoming of representing texts with the traditional vector space model (VSM) is its very high dimensionality. Each text typically becomes a huge, sparse vector (and the collection a huge sparse term-document matrix), so computing distances or similarities between texts is inefficient and the clustering results are unsatisfactory. In latent Dirichlet allocation (LDA), a text is represented as a probability distribution over topics, and each topic is represented as a probability distribution over words. When the number of topics is set to T, every text to be clustered is therefore represented by a T-dimensional vector. The K-means algorithm is then used to cluster these vectors, and experiments show that clustering based on the LDA representation outperforms clustering based on the vector space model.
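The pipeline described in the abstract (map each text to a T-dimensional topic vector, then run K-means on those vectors) can be sketched as follows. This is a minimal sketch, not the authors' code: it assumes the document-topic vectors θ are already available, and the toy θ values, the farthest-first seeding heuristic, and all function names are hypothetical illustrations. In practice the vectors would come from a fitted LDA model; for example, scikit-learn's `LatentDirichletAllocation.fit_transform` returns one topic distribution per document.

```python
# Minimal K-means over LDA document-topic vectors (illustrative sketch only).
# Assumption: each document d has already been mapped by an LDA model to a
# T-dimensional topic distribution theta_d whose entries sum to 1.

def squared_distance(a, b):
    """Squared Euclidean distance between two dense vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(vectors, k):
    """Hypothetical deterministic seeding: start at vectors[0], then repeatedly
    add the point farthest from the centroids chosen so far."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(
            vectors, key=lambda v: min(squared_distance(v, c) for c in centroids)
        )
        centroids.append(list(farthest))
    return centroids


def kmeans(vectors, k, max_iter=100):
    """Lloyd's K-means: alternate assignment and centroid-update steps until
    the assignment stops changing or max_iter is reached."""
    centroids = farthest_first_init(vectors, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # converged: no document changed cluster
        labels = new_labels
        # Update step: each centroid becomes the mean of its member vectors.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical toy data: six documents, T = 3 topics.
theta = [
    [0.80, 0.15, 0.05],
    [0.75, 0.20, 0.05],
    [0.10, 0.85, 0.05],
    [0.05, 0.90, 0.05],
    [0.05, 0.10, 0.85],
    [0.10, 0.05, 0.85],
]
labels, centroids = kmeans(theta, k=3)
print(labels)  # → [0, 0, 2, 2, 1, 1]
```

Because each document vector has only T entries, every distance computation costs O(T) operations regardless of vocabulary size, which is the efficiency argument the abstract makes for the topic representation. The farthest-first seeding is used here only to make the toy run deterministic; K-means is sensitive to initialization, and the paper does not state which seeding it uses.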

Keywords

LDA / vector space model (VSM) / text clustering / K-means algorithm

Classification

Information Technology and Security Science

Cite this article

Zheng Cheng, Li Hong. K-means text clustering based on a topic model (基于主题模型的K-均值文本聚类) [J]. Computer and Modernization (计算机与现代化), 2013, (8): 78-80, 84.

Journal: Computer and Modernization (计算机与现代化)

Indexed in: OA, CSTPCD

ISSN: 1006-2475
