
Document Clustering Algorithm Based on Word Co-occurrence

Chang Peng, Feng Nan, Ma Hui

Computer Engineering, 2012, Vol. 38, Issue (2): 213-214, 220, 3. DOI:10.3969/j.issn.1000-3428.2012.02.070

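Author information

  • 1. College of Management and Economics, Tianjin University, Tianjin 300072 (Chang Peng, Ma Hui)
  • 2. Information and Network Center, Tianjin University, Tianjin 300072 (Feng Nan)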

Abstract

This paper presents a document clustering algorithm based on word co-occurrence, aimed at the loss of information that occurs when a text's topic is represented. The algorithm uses word co-occurrence across the document set to build a topic-vector representation model for each document, applies this model in hierarchical clustering, and uses clustering entropy to find the best level at which to partition the hierarchy, so that the resulting clusters accurately reflect the topical relationships between documents. Experimental results show that the algorithm outperforms other phrase-based hierarchical document clustering algorithms.
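The abstract describes the method only in outline. The Python sketch below illustrates that pipeline under stated assumptions. It is not the authors' implementation, and every function name in it is hypothetical. Documents are assumed to be pre-tokenized word lists. "Topic features" are assumed to be word pairs that co-occur in at least `min_docs` documents, which is a hypothetical threshold. Documents are compared by cosine similarity and merged with average linkage. Because the abstract does not define its clustering-entropy measure, the best level is chosen with a stand-in clustering-gain score, sum over clusters j of (n_j - 1) * ||p_0 - p_j||^2. Here p_0 is the global centroid and p_j is the centroid of cluster j. This stand-in is suggested by the "clustering gain" keyword, and it is an assumption.

```python
from collections import Counter
from itertools import combinations
import math


def cooccurrence_features(docs, min_docs=2):
    """Word pairs appearing together in >= min_docs documents (threshold is an assumption)."""
    pair_df = Counter()
    for doc in docs:
        for pair in combinations(sorted(set(doc)), 2):
            pair_df[pair] += 1
    return sorted(p for p, n in pair_df.items() if n >= min_docs)


def doc_vectors(docs, features):
    """Binary topic vector: 1.0 if the document contains both words of a feature pair."""
    vecs = []
    for doc in docs:
        words = set(doc)
        vecs.append([1.0 if a in words and b in words else 0.0 for a, b in features])
    return vecs


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def agglomerate(vecs):
    """Average-linkage agglomerative clustering; returns the partition at every level."""
    clusters = [[i] for i in range(len(vecs))]
    levels = [[list(c) for c in clusters]]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for i, j in combinations(range(len(clusters)), 2):
            # Mean pairwise cosine similarity between the two clusters.
            sim = sum(cosine(vecs[a], vecs[b]) for a in clusters[i] for b in clusters[j])
            sim /= len(clusters[i]) * len(clusters[j])
            if sim > best:
                best, pair = sim, (i, j)
        i, j = pair  # i < j, so deleting j leaves index i valid
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
        levels.append([list(c) for c in clusters])
    return levels


def centroid(vecs, idx):
    idx = list(idx)
    return [sum(vecs[k][d] for k in idx) / len(idx) for d in range(len(vecs[0]))]


def clustering_gain(vecs, partition):
    """Stand-in level score (assumed): sum_j (n_j - 1) * ||p0 - p_j||^2."""
    p0 = centroid(vecs, range(len(vecs)))
    gain = 0.0
    for c in partition:
        pj = centroid(vecs, c)
        gain += (len(c) - 1) * sum((a - b) ** 2 for a, b in zip(p0, pj))
    return gain


def cluster_documents(docs, min_docs=2):
    """Build co-occurrence vectors, cluster hierarchically, return the best-scoring level."""
    features = cooccurrence_features(docs, min_docs)
    vecs = doc_vectors(docs, features)
    levels = agglomerate(vecs)
    return max(levels, key=lambda part: clustering_gain(vecs, part))
```

Given `docs` as a list of token lists, `cluster_documents(docs)` returns the chosen partition as lists of document indices. The stand-in gain is zero both for all-singleton clusters and for a single all-in-one cluster, so it favors an intermediate level. The pairwise merge search is brute force for readability. It is only practical for small collections.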

Keywords

document clustering / document model / word co-occurrence / document similarity / clustering gain

Classification

Information Technology and Security Science

Cite this article

Chang Peng, Feng Nan, Ma Hui. Document clustering algorithm based on word co-occurrence[J]. Computer Engineering, 2012, 38(2): 213-214, 220, 3.

Funding

Supported by the National Natural Science Foundation of China (70901054)

Computer Engineering

Indexed in: OA / CSCD / CSTPCD

ISSN 1000-3428
