
A Secondary Text Clustering Algorithm Based on TextRank

Pan Xiaoying¹, Hu Kaikai¹, Zhu Jing¹

Computer Technology and Development, 2016, Vol. 26, Issue 8: 7-11, 5.

DOI: 10.3969/j.issn.1673-629X.2016.08.002


Author information

1. School of Computer Science, Xi'an University of Posts and Telecommunications, Xi'an, Shaanxi 710121, China

Abstract

To address the problems of traditional text clustering techniques, such as mediocre accuracy or high time complexity, this paper first introduces two commonly used text clustering techniques: partition-based K-means and topic-based LDA. After analyzing their respective shortcomings, it presents a secondary text clustering algorithm based on TextRank. Borrowing the idea of topic models, the algorithm introduces word clustering into the traditional clustering process and incorporates position and span features in the keyword extraction phase, reducing the errors caused by treating local keywords as global keywords. Experimental results show that the improved algorithm achieves better clustering quality than traditional VSM-based clustering and the topic-model-based LDA algorithm.

Key words

text clustering/TextRank/keyword extraction/VSM/LDA
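
The abstract does not give the paper's scoring formula, so the following is only a minimal sketch of baseline TextRank keyword ranking, the step the proposed algorithm builds on. It does not include the position and span features or the secondary clustering stage described above. The function name `textrank_keywords` and its parameters (`window`, `damping`, `iterations`, `top_k`) are hypothetical choices made for illustration.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_k=3):
    """Rank words with plain TextRank over an undirected co-occurrence graph.

    Illustrative baseline only: no position or span weighting is applied.
    """
    # Link each word to the distinct words in the next `window` positions.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window + 1]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # PageRank-style update: each word receives an equal share of the
    # score of every neighbor, damped by `damping`.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }

    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k], scores


tokens = "text clustering groups similar text while keyword extraction ranks text terms".split()
keywords, _ = textrank_keywords(tokens)
print(keywords)
```

A variant in the spirit of the abstract would presumably adjust these scores using where a word occurs and how widely it is spread across the document, but the exact weighting is not stated here.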

Classification

Information Technology and Security Science

Cite this article

Pan Xiaoying, Hu Kaikai, Zhu Jing. A Secondary Text Clustering Algorithm Based on TextRank [J]. Computer Technology and Development, 2016, 26(8): 7-11, 5.

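Funding

National Natural Science Foundation of China (61105064, 61203311, 61373116)

Special Scientific Research Program of the Shaanxi Provincial Department of Education (14JK1667)

Graduate Innovation Fund of Xi'an University of Posts and Telecommunications (CXL2014-23)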

Computer Technology and Development

OA, CSTPCD

ISSN: 1673-629X
