Computer Technology and Development, 2016, Vol. 26, Issue 8: 7-11, 5. DOI: 10.3969/j.issn.1673-629X.2016.08.002
A Secondary Text Clustering Algorithm Based on TextRank
Abstract
Traditional text clustering techniques tend to suffer from mediocre accuracy or high time complexity. This paper first introduces two commonly used techniques, partition-based K-means and topic-based LDA, and analyzes their respective shortcomings. It then presents a secondary text clustering algorithm based on TextRank. Drawing on the idea of topic models, the algorithm introduces word clustering into the traditional clustering process and incorporates location and span features in the keyword extraction phase, which reduces the error of treating local keywords as global keywords. Experimental results show that the improved algorithm achieves better clustering than both traditional VSM clustering and the topic-model-based LDA algorithm.
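The keyword extraction step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the scoring formula, so everything specific here is an assumption: the function names, the co-occurrence window, and the way the location and span features are blended with the TextRank score (a simple multiplicative boost controlled by the hypothetical weights `alpha` and `beta`). It is not the authors' implementation, only one plausible reading of "TextRank plus location and span".

```python
from collections import defaultdict


def textrank_scores(tokens, window=2, damping=0.85, iterations=50):
    """Plain TextRank over an undirected, weighted co-occurrence graph."""
    graph = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(tokens):
        # Link each word to the words that follow it inside the window.
        for other in tokens[i + 1 : i + window]:
            if other != word:
                graph[word][other] += 1.0
                graph[other][word] += 1.0

    scores = {word: 1.0 for word in graph}
    for _ in range(iterations):
        updated = {}
        for word in graph:
            rank = 0.0
            for neighbor, weight in graph[word].items():
                out_weight = sum(graph[neighbor].values())
                rank += weight / out_weight * scores[neighbor]
            updated[word] = (1.0 - damping) + damping * rank
        scores = updated
    return scores


def keywords_with_location_and_span(tokens, top_k=5, alpha=0.5, beta=0.5):
    """Rank words by TextRank score boosted by location and span.

    Assumed features (not taken from the paper):
      location = 1 - first_index / n     (earlier first occurrence scores higher)
      span     = (last - first) / n      (wider spread over the text scores higher)
    """
    scores = textrank_scores(tokens)
    n = len(tokens)
    first, last = {}, {}
    for i, word in enumerate(tokens):
        first.setdefault(word, i)
        last[word] = i

    ranked = []
    for word, score in scores.items():
        location = 1.0 - first[word] / n
        span = (last[word] - first[word]) / n
        ranked.append((word, score * (1.0 + alpha * location + beta * span)))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


tokens = "cluster text cluster keyword cluster graph cluster rank".split()
top = keywords_with_location_and_span(tokens, top_k=3)
```

In this toy input, "cluster" co-occurs with every other word, appears first, and spans most of the text, so it ranks highest. A word that is frequent only within one short passage would receive a small span value, which is the intuition behind using span to avoid promoting local keywords to global ones.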
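Keywords
text clustering / TextRank / keyword extraction / vector space model (VSM) / LDA
Classification
Information Technology and Security Science
Cite this article
Pan Xiaoying, Hu Kaikai, Zhu Jing. A Secondary Text Clustering Algorithm Based on TextRank [J]. Computer Technology and Development, 2016, 26(8): 7-11, 5.
Funding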
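National Natural Science Foundation of China (61105064, 61203311, 61373116)
Special Scientific Research Program of the Shaanxi Provincial Department of Education (14JK1667)
Graduate Innovation Fund of Xi'an University of Posts and Telecommunications (CXL2014-23)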