Computer Applications and Software (计算机应用与软件), Issue (1): 105-107, 139, 4. DOI: 10.3969/j.issn.1000-386x.2014.01.028
一种消除孤立点的微博热点话题发现方法
A MICROBLOGGING HOT TOPICS DISCOVERY METHOD BASED ON OUTLIERS ELIMINATION
Abstract
Microblog posts are numerous, short, and spread over a wide range of topics. As a result, microblog data contain many isolated points (outliers), which degrade the clustering algorithms used to discover hot topics. We therefore propose a microblog hot-topic discovery method based on outlier elimination. First, the outliers are removed from the dataset; then the CURE algorithm clusters the remaining data, which are the data worth clustering; finally, the validity of the method is verified on examples. The results show that, compared with the clustering algorithm used for comparison, the proposed method makes the clustering result less sensitive to outliers, improves the accuracy of hot-topic discovery, and runs more efficiently, which makes it better suited to large-scale microblog hot-topic discovery.
Key words
Microblogging hot topics/Outliers/CURE algorithm/Discovery
Classification
Information Technology and Security Science
Cite this article
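Lai Jinhui (赖锦辉), Liang Song (梁松). A microblogging hot topics discovery method based on outliers elimination [J]. Computer Applications and Software, 2014, (1): 105-107, 139, 4.
Funding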
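National Natural Science Foundation of China (60903168); Guangdong Province and Ministry of Education Industry-University-Research Cooperation Project (2010B090400235); Maoming Science and Technology Plan Project (2011008).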
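Illustrative sketch

The abstract describes a two-stage pipeline: remove outliers, then cluster the remaining data with CURE. The following is a minimal sketch of that idea, not the authors' implementation. The abstract does not say how outliers are detected, so the neighbour-count rule in `remove_outliers` (with its `radius` and `min_neighbors` parameters) is an assumption, as are the function names and the toy 2-D points standing in for microblog feature vectors. `cure_cluster` is a simplified CURE-style procedure (well-scattered representative points shrunk toward the centroid, merging the closest pair of clusters) without the sampling and partitioning steps of the full algorithm.

```python
import math


def remove_outliers(points, radius, min_neighbors):
    """Assumed rule: keep a point only if at least `min_neighbors`
    other points lie within `radius` of it."""
    kept = []
    for i, p in enumerate(points):
        neighbors = sum(
            1 for j, q in enumerate(points)
            if j != i and math.dist(p, q) <= radius
        )
        if neighbors >= min_neighbors:
            kept.append(p)
    return kept


def centroid(cluster):
    dim = len(cluster[0])
    return tuple(sum(p[d] for p in cluster) / len(cluster) for d in range(dim))


def representatives(cluster, num_reps, shrink):
    """Pick up to `num_reps` well-scattered points (farthest-point
    selection) and move each a fraction `shrink` toward the centroid."""
    c = centroid(cluster)
    unique = list(dict.fromkeys(cluster))  # drop duplicate points
    target = min(num_reps, len(unique))
    chosen = [max(unique, key=lambda p: math.dist(p, c))]
    while len(chosen) < target:
        candidates = [p for p in unique if p not in chosen]
        chosen.append(
            max(candidates, key=lambda p: min(math.dist(p, r) for r in chosen))
        )
    return [tuple(x + shrink * (cx - x) for x, cx in zip(p, c)) for p in chosen]


def cure_cluster(points, k, num_reps=4, shrink=0.3):
    """Simplified CURE: repeatedly merge the two clusters whose
    representative points are closest, until `k` clusters remain."""
    clusters = [[p] for p in points]
    reps = [representatives(c, num_reps, shrink) for c in clusters]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(a, b) for a in reps[i] for b in reps[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        del clusters[j], reps[j]  # j > i, so index i stays valid
        clusters[i] = merged
        reps[i] = representatives(merged, num_reps, shrink)
    return clusters


# Toy data: two dense groups plus one isolated point.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (50, 50),
]
kept = remove_outliers(points, radius=2.0, min_neighbors=2)
topics = cure_cluster(kept, k=2)
```

In a real setting the points would be vector representations of posts (for example term-weight vectors), and the thresholds would need tuning against the data. Filtering first also shrinks the input to the pairwise merge loop, which is consistent with the efficiency gain the abstract reports.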