计算机技术与发展 (Computer Technology and Development), 2016, Vol. 26, Issue 6: 25-30, 36, 7. DOI: 10.3969/j.issn.1673-629X.2016.06.006
基于LDA模型和多层聚类的微博话题检测
Microblog Topic Detection Based on LDA Model and Multi-level Clustering
Abstract
With the wide adoption of microblogs as an emerging social medium, related research has grown rapidly, and topic detection on microblogs is one of the current research hotspots. Taking the characteristics of microblogs into account, a microblog topic detection method based on the LDA model and hierarchical clustering is proposed. First, the LDA model is applied to model the microblog data and extract features. Then, improved Single-Pass clustering and hierarchical clustering are used to cluster the microblog data and find hot topics. Experiments on a large-scale corpus show three results. The LDA model is more effective than TF-IDF for feature selection and weight calculation. The improved Single-Pass clustering can handle the microblogs left unprocessed by the first Single-Pass clustering, which improves the accuracy of the initial clustering and reduces the time taken by hierarchical clustering. Hierarchical clustering is more effective than single clustering in accuracy, recall and F-value. Detecting microblog topics with the LDA model and multi-level clustering is therefore feasible and effective.
Key words
LDA model / topic detection / improved Single-Pass clustering / hierarchical clustering
Classification
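Information technology and security science
Cite this article
Liu Hongbing (刘红兵), Li Wenkun (李文坤), Zhang Yangsen (张仰森). Microblog Topic Detection Based on LDA Model and Multi-level Clustering [J]. 计算机技术与发展 (Computer Technology and Development), 2016, 26(6): 25-30, 36, 7.
Funding
National Natural Science Foundation of China (61370139)
Innovation Team Building and Teacher Professional Development Program for Beijing Municipal Universities (IDHT20130519)
Special Fund of the Beijing Municipal Education Commission (PXM2013014224000042, PXM2014014224000067)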
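
The two clustering stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not give the details of the improved Single-Pass step, so the code below uses plain Single-Pass clustering followed by a simple agglomerative merge of cluster centroids. The toy document-topic vectors stand in for LDA output, and the cosine similarity measure, the running-mean centroid update, the two thresholds and all function names are assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def single_pass(vectors, threshold):
    """Plain Single-Pass: join the most similar cluster, else start a new one."""
    clusters = []  # each cluster: {"centroid": [...], "members": [doc indices]}
    for i, vec in enumerate(vectors):
        best, best_sim = None, threshold
        for c in clusters:
            sim = cosine(vec, c["centroid"])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append({"centroid": list(vec), "members": [i]})
        else:
            n = len(best["members"])
            # Running mean keeps the centroid equal to the average of its members.
            best["centroid"] = [(m * n + x) / (n + 1)
                                for m, x in zip(best["centroid"], vec)]
            best["members"].append(i)
    return clusters

def merge_clusters(clusters, threshold):
    """Agglomerative stage: merge the two most similar clusters until none qualify."""
    clusters = [dict(c, members=list(c["members"])) for c in clusters]
    while len(clusters) > 1:
        best_sim, pair = threshold, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sim = cosine(clusters[a]["centroid"], clusters[b]["centroid"])
                if sim >= best_sim:
                    best_sim, pair = sim, (a, b)
        if pair is None:
            break
        ca, cb = clusters[pair[0]], clusters[pair[1]]
        na, nb = len(ca["members"]), len(cb["members"])
        merged = {
            # Size-weighted mean of the two centroids.
            "centroid": [(x * na + y * nb) / (na + nb)
                         for x, y in zip(ca["centroid"], cb["centroid"])],
            "members": ca["members"] + cb["members"],
        }
        clusters = [c for k, c in enumerate(clusters) if k not in pair] + [merged]
    return clusters

# Toy document-topic distributions (assumed stand-ins for LDA output, 3 topics).
doc_topics = [
    [0.90, 0.05, 0.05],
    [0.85, 0.10, 0.05],
    [0.05, 0.90, 0.05],
    [0.10, 0.85, 0.05],
    [0.70, 0.25, 0.05],
    [0.05, 0.05, 0.90],
]

first_pass = single_pass(doc_topics, threshold=0.98)   # strict initial clustering
topics = merge_clusters(first_pass, threshold=0.90)    # looser second-level merge
print(sorted(sorted(c["members"]) for c in topics))
```

In this sketch the strict first threshold leaves document 4 in a cluster of its own, and the looser second-level merge attaches it to the cluster it most resembles. A small initial cluster set is also what keeps the quadratic pairwise comparison in the second stage cheap.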