Microblog Topic Detection Based on LDA Model and Multi-level Clustering (基于LDA模型和多层聚类的微博话题检测)

Liu Hongbing, Li Wenkun, Zhang Yangsen

Computer Technology and Development (计算机技术与发展), 2016, Vol. 26, Issue (6): 25-30, 36, 7. DOI: 10.3969/j.issn.1673-629X.2016.06.006

Microblog Topic Detection Based on LDA Model and Multi-level Clustering

Liu Hongbing (1), Li Wenkun (2), Zhang Yangsen (2)

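Author information

  • 1. School of Electronic Information, Taiyuan University of Science and Technology, Taiyuan, Shanxi 030024
  • 2. Institute of Intelligent Information Processing, Beijing Information Science and Technology University, Beijing 100192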

Abstract

With the wide adoption of microblogging as an emerging social medium, research on microblogs is growing, and microblog topic detection is one of the current research hotspots. Taking the characteristics of microblogs into account, a microblog topic detection method based on the LDA model and hierarchical clustering is proposed. First, the LDA model is applied to model the microblog data and extract features. Then, an improved Single-Pass clustering and hierarchical clustering are used to cluster the microblog data and find hot topics. Experiments on a large-scale corpus show three results. The LDA model is more effective than TF-IDF for feature selection and weight calculation. The improved Single-Pass clustering can handle the microblogs left unprocessed by the first Single-Pass clustering, which improves the accuracy of the initial clustering and reduces the time needed for hierarchical clustering. Hierarchical clustering outperforms single clustering in accuracy, recall and F-value. Detecting microblog topics with the LDA model and multi-level clustering is therefore feasible and effective.
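The abstract names Single-Pass clustering as the first clustering stage but does not give its details. As a rough illustration only, the sketch below shows the classic Single-Pass procedure, not the authors' improved variant, applied to per-document topic vectors such as those an LDA model would produce. The function names, the cosine similarity measure, the running-mean centroid update, the threshold of 0.8, and the toy vectors are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def single_pass(vectors, threshold=0.8):
    """Classic Single-Pass clustering (hypothetical helper, not the paper's method).

    Each vector is compared with every existing cluster centroid. It joins the
    most similar cluster if the similarity reaches the threshold; otherwise it
    starts a new cluster.
    """
    clusters = []  # each cluster: {"centroid": [...], "members": [doc indices]}
    for i, vec in enumerate(vectors):
        best, best_sim = None, -1.0
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(i)
            n = len(best["members"])
            # Update the centroid as the running mean of its members.
            best["centroid"] = [
                (c * (n - 1) + x) / n for c, x in zip(best["centroid"], vec)
            ]
        else:
            clusters.append({"centroid": list(vec), "members": [i]})
    return clusters


# Toy topic distributions for four documents (made-up values).
docs = [
    [0.90, 0.05, 0.05],
    [0.85, 0.10, 0.05],
    [0.05, 0.90, 0.05],
    [0.10, 0.80, 0.10],
]
clusters = single_pass(docs, threshold=0.8)
groups = [c["members"] for c in clusters]
print(groups)
```

In a pipeline like the one the abstract describes, the centroids produced by such a first pass could then be handed to a hierarchical clustering step, so that the more expensive merging works on a small number of initial clusters rather than on every microblog.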

Key words

LDA model/topic detection/improved Single-Pass clustering/hierarchical clustering

Classification

Information Technology and Security Science

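Cite this article

Liu Hongbing, Li Wenkun, Zhang Yangsen. Microblog Topic Detection Based on LDA Model and Multi-level Clustering [J]. Computer Technology and Development, 2016, 26(6): 25-30, 36, 7.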

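Funding

National Natural Science Foundation of China (61370139)

Project of Innovative Team Construction and Teacher Career Development for Universities and Colleges under Beijing Municipality (IDHT20130519)

Special Fund of the Beijing Municipal Education Commission (PXM2013014224000042, PXM2014014224000067)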

Computer Technology and Development (OA, CSTPCD), ISSN 1673-629X
