Multi-Source Hot Topic Detection Model Based on Improved ccLDA

Advanced Engineering Sciences (工程科学与技术), 2018, Vol. 50, Issue 2: 141-147, 7. DOI: 10.15961/j.jsuese.201700626

Multi-source Topic Detection Analysis Based on Improved ccLDA Model

Chen Xingshu (陈兴蜀) 1, Ma Chenxi (马晨曦) 2, Wang Wenxian (王文贤) 2, Gao Yue (高悦) 3, Wang Haizhou (王海舟) 2

Author information

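  • 1. School of Cyberspace Security, Sichuan University, Chengdu, Sichuan 610065, China
  • 2. College of Computer Science, Sichuan University, Chengdu, Sichuan 610065, China
  • 3. Cyberspace Security Research Institute, Sichuan University, Chengdu, Sichuan 610065, China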

Abstract

The ccLDA (cross-collection LDA) model is applicable only to data sources whose topics are highly similar, and it forces the global topics to align with the local topics of each data source, which makes the word distributions sparse. To address these problems, an improved ccLDA topic model (IccLDA) is proposed. During sampling, the model first decides whether a word belongs to a global topic or a local topic, and then samples the topic accordingly. This removes the requirement in ccLDA that global and local topics be aligned, and it reduces the dispersion of words across global and local topics, making the model suitable for multi-source scenarios. Topic discovery experiments on multiple data sources were conducted on public data sets, together with a comparative analysis of the resulting topics. The results show that the perplexity of IccLDA is lower than that of LDA and ccLDA, indicating better modeling ability. Further experiments were then performed on data sets from real-world scenarios. The results show that the improved model not only has better modeling ability than the traditional models, but also effectively discovers the public topics discussed across data sources and the local topics specific to each data source, making it better suited to topic discovery in multi-source scenarios.
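The two ideas in the abstract, choosing the scope of a word before choosing its topic, and comparing models by perplexity, can be illustrated with a small toy sketch. This is not the authors' Gibbs sampler: the function names, the fixed switch probability `p_global`, and the hand-written topic probabilities are all assumptions made for illustration, and the abstract does not specify how IccLDA actually computes them. Perplexity is computed here with the standard definition, the exponential of the negative mean per-word log-likelihood.

```python
import math
import random


def sample_word_topic(rng, p_global, global_topic_probs, local_topic_probs):
    """Two-stage draw (illustrative only).

    Stage 1: decide whether the word is assigned to a global or a local topic.
    Stage 2: draw a topic index from the distribution of the chosen scope only.
    The two topic lists may have different lengths, so no alignment between
    global and local topics is assumed.
    """
    is_global = rng.random() < p_global
    probs = global_topic_probs if is_global else local_topic_probs
    topic = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return ("global" if is_global else "local", topic)


def perplexity(word_log_probs):
    """Standard perplexity: exp(-mean natural-log probability per word)."""
    return math.exp(-sum(word_log_probs) / len(word_log_probs))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same draws
    # Hypothetical values: 3 global topics, 2 local topics for one data source.
    for _ in range(5):
        print(sample_word_topic(rng, 0.7, [0.5, 0.3, 0.2], [0.9, 0.1]))
    # A model that gives every word probability 0.25 has perplexity close to 4.
    print(perplexity([math.log(0.25)] * 8))
```

A lower perplexity on held-out text means the model assigns higher probability to the words it actually sees, which is the sense in which the abstract reports IccLDA as modeling the data better than LDA and ccLDA.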

Key words

topic detection/topic model/LDA/multi-source/IccLDA

Classification

Information Technology and Security Science

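Cite this article

Chen Xingshu, Ma Chenxi, Wang Wenxian, Gao Yue, Wang Haizhou. Multi-source Topic Detection Analysis Based on Improved ccLDA Model [J]. Advanced Engineering Sciences (工程科学与技术), 2018, 50(2): 141-147, 7.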

Funding

National Key Technology R&D Program of China (2012BAH18B05)

National Natural Science Foundation of China (61272447)

Sichuan Provincial Department of Science and Technology Program (16ZHSF0483)

Advanced Engineering Sciences (工程科学与技术)

Open access; indexed in Peking University Core Journals (北大核心), CSCD, CSTPCD

ISSN 2096-3246
