工程科学与技术 (Advanced Engineering Sciences), 2018, Vol. 50, Issue 2: 141-147 (7 pages). DOI: 10.15961/j.jsuese.201700626
基于改进的ccLDA多数据源热点话题检测模型
Multi-source Topic Detection Analysis Based on Improved ccLDA Model
Abstract
The ccLDA (cross-collection LDA) model is applicable only to data sources whose topics are highly similar, and it forces the global topics and the local topics of each data source into alignment, which makes the word distributions sparse. To address these problems, an improved ccLDA topic model (IccLDA) is proposed. During sampling, the model first decides whether a word belongs to a global topic or a local topic and then samples accordingly. This avoids the forced alignment of global and local topics in ccLDA and reduces the dispersion of words across global and local topics, making the model suitable for multi-source scenarios. Topic discovery experiments on multiple data sources were conducted on public data sets, together with a comparative analysis of the topics. The results showed that the perplexity of IccLDA is lower than that of LDA and ccLDA, indicating that IccLDA has better modeling ability. Further experiments were then performed on data sets from real-world scenarios. The results showed that the improved model not only has better modeling ability than the traditional models but also effectively discovers the public topics discussed across data sources and the local topics discussed within each data source, making it more suitable for topic discovery in multi-source scenarios.
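The abstract compares LDA, ccLDA, and IccLDA by perplexity, where lower is better. As a minimal sketch, assuming the standard held-out definition perplexity = exp(-(1/N) Σ log p(w_i)), the measure can be computed as below. The function name `perplexity` and the toy probabilities are hypothetical illustrations and are not taken from the paper.

```python
import math
from typing import Sequence


def perplexity(token_probs: Sequence[float]) -> float:
    """Perplexity of held-out tokens, given the model probability of each token.

    Uses the standard definition: exp(-(1/N) * sum_i log p(w_i)).
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    if any(p <= 0.0 for p in token_probs):
        raise ValueError("token probabilities must be positive")
    log_likelihood = sum(math.log(p) for p in token_probs)
    return math.exp(-log_likelihood / len(token_probs))


# Made-up per-token probabilities from two hypothetical models (not paper data).
flatter_model = [0.25, 0.25, 0.25, 0.25]
sharper_model = [0.5, 0.25, 0.5, 0.25]

flatter_ppl = perplexity(flatter_model)
sharper_ppl = perplexity(sharper_model)
```

A model that assigns higher probability to the held-out words gets a lower perplexity, which is the sense in which the abstract reads a lower value as better modeling ability.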
Key words
topic detection/topic model/LDA/multi-source/IccLDA
Classification
Information Technology and Security Science
Cite this article
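Chen Xingshu, Ma Chenxi, Wang Wenxian, Gao Yue, Wang Haizhou. Multi-source Topic Detection Analysis Based on Improved ccLDA Model [J]. 工程科学与技术 (Advanced Engineering Sciences), 2018, 50(2): 141-147 (7 pages).
Funding
National Key Technology R&D Program of China (2012BAH18B05)
National Natural Science Foundation of China (61272447)
Science and Technology Program of the Sichuan Provincial Department of Science and Technology (16ZHSF0483)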