Computer Engineering, 2016, Vol. 42, Issue 11: 195-201, 7 pp. DOI: 10.3969/j.issn.1000-3428.2016.11.032
A Continuous-time Topic Model for Word Burstiness
Abstract
Traditional topic models based on the multinomial distribution cannot properly capture word burstiness. To address this, a continuous-time topic model for word burstiness based on the Dirichlet Compound Multinomial (DCM) distribution is proposed, which integrates the temporal information inherent in the corpus. In this model, word burstiness is modeled by the DCM distribution, while temporal features are characterized by the Beta distribution. Gibbs sampling and the fixed-point iteration method are employed to estimate the model parameters. Experimental results demonstrate that the model has clear advantages over ToT and DCMLDA in generalization performance when the given number of topics is small, and that it can also effectively reveal the latent evolution of topics in the corpus.
Keywords
topic model / Latent Dirichlet Allocation (LDA) / word burstiness / Dirichlet Compound Multinomial (DCM) / Gibbs sampling / fixed-point iteration method
Classification
Information Technology and Security Science
Cite this article
LIU Liangxuan (刘良选), HUANG Mengxing (黄梦醒). A Continuous-time Topic Model for Word Burstiness [J]. Computer Engineering, 2016, 42(11): 195-201, 7.
Funding
National Natural Science Foundation of China (61462022).
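
Illustrative sketch
The abstract mentions fixed-point iteration for parameter estimation but gives no formulas. As a rough illustration only, the sketch below applies the widely used fixed-point update for the parameters of a Dirichlet Compound Multinomial (Pólya) distribution to a small word-count matrix. It is an assumption that the paper's estimator resembles this; the function names `digamma` and `fit_dcm_alpha`, the toy counts, and the stopping rule are all hypothetical and not taken from the article. The Beta-distributed timestamps and the Gibbs sampler are not covered.

```python
import math


def digamma(x):
    """Digamma function for x > 0: recurrence up to x >= 6, then an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x  # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def fit_dcm_alpha(counts, iters=200, tol=1e-8, floor=1e-10):
    """Estimate DCM parameters alpha from per-document word counts.

    counts: list of documents, each a list of non-negative word counts
            over the same vocabulary (at least one document must be non-empty).
    Update rule (assumed, not from the article):
        alpha_k <- alpha_k * sum_d [psi(n_dk + alpha_k) - psi(alpha_k)]
                           / sum_d [psi(n_d + sum(alpha)) - psi(sum(alpha))]
    """
    vocab = len(counts[0])
    alpha = [1.0] * vocab  # hypothetical initial value
    doc_len = [sum(doc) for doc in counts]
    for _ in range(iters):
        alpha_sum = sum(alpha)
        denom = sum(digamma(n + alpha_sum) - digamma(alpha_sum) for n in doc_len)
        new_alpha = []
        for k in range(vocab):
            numer = sum(digamma(doc[k] + alpha[k]) - digamma(alpha[k]) for doc in counts)
            # The floor keeps alpha strictly positive for words that never occur.
            new_alpha.append(max(alpha[k] * numer / denom, floor))
        change = max(abs(a - b) for a, b in zip(alpha, new_alpha))
        alpha = new_alpha
        if change < tol:
            break
    return alpha


# Toy "bursty" corpus: each document is dominated by a single word.
toy_counts = [[5, 0, 0], [0, 4, 1], [6, 0, 0], [0, 0, 5], [0, 5, 0]]
alpha_hat = fit_dcm_alpha(toy_counts)
print(alpha_hat)
```

Small estimated values of alpha indicate that a word, once it appears in a document, tends to repeat there, which is the burstiness effect the DCM distribution is meant to capture. The sketch uses only the standard library so that it stays self-contained; a production implementation would more likely rely on a numerical library's digamma routine.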