Computer & Digital Engineering, 2017, Vol. 45, Issue (2): 367-372, 6. DOI: 10.3969/j.issn.1672-9722.2017.02.032
A Probabilistic Topic Model with Noise Reduction Ability
Abstract
With the arrival of the big data era, efficiently recognizing and analyzing the hidden structure of text data has become increasingly important, and powerful computational tools are needed to help understand text data better. Probabilistic topic models, especially Latent Dirichlet Allocation (LDA), have been proposed and widely applied in machine learning and text mining. However, LDA distinguishes similar topics poorly, which degrades its practical performance. To address this problem, a new topic model named Noise Reduction Latent Dirichlet Allocation (NRLDA) is proposed on the basis of LDA. Many noise words contribute nothing to discriminating similar topics; NRLDA accounts for this by introducing new variables that separate the generative process of noise words from that of non-noise words, something LDA cannot do. In addition, a Gibbs sampler is developed to infer NRLDA's parameters, which is critical to investigating the structure of a text corpus. Experimental results show that NRLDA has a much stronger ability to differentiate similar topics, which indicates that the idea behind the model is sound.
Key words
probabilistic topic model / LDA / Gibbs sampling / noise reduction
Classification
Information Technology and Security Science
Cite this article
Li Jing, Qin Yongbin, Huang Ruizhang. A Probabilistic Topic Model with Noise Reduction Ability[J]. Computer & Digital Engineering, 2017, 45(2): 367-372, 6.
Funding
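National Natural Science Foundation of China (Nos. 61540050, 61462011)
Major Applied Basic Research Project of Guizhou Province (No. 黔科合JZ字[2014]2001)
Joint Fund of the Science and Technology Department of Guizhou Province (No. 黔科合LH字[2014]7636号)
Graduate Innovation Fund of Guizhou University (No. 研理工2016051)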
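The abstract describes NRLDA only at a high level (extra variables separating noise words from non-noise words, inferred with a Gibbs sampler) and gives no equations, so the following is a minimal sketch under stated assumptions, not the authors' sampler. It assumes a hypothetical construction: each token carries a binary indicator `x` (1 = noise word drawn from one shared background word distribution, 0 = non-noise word drawn from its topic's word distribution), with a corpus-level Beta prior on the switch (pseudo-counts `gamma`). The function name `gibbs_noise_lda` and all parameter names are illustrative. The sampler is collapsed: for each token it removes the token's counts, scores the K topic options plus the one noise option, and resamples jointly.

```python
import numpy as np


def gibbs_noise_lda(docs, V, K, alpha=0.1, beta=0.01, gamma=(1.0, 1.0), iters=200, seed=0):
    """Collapsed Gibbs sampler for a HYPOTHETICAL noise-aware LDA variant.

    docs: list of lists of word ids in [0, V).
    Returns (phi, phi_bg, x): topic-word estimates, background (noise)
    word estimate, and the final noise indicators per token.
    """
    rng = np.random.default_rng(seed)
    g1, g0 = gamma              # Beta pseudo-counts for noise (x=1) / non-noise (x=0)
    D = len(docs)
    n_dk = np.zeros((D, K))     # non-noise tokens in doc d assigned to topic k
    n_d = np.zeros(D)           # non-noise tokens in doc d
    n_kw = np.zeros((K, V))     # topic-word counts
    n_k = np.zeros(K)           # tokens per topic
    n_bw = np.zeros(V)          # background (noise) word counts
    n_switch = np.zeros(2)      # [non-noise, noise] token totals

    # Random initial assignments; z is ignored while a token is marked as noise.
    z = [rng.integers(K, size=len(doc)) for doc in docs]
    x = [rng.integers(2, size=len(doc)) for doc in docs]

    def update(d, w, xi, zi, s):
        # Add (s=+1) or remove (s=-1) one token's contribution to the counts.
        n_switch[xi] += s
        if xi == 1:
            n_bw[w] += s
        else:
            n_dk[d, zi] += s
            n_d[d] += s
            n_kw[zi, w] += s
            n_k[zi] += s

    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            update(d, w, x[d][i], z[d][i], +1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, x[d][i], z[d][i], -1)
                # Unnormalized p(x=0, z=k | rest) for every topic k.
                p_topic = ((n_switch[0] + g0)
                           * (n_dk[d] + alpha) / (n_d[d] + K * alpha)
                           * (n_kw[:, w] + beta) / (n_k + V * beta))
                # Unnormalized p(x=1 | rest); n_switch[1] equals n_bw.sum().
                p_noise = (n_switch[1] + g1) * (n_bw[w] + beta) / (n_switch[1] + V * beta)
                p = np.append(p_topic, p_noise)
                choice = rng.choice(K + 1, p=p / p.sum())
                if choice == K:
                    x[d][i] = 1
                else:
                    x[d][i], z[d][i] = 0, choice
                update(d, w, x[d][i], z[d][i], +1)

    phi = (n_kw + beta) / (n_k[:, None] + V * beta)
    phi_bg = (n_bw + beta) / (n_bw.sum() + V * beta)
    return phi, phi_bg, x
```

For example, `gibbs_noise_lda([[0, 1, 2, 5], [3, 4, 4, 5]], V=6, K=2)` returns a 2×6 topic-word matrix, a length-6 background distribution, and one 0/1 array per document. The design choice here is to sample `x` and `z` jointly in a single draw over K+1 options, which keeps each step exact under the assumed model; whether NRLDA samples them jointly or in separate steps is not stated in the abstract.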