
A Probabilistic Topic Model with Noise Reduction Ability
(一种具有降噪能力的概率主题模型)

Li Jing (李晶) 1, Qin Yongbin (秦永彬) 2, Huang Ruizhang (黄瑞章) 1

Computer & Digital Engineering (计算机与数字工程), 2017, Vol. 45, Issue 2: 367-372 (6 pages). DOI: 10.3969/j.issn.1672-9722.2017.02.032

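Author information

  • 1. Guizhou Provincial Key Laboratory of Public Big Data, Guiyang 550025
  • 2. College of Computer Science and Technology, Guizhou University, Guiyang 550025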

Abstract

With the arrival of the big data era, efficiently recognizing and analyzing the hidden structure of text data has become increasingly important, and powerful computational tools are needed to help understand text data better. Probabilistic topic models, especially Latent Dirichlet Allocation (LDA), have been widely applied in machine learning and text mining. However, LDA has a poor ability to distinguish similar topics, which hurts its practical performance. To address this problem, a new topic model named Noise Reduction Latent Dirichlet Allocation (NRLDA) is proposed on the basis of LDA. Many noise words contribute nothing to discriminating similar topics, so NRLDA introduces new variables that separate the generative process of noise words from that of non-noise words, which LDA cannot do. In addition, a Gibbs sampler is developed to infer NRLDA's parameters, which is critical to investigating the structure of a text corpus. Experimental results show that NRLDA has a much stronger ability to differentiate similar topics, which supports the idea behind the model.
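
The idea of separate generative processes for noise and non-noise words can be illustrated with a small simulation. The abstract does not give NRLDA's actual variables or priors, so the sketch below is only an assumed construction: each word gets a hypothetical binary switch that routes it either to a single shared noise-word distribution or to the usual LDA path (draw a topic from the document's topic mixture, then a word from that topic). The names `generate_corpus`, `noise_prob`, `alpha`, and `beta` are illustrative and do not come from the paper.

```python
import random


def sample_dirichlet(rng, dim, concentration):
    """Draw from a symmetric Dirichlet by normalizing Gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(dim)]
    total = sum(draws)
    return [d / total for d in draws]


def sample_index(rng, probs):
    """Draw one index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_corpus(n_docs=5, doc_len=20, n_topics=3, vocab_size=12,
                    alpha=0.5, beta=0.5, noise_prob=0.3, seed=0):
    """Simulate documents in which each word is either noise or topical.

    Assumption (not from the paper): a per-word Bernoulli switch with
    probability `noise_prob` selects one shared noise distribution;
    otherwise the word follows the standard LDA generative path.
    """
    rng = random.Random(seed)
    # One word distribution per topic, plus one shared noise distribution.
    topic_word = [sample_dirichlet(rng, vocab_size, beta)
                  for _ in range(n_topics)]
    noise_word = sample_dirichlet(rng, vocab_size, beta)

    docs = []
    for _ in range(n_docs):
        theta = sample_dirichlet(rng, n_topics, alpha)  # document-topic mix
        words, is_noise, topics = [], [], []
        for _ in range(doc_len):
            noisy = rng.random() < noise_prob
            if noisy:
                z = -1  # noise words carry no topic assignment
                w = sample_index(rng, noise_word)
            else:
                z = sample_index(rng, theta)
                w = sample_index(rng, topic_word[z])
            words.append(w)
            is_noise.append(noisy)
            topics.append(z)
        docs.append({"words": words, "is_noise": is_noise, "topics": topics})
    return docs
```

Calling `generate_corpus(noise_prob=0.0)` reduces the simulation to plain LDA, which makes the role of the switch easy to see. Inference, such as the Gibbs sampler mentioned in the abstract, would have to recover both the switch and the topic assignment for each word.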

Key words

probabilistic topic model / Latent Dirichlet Allocation (LDA) / Gibbs sampling / noise reduction

Classification

Information Technology and Security Science

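Cite this article

Li Jing, Qin Yongbin, Huang Ruizhang. A Probabilistic Topic Model with Noise Reduction Ability[J]. Computer & Digital Engineering, 2017, 45(2): 367-372.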

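Funding

  • National Natural Science Foundation of China (Nos. 61540050, 61462011)
  • Guizhou Province Major Applied Basic Research Project (No. 黔科合JZ字[2014]2001)
  • Joint Fund of the Guizhou Provincial Department of Science and Technology (No. 黔科合LH字[2014]7636号)
  • Guizhou University Graduate Innovation Fund (No. 研理工2016051)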

Computer & Digital Engineering (计算机与数字工程)

OA / CSTPCD

ISSN 1672-9722
