Computer Engineering (计算机工程), 2017, Vol. 43, Issue 12: 184-191, 8. DOI: 10.3969/j.issn.1000-3428.2017.12.034
Social Media Topic Recognition Based on Word Embedding and Probabilistic Topic Model
Abstract
Word embedding can capture the semantic information of words from a large corpus, and combining it with a probabilistic topic model addresses the lack of semantic information in the standard topic model. This paper therefore proposes a Word-Topic Mixture (WTM) model that improves word representation and the topic model simultaneously. First, an external corpus is introduced into the Topic Word Embedding (TWE) model to obtain the initial topic and word representations. Then the word embedding features and the topic vectors are integrated into the topic model by redefining the conditional probability distribution of topic vectors and word embeddings, while the KL divergence between the new word-topic distribution and the original distribution is minimized. Experimental results show that the WTM model performs better on word representation and topic detection than Word2vec, TWE, Latent Dirichlet Allocation (LDA) and the LFLDA model.
Keywords
social media / topic detection / feature representation / word embedding / topic model / Word-Topic Mixture (WTM) model
Classification
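Information Technology and Security Science
Cite this article
YU Chong (余冲), LI Jing (李晶), SUN Xudong (孙旭东), FU Xianghua (傅向华). Social Media Topic Recognition Based on Word Embedding and Probabilistic Topic Model [J]. Computer Engineering, 2017, 43(12): 184-191, 8.
Funding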
National Natural Science Foundation of China (61472258)
Shenzhen Basic Research Program (JCYJ20140509172609162)
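Illustrative sketch
The abstract mentions minimizing the KL divergence between a new, embedding-based word-topic distribution and the original one. The sketch below only illustrates what such a term measures. It is not the WTM model's actual formulation: the softmax over dot products, the direction of the divergence, and all names and toy numbers (`word_vectors`, `topic_vector`, `original_word_topic`) are assumptions made for this example.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0.0:  # terms with p_i == 0 contribute nothing
            total += p_i * math.log(p_i / max(q_i, eps))
    return total


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Hypothetical toy data: three words with 2-dimensional embeddings,
# and one topic vector in the same space.
vocabulary = ["earthquake", "rescue", "concert"]
word_vectors = [[0.9, 0.1], [0.8, 0.3], [-0.2, 0.9]]
topic_vector = [1.0, 0.2]

# Hypothetical "original" word distribution for this topic,
# e.g. as a standard topic model might estimate it.
original_word_topic = [0.5, 0.4, 0.1]

# Assumed embedding-based distribution: softmax of topic-word dot products.
embedding_word_topic = softmax([dot(topic_vector, w) for w in word_vectors])

# The quantity a training procedure would try to drive down.
loss = kl_divergence(embedding_word_topic, original_word_topic)
print(f"KL(embedding || original) = {loss:.4f}")
```

The divergence is zero only when the two distributions agree, so a smaller value means the embedding-based distribution stays closer to the one estimated by the topic model.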