
Social Media Topic Recognition Based on Word Embedding and Probabilistic Topic Model

Yu Chong (余冲) 1, Li Jing (李晶) 1, Sun Xudong (孙旭东) 1, Fu Xianghua (傅向华) 1

Computer Engineering (计算机工程), 2017, Vol. 43, Issue 12: 184-191 (8 pages). DOI: 10.3969/j.issn.1000-3428.2017.12.034

Author information

  • 1. College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong 518000, China

Abstract

Word embeddings capture the semantic information of words from large corpora, and combining them with a probabilistic topic model can address the lack of semantic information in standard topic models. This paper proposes the Word-Topic Mixture (WTM) model, which improves word representations and the topic model simultaneously. First, an external corpus is introduced into the Topic Word Embedding (TWE) model to obtain the initial topic and word representations. Then, the word embedding features and topic vectors are integrated into the topic model by redefining the conditional probability distribution of topic vectors and word embeddings, while the KL divergence between the new word-topic distribution and the original distribution is minimized. Experimental results show that the WTM model outperforms Word2vec, TWE, Latent Dirichlet Allocation (LDA), and LFLDA on word representation and topic detection.
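The abstract does not give the exact WTM objective, so the following is only a minimal sketch of the quantity it names: the KL divergence between two discrete word distributions for a single topic. The function name `kl_divergence`, the smoothing constant `eps`, and the two example distributions are illustrative assumptions, not taken from the paper.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Return KL(p || q) for two discrete distributions of equal length.

    Terms with p_i == 0 contribute nothing; q_i is floored at eps
    (an assumed smoothing constant) to avoid division by zero.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            total += pi * math.log(pi / max(qi, eps))
    return total


# Hypothetical word distributions for one topic over a 4-word vocabulary.
original = [0.50, 0.30, 0.15, 0.05]   # stand-in for the original distribution
redefined = [0.45, 0.35, 0.15, 0.05]  # stand-in for the redefined distribution

print(kl_divergence(redefined, original))  # small positive value
print(kl_divergence(original, original))   # zero for identical distributions
```

A training procedure in the spirit of the abstract would adjust the redefined distribution so that this value shrinks; how WTM actually parameterizes and optimizes it is described in the full paper, not here.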

Key words

social media / topic detection / feature representation / word embedding / topic model / Word-Topic Mixture (WTM) model

Classification

Information Technology and Security Science

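Cite this article

Yu Chong, Li Jing, Sun Xudong, Fu Xianghua. Social Media Topic Recognition Based on Word Embedding and Probabilistic Topic Model[J]. Computer Engineering, 2017, 43(12): 184-191, 8.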

Funding

National Natural Science Foundation of China (61472258)

Shenzhen Basic Research Program (JCYJ20140509172609162)

Journal: Computer Engineering (计算机工程)

Indexing: OA, Peking University Core (北大核心), CSCD, CSTPCD

ISSN: 1000-3428
