
Enhanced Contextual Neural Topic Model for Short Texts

Liu Gang¹, Wang Tongli¹, Tang Hongwei¹, Zhan Kai², Yang Wenli¹

Computer Engineering and Applications, 2024, Vol. 60, Issue 1, pp. 154-164 (11 pages). DOI: 10.3778/j.issn.1002-8331.2212-0259

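Author information

  • 1. College of Computer Science and Technology, Harbin Engineering University, Harbin 150001; National Engineering Laboratory for E-Government Modeling and Simulation, Harbin Engineering University, Harbin 150001
  • 2. PwC Digital Department, PricewaterhouseCoopers (PwC) Australia, Sydney 2070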

Abstract

Most current topic models are built on the word co-occurrence information of the texts themselves and do not introduce topic sparsity constraints to improve their topic extraction ability. In addition, short texts suffer from sparse word co-occurrence, which seriously affects the accuracy of short-text topic modeling. To address these problems, an enhanced contextual neural topic model (ECNTM) is proposed. ECNTM imposes sparsity constraints on topics through a topic controller, which filters out irrelevant topics. At the same time, the model input is the concatenation of the BOW vector and the SBERT sentence embedding. In the Gaussian decoder, the distribution of each topic over words in the word embedding space is treated as a multivariate Gaussian distribution or a Gaussian mixture distribution, which explicitly enriches the limited contextual information of short texts and addresses the sparsity of word co-occurrence features in short texts. Experimental results on four public datasets (WS, Reuters, KOS and 20 NewsGroups) show that the model improves significantly over the baseline models in perplexity, topic coherence and text classification accuracy, demonstrating the effectiveness of introducing topic sparsity constraints and enriching contextual information for short-text topic modeling.
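The abstract combines three mechanisms: a BOW-plus-SBERT input, a sparsity constraint on topics, and a Gaussian decoder that models each topic as a Gaussian over the word embedding space. The Python sketch below illustrates the decoder side under stated assumptions and is not the authors' implementation. Topics use diagonal covariances (the Gaussian-mixture variant is not shown). Word-level probabilities are obtained by normalizing the Gaussian log-densities over the vocabulary with a softmax. The encoder that maps the concatenated input to topic proportions is omitted. `sparsify` is a hypothetical top-k mask standing in for the topic controller, whose form the abstract does not describe. All function names are illustrative.

```python
import math
from typing import List, Sequence

Vector = List[float]


def concat_input(bow: Sequence[float], sent_emb: Sequence[float]) -> Vector:
    """Encoder input: the BOW vector followed by a sentence embedding (e.g. from SBERT)."""
    return list(bow) + list(sent_emb)


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log-density of x under a multivariate Gaussian with diagonal covariance diag(var)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax (subtracts the max score before exponentiating)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def topic_word_dists(
    word_embs: Sequence[Sequence[float]],
    topic_means: Sequence[Sequence[float]],
    topic_vars: Sequence[Sequence[float]],
) -> List[Vector]:
    """ASSUMPTION: beta[k][w] is proportional to N(e_w; mu_k, diag(var_k)),
    normalised over the vocabulary so that each row sums to 1."""
    return [
        softmax([log_gaussian_diag(e, mu, var) for e in word_embs])
        for mu, var in zip(topic_means, topic_vars)
    ]


def sparsify(theta: Sequence[float], keep: int) -> Vector:
    """HYPOTHETICAL topic-controller stand-in: keep the `keep` largest weights, renormalise."""
    kept = set(sorted(range(len(theta)), key=lambda k: theta[k], reverse=True)[:keep])
    masked = [t if k in kept else 0.0 for k, t in enumerate(theta)]
    total = sum(masked)
    return [t / total for t in masked]


def word_dist(theta: Sequence[float], beta: List[Vector]) -> Vector:
    """Reconstruction distribution p(w | d) = sum_k theta_k * beta[k][w]."""
    vocab_size = len(beta[0])
    return [sum(theta[k] * beta[k][w] for k in range(len(theta))) for w in range(vocab_size)]


if __name__ == "__main__":
    # Toy setup: 3-word vocabulary embedded in 2-D, 2 Gaussian topics with unit variance.
    word_embs = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
    beta = topic_word_dists(word_embs, [[0.0, 0.0], [5.0, 5.0]], [[1.0, 1.0], [1.0, 1.0]])
    theta = sparsify([0.8, 0.2], keep=1)  # only topic 0 survives the mask
    print(concat_input([1.0, 0.0, 2.0], [0.1, 0.2]))
    print(word_dist(theta, beta))
```

Normalizing over the finite vocabulary turns a continuous density into a proper distribution over words, so each row of `beta` sums to 1 and words whose embeddings lie near a topic's mean receive the most probability mass. A hard top-k mask was chosen only for readability; the abstract does not say how the topic controller is parameterized or trained.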

Keywords

neural topic model/short text/sparsity constraint/variational auto-encoder/topic modeling

Classification

Information Technology and Security Science

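Cite this article

LIU Gang, WANG Tongli, TANG Hongwei, ZHAN Kai, YANG Wenli. Enhanced contextual neural topic model for short texts[J]. Computer Engineering and Applications, 2024, 60(1): 154-164.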

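Funding

Heilongjiang Provincial Higher Education Teaching Reform Research Project (SJGZ20200044)

Natural Science Foundation of Heilongjiang Province (LH2021F015)

National High-End Foreign Experts Introduction Program (G2021180008L)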

Journal: Computer Engineering and Applications (ISSN 1002-8331)

Indexing: OA, Peking University Core Journal, CSTPCD
