Computer Engineering and Applications, 2024, Vol. 60, Issue 1: 154-164, 11. DOI: 10.3778/j.issn.1002-8331.2212-0259
Enhanced Contextual Neural Topic Model for Short Texts
Abstract
Most current topic models rely only on the word co-occurrence information of the texts themselves and do not introduce topic sparsity constraints to improve topic extraction. In addition, short texts suffer from sparse word co-occurrence, which seriously affects the accuracy of short-text topic modeling. To address these problems, an enhanced contextual neural topic model (ECNTM) is proposed. ECNTM uses a topic controller to impose sparsity constraints on topics and filter out irrelevant ones. At the same time, the model input becomes the concatenation of the BOW vector and the SBERT sentence embedding. In the Gaussian decoder, the topic distribution over words is treated as a multivariate Gaussian distribution or a Gaussian mixture distribution in the embedding space, which explicitly enriches the limited context information of short texts and addresses the sparsity of word co-occurrence features in short texts. Experimental results on four public datasets (WS, Reuters, KOS, and 20 NewsGroups) show that the model significantly improves on the baseline models in perplexity, topic coherence, and text classification accuracy, which demonstrates the effectiveness of introducing topic sparsity constraints and rich contextual information for short-text topic modeling.
Keywords
neural topic model/short text/sparsity constraint/variational auto-encoder/topic modeling
Classification
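Information Technology and Security Science
Cite this article
LIU Gang, WANG Tongli, TANG Hongwei, ZHAN Kai, YANG Wenli. Enhanced contextual neural topic model for short texts[J]. Computer Engineering and Applications, 2024, 60(1): 154-164, 11.
Funding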
Heilongjiang Province Higher Education Teaching Reform Research Project (SJGZ20200044)
Natural Science Foundation of Heilongjiang Province (LH2021F015)
National High-End Foreign Expert Recruitment Plan (G2021180008L)
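Illustrative sketch
The abstract names two mechanisms: an input formed by concatenating the BOW vector with a sentence embedding, and a Gaussian decoder that treats each topic's distribution over words as a Gaussian in the embedding space. The Python below is a minimal sketch of those two ideas only, not the authors' implementation. It assumes a diagonal covariance, uses a random vector as a stand-in for a real SBERT embedding, and omits the topic controller and the mixture variant. All function names and toy sizes are hypothetical.

```python
import math
import random


def concat_input(bow, sentence_embedding):
    """Concatenate a bag-of-words count vector with a sentence embedding."""
    return list(bow) + list(sentence_embedding)


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance multivariate Gaussian at x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def topic_word_distribution(word_embeddings, mean, var):
    """Turn Gaussian densities at each word embedding into a distribution."""
    logs = [log_gaussian_diag(e, mean, var) for e in word_embeddings]
    top = max(logs)  # subtract the max for numerical stability
    weights = [math.exp(value - top) for value in logs]
    total = sum(weights)
    return [w / total for w in weights]


rng = random.Random(0)
vocab_size, embed_dim, sent_dim = 6, 4, 8  # toy sizes (assumption)

# Hypothetical word embeddings; a real model would learn or load these.
word_embeddings = [
    [rng.gauss(0.0, 1.0) for _ in range(embed_dim)] for _ in range(vocab_size)
]

bow = [2, 0, 1, 0, 0, 1]
# Random stand-in for an SBERT sentence embedding (assumption).
sentence_embedding = [rng.gauss(0.0, 1.0) for _ in range(sent_dim)]
encoder_input = concat_input(bow, sentence_embedding)

# A single topic whose Gaussian is centered on the embedding of word 2.
topic_mean = word_embeddings[2]
topic_var = [0.5] * embed_dim
beta = topic_word_distribution(word_embeddings, topic_mean, topic_var)
```

Because the topic mean coincides with the embedding of word 2, that word receives the largest probability in `beta`; words whose embeddings lie farther from the mean receive less mass.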