Policy Text Topic Mining Based on Improved BERTopic Model

Wang Yuqi, Liu Chen, Liu Jianwei, Cai Hongmin

Computer Technology and Development, 2025, Vol. 35, Issue 5: 90-96, 7. DOI: 10.20165/j.cnki.ISSN1673-629X.2024.0410

Policy Text Topic Mining Based on Improved BERTopic Model

Wang Yuqi 1, Liu Chen 1, Liu Jianwei 2, Cai Hongmin 3

Author information

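  • 1. Beijing Key Laboratory on Integration and Analysis of Large-scale Stream Data (North China University of Technology), Beijing 100144
  • 2. Fujian Preschool Education College, Fuzhou, Fujian 350007
  • 3. South China University of Technology, Guangzhou, Guangdong 510640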

Abstract

The application of natural language processing technology to text analysis has significantly improved the efficiency of extracting key information from massive data, and topic analysis methods based on natural language processing have achieved some success in this field. However, policy text data pose challenges such as complex scenarios, long texts, and popularity bias effects, so existing topic mining approaches leave considerable room for improvement. To address these issues in policy text topic modeling, we propose a dynamic document embedding optimizer and a popularity bias regularization term based on the BERTopic approach. The optimizer addresses the lack of generality caused by the BERTopic model mining topics in a fixed number of dimensions, and automatically selects the optimal vector dimension for topic clustering. The regularization term addresses the high homogeneity of topic results caused by word-level popularity bias, and effectively corrects for hot words. Experiments on policy texts show that the improved BERTopic is significantly better than the original BERTopic model and state-of-the-art models in topic consistency, topic diversity, and comprehensive quality indicators. In the visualization results, the quality of the generated topics is also significantly better than that of the original model.

Key words

natural language processing/topic model/policy texts/BERTopic/popularity bias

Classification

Information Technology and Security Science

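Cite this article

Wang Yuqi, Liu Chen, Liu Jianwei, Cai Hongmin. Policy Text Topic Mining Based on Improved BERTopic Model[J]. Computer Technology and Development, 2025, 35(5): 90-96, 7.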

Funding

Guangzhou Science and Technology Program, Key R&D Program (202206030009)

Computer Technology and Development

ISSN 1673-629X
