Computer Technology and Development, 2025, Vol. 35, Issue 5: 90-96, 7. DOI: 10.20165/j.cnki.ISSN1673-629X.2024.0410
Policy Text Topic Mining Based on Improved BERTopic Model
Abstract
Natural language processing (NLP) has significantly improved the efficiency of extracting key information from massive text data, and NLP-based topic analysis methods have achieved some success in text analysis. However, policy texts pose particular challenges, including complex scenarios, long documents, and popularity bias, so existing topic mining approaches leave considerable room for improvement. To address these issues, we propose a dynamic document embedding optimizer and a popularity bias regularization term, both built on BERTopic. The optimizer addresses the limited generality that results from BERTopic mining topics only in a fixed embedding dimension: it automatically selects the vector dimension used for topic clustering. The regularization term addresses the highly homogeneous topics caused by word-level popularity bias: it identifies and corrects overly popular ("hot") words. Experiments on policy texts show that the improved BERTopic significantly outperforms both the original BERTopic model and state-of-the-art models in topic coherence, topic diversity, and overall quality metrics. In the visualization results, the topics it generates are also of significantly higher quality than those of the original model.
Keywords
natural language processing/topic model/policy texts/BERTopic/popularity bias
Classification
Information Technology and Security Science
Cite this article
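WANG Yuqi, LIU Chen, LIU Jianwei, CAI Hongmin. Policy Text Topic Mining Based on Improved BERTopic Model[J]. Computer Technology and Development, 2025, 35(5): 90-96, 7.
Funding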
Guangzhou Science and Technology Program, Key Research and Development Program (202206030009)