Advances in Text Clustering Models Based on Deep Learning Approaches

Shi Dongyan (史东艳)¹, Ma Lerong (马乐荣)¹, Ding Cangfeng (丁苍峰)¹, Ning Qinwei (宁秦伟)¹, Cao Jiangjiang (曹江江)¹

Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2025, Vol. 19, Issue 11: 2873-2894, 22. DOI: 10.3778/j.issn.1673-9418.2502024

Author information

  • 1. School of Mathematics and Computer Science, Yan'an University, Yan'an, Shaanxi 716000, China

Abstract

Text clustering is one of the core techniques in unsupervised learning. It aims to automatically partition large text datasets into clusters whose members are semantically similar. In recent years, deep learning-based text clustering has flourished, and research has shifted towards using advanced deep learning architectures to extract text features efficiently and thereby improve clustering accuracy. In particular, clustering strategies that rely on large pre-trained language models such as RoBERTa and GPT have performed exceptionally well, owing to their powerful pre-trained feature representations. Using examples and data, this paper reviews the development, current progress, and task characteristics of text clustering, and presents its latest trends and its impact on data mining. It proposes a new classification of text clustering models based on the features of their deep learning architectures. The classification groups models by their core mechanisms and feature extraction paths in clustering tasks. It covers methods ranging from traditional clustering algorithms to advanced techniques, including K-means, spectral clustering, autoencoders, generative models, graph convolutional networks, and large language models, with a detailed analysis of their specific implementations. Finally, the paper analyzes the advantages and limitations of existing methods and discusses potential directions for future research.

Key words

feature representation/text clustering/deep learning/large language models

Classification

Computers and Automation

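Cite this article

Shi Dongyan, Ma Lerong, Ding Cangfeng, Ning Qinwei, Cao Jiangjiang. Advances in Text Clustering Models Based on Deep Learning Approaches[J]. Journal of Frontiers of Computer Science and Technology, 2025, 19(11): 2873-2894, 22.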

Funding

This work was supported by the 14th Five-Year Plan Mid-long Term Major Research Program of Yan'an University (2021ZCQ012), the Industry-University-Research Cooperation Cultivation Program of Yan'an University (CXY202107), and the Shaanxi Provincial Special Support Talent Program (YAU202305399).

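Journal of Frontiers of Computer Science and Technology (计算机科学与探索)

Open access (OA); Peking University Core Journal (北大核心)

ISSN 1673-9418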
