
Text Clustering Algorithm Combining Density and Partition

Liu Long (刘龙)¹, Liu Xin (刘新)¹, Cai Linjie (蔡林杰)¹, Tang Chao (唐朝)¹

Computer & Digital Engineering (计算机与数字工程), 2024, Vol. 52, Issue 1: 178-183, 6. DOI: 10.3969/j.issn.1672-9722.2024.01.029

Author information

  • 1. School of Computer Science and School of Cyberspace Security, Xiangtan University, Xiangtan 411105

Abstract

Document clustering is a classic application of clustering: it groups similar documents into the same category, which helps organize, summarize, and navigate text information and can also improve classification performance. This paper uses the BERT model to represent documents as high-dimensional vectors. Traditional density-based clustering algorithms are not well suited to high-dimensional data sets. K-means, a partition-based clustering algorithm, can cluster documents effectively, but its performance depends heavily on the choice of initial center points. This paper proposes a new text clustering algorithm that combines density and partition. First, appropriate cluster center points are selected by density; next, the farthest-distance idea is used to pick the initial cluster centers one by one; finally, a partition method is used to cluster the data set. Experiments show that the new algorithm clusters stably and achieves good clustering results.
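The three steps named in the abstract (density-based selection, farthest-distance initialization, then partitioning) can be illustrated with a minimal sketch. This is not the authors' implementation: the density definition (neighbor count within a hypothetical `radius`), the mean-density threshold for candidates, and the function names `density_farthest_init` and `kmeans` are all assumptions made for illustration. In practice the input points would be BERT document vectors; small 2-D tuples are used here only to keep the example readable.

```python
import math

def density_farthest_init(points, k, radius):
    """Pick k initial centers: densest point first, then farthest-first
    among dense candidates. Density definition and threshold are assumptions."""
    n = len(points)
    # Assumed local density: number of other points within `radius`.
    density = [
        sum(1 for j in range(n)
            if j != i and math.dist(points[i], points[j]) <= radius)
        for i in range(n)
    ]
    order = sorted(range(n), key=lambda i: -density[i])
    # Assumed candidate rule: keep points whose density is at least the mean.
    mean_density = sum(density) / n
    candidates = [i for i in order if density[i] >= mean_density]
    if len(candidates) < k:
        candidates = order[:k]
    chosen = [candidates[0]]  # start from the densest candidate
    while len(chosen) < k:
        # Farthest-distance step: maximize the distance to the nearest chosen center.
        nxt = max(
            (c for c in candidates if c not in chosen),
            key=lambda c: min(math.dist(points[c], points[s]) for s in chosen),
        )
        chosen.append(nxt)
    return [list(points[i]) for i in chosen]

def kmeans(points, centers, max_iter=100):
    """Standard K-means (Lloyd) iterations starting from the given centers."""
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append([sum(x) / len(members) for x in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

# Two compact groups plus one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
init = density_farthest_init(points, k=2, radius=1.5)
labels, centers = kmeans(points, init)
```

Under these assumptions the isolated point has zero density, so it is never chosen as an initial center, and the two initial centers fall in different dense groups. That is the kind of sensitivity to initialization the abstract says plain K-means suffers from.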

Key words

document clustering/BERT/K-means algorithm/density/farthest distance

Classification

Information Technology and Security Science

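Cite this article

Liu Long, Liu Xin, Cai Linjie, Tang Chao. Text Clustering Algorithm Combining Density and Partition [J]. Computer & Digital Engineering, 2024, 52(1): 178-183, 6.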

Funding

Supported by the Open Project of the Key Laboratory of Network Crime Investigation of Hunan Provincial Colleges (Grant No. 2018WLFZZC003).

Journal: Computer & Digital Engineering (OA, CSTPCD), ISSN 1672-9722
