Advanced Engineering Sciences, 2025, Vol. 57, Issue (2): 40-53, 14. DOI: 10.12454/j.jsuese.202300821
Density Adjusted Spectral Clustering Algorithm Based on Neighborhood Standard Deviation
Abstract
Objective Clustering partitions given data into multiple clusters without requiring knowledge of the true labels. In other words, clustering is an unsupervised learning process that ensures samples within the same cluster are as similar as possible while those in different clusters are as dissimilar as possible. Traditional clustering algorithms can be categorized into five primary types: hierarchical, density-based, grid-based, model-based, and partition-based algorithms. However, these traditional models are sensitive to the cluster number K, outliers, and noise points. In addition, the initial cluster centers are set artificially, and the cluster shape cannot extend arbitrarily. Spectral clustering, a clustering algorithm based on spectral graph theory, transforms the clustering problem into an optimal partition problem of a graph. This approach has emerged as one of the most widely adopted techniques in various domains, including data mining, face recognition, and image segmentation.
Methods The spectral clustering algorithm provides simplicity in implementation, robustness to high-dimensional data, and flexibility in handling diverse sample space shapes. This enables global optimization and often results in superior performance compared to traditional clustering algorithms. However, existing spectral clustering methods encounter challenges, such as determining the scale parameter, suboptimal clustering results on multi-scale datasets, and instability in the obtained clusters. A density-adjusted spectral clustering algorithm based on neighborhood standard deviation is proposed to address the unstable clustering results caused by the random selection of initial cluster centers and the artificial setting of scale parameters in spectral clustering. This approach incorporates both global and local data information. The proposed algorithm treats the initial class center value and the scale parameter as decision variables to achieve adaptive optimization and enhance spectral clustering. The neighborhood labeling difference is employed to determine the initial cluster center value, while the scale parameter is optimized to improve the expression of the Gaussian kernel function. The calculation methods for the scale parameter, the sample local density, and the initial cluster center of the density peak are redefined using the neighborhood standard deviation. Therefore, the proposed spectral clustering algorithm eliminates the need for neighborhood parameter input and produces stable clustering results. An initial class center decision value algorithm based on density peaks (DP_KD) is designed to resolve the issue of unstable clustering results in density-adjusted spectral clustering, replacing the random selection of initial centers. First, K points are randomly selected from the dataset as initial clustering centers, and different classes are assigned based on each selected point. Then, the reciprocal of the sample neighborhood standard deviation is used as a parameter to measure the local density of the sample, while the average distance between samples is utilized to calculate the corresponding neighborhood radius of each sample; each sample is then assigned the minimum Euclidean distance to any sample of higher density. Finally, the product of the sample local density and this distance is defined as the initial class center decision value λ. The K samples with the largest λ are selected as the initial class centers, forming the center decision value algorithm DP_KD.
Results and Discussions The change of the scale parameter influences the construction of the similarity matrix, which in turn affects the Laplacian matrix and the formation of the eigenvector space. In the spectral clustering algorithm, the similarity matrix is critical in determining the quality of the final clustering outcomes. Therefore, the Euclidean distance between a sample and its p-th nearest neighbor is introduced as the scale parameter for each sample, utilizing neighbor information to enhance the similarity matrix and reduce the sensitivity of the spectral clustering algorithm to the scale parameter. In addition, based on the optimized initial class center decision value and the nearest neighbor parameter method, a density-adjusted spectral clustering algorithm based on neighborhood standard deviation (DSSD) is proposed to address the instability of clustering results caused by the random selection of initial cluster centers and the artificial setting of scale parameters. The proposed algorithm effectively fine-tunes the similarity between sample points in regions of varying density for multi-scale datasets.
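The center-selection and local-scaling steps described above can be sketched as follows. This is a minimal illustration based only on the abstract: the exact neighborhood-radius rule, the handling of degenerate neighborhoods, and the precise form of the density-adjusted kernel are assumptions, and the full paper's DP_KD and DSSD definitions may differ.

```python
import numpy as np

def dp_kd_centers(X, k):
    """Sketch of DP_KD-style initial-center selection (assumed details).

    Local density rho_i = reciprocal of the standard deviation of the
    distances from sample i to its neighbors within a radius derived from
    the mean pairwise distance; delta_i = distance to the nearest
    higher-density sample; lambda_i = rho_i * delta_i; the k samples with
    the largest lambda become the initial centers.
    """
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    radius = D.sum() / (n * (n - 1))            # mean pairwise distance as radius
    rho = np.zeros(n)
    for i in range(n):
        neigh = D[i][(D[i] > 0) & (D[i] <= radius)]
        if len(neigh) > 1 and neigh.std() > 0:
            rho[i] = 1.0 / neigh.std()          # inverse neighborhood std as density
    delta = np.empty(n)
    for i in range(n):
        higher = np.where(rho > rho[i])[0]      # samples with higher density
        delta[i] = D[i, higher].min() if len(higher) else D[i].max()
    lam = rho * delta                           # center decision value
    return np.argsort(lam)[-k:]                 # indices of the k initial centers

def local_scale_similarity(X, p=7):
    """Per-sample scale sigma_i = distance to the p-th nearest neighbor,
    plugged into a locally scaled Gaussian kernel (assumed form)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    sigma = np.sort(D, axis=1)[:, p]            # p-th nearest-neighbor distance
    W = np.exp(-D**2 / (sigma[:, None] * sigma[None, :]))
    np.fill_diagonal(W, 0.0)                    # no self-loops
    return W
```

Because sigma_i adapts to the local density around each sample, pairs inside a sparse cluster and pairs inside a dense cluster can both receive high similarity, which is the mechanism the abstract credits for handling multi-scale data.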
Conclusions Experimental results indicate that, compared to other spectral clustering algorithms, the proposed algorithm achieves improved clustering performance and yields more stable clustering results. The experiments are first conducted on seven UCI datasets: Iris, Wine, Seeds, Yeast, Glass, Ecoli, and Waveform. Four evaluation metrics, Accuracy (ACC), Purity, Rand Index (RI), and F-measure (F1), are employed to assess the clustering performance; a higher value on these indices signifies superior performance. When compared to IK_DM, IIEFA, OFMMK-means, classical clustering algorithms such as K-means and K-medoids, as well as the AP and SNN-MSC algorithms, the clustering performance of the DP_KD algorithm across the seven UCI datasets is superior to that of the other algorithms. These findings demonstrate the feasibility and effectiveness of integrating the DP_KD algorithm into density-adjusted spectral clustering as a replacement for classical clustering algorithms. In addition, the DSSD algorithm is compared to four spectral clustering algorithms, ADSC, WDSC, SC, and PRSC, on seven UCI datasets: Sonar, Musk, Breast, Statlog, Madelon, Segmentation, and Waveform, with the ACC index used to evaluate the clustering effect. Experimental results demonstrate that the DSSD algorithm enhances the clustering effectiveness of density-adjusted spectral clustering and outperforms other advanced spectral clustering algorithms.
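For reference, two of the reported metrics can be computed as follows; these are the standard definitions of Purity and the Rand Index, not code from the paper.

```python
import numpy as np
from itertools import combinations

def purity(y_true, y_pred):
    """Purity: each predicted cluster is credited with its majority true label."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    total = 0
    for c in np.unique(y_pred):
        _, counts = np.unique(y_true[y_pred == c], return_counts=True)
        total += counts.max()               # size of the majority label in cluster c
    return total / len(y_true)

def rand_index(y_true, y_pred):
    """Rand Index: fraction of sample pairs on which the two labelings agree
    (same cluster in both, or different clusters in both)."""
    pairs = list(combinations(range(len(y_true)), 2))
    agree = sum((y_true[i] == y_true[j]) == (y_pred[i] == y_pred[j])
                for i, j in pairs)
    return agree / len(pairs)
```

Both metrics are label-permutation invariant, which is why they suit clustering evaluation: a perfect clustering scores 1.0 even if the cluster indices do not match the true label values.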
Key words: spectral clustering; density adjustment; neighborhood standard deviation; adaptive; density peak
Classification: Information Technology and Security Science
Citation: Guo Xiaoyu, Liu Jinjin, Chen Yajun, Li Haojie, Yuan Peiyan, Zhao Xiaoyan. Density Adjusted Spectral Clustering Algorithm Based on Neighborhood Standard Deviation[J]. Advanced Engineering Sciences, 2025, 57(2): 40-53, 14.
Funding: National Natural Science Foundation of China (62072159, 61902112); Henan Province Science and Technology Research Project (222102210011, 232102211061)