Computer Technology and Development, 2017, Vol. 27, Issue (2): 1-5, 5. DOI: 10.3969/j.issn.1673-629X.2017.02.001
K-means Clustering Optimization Based on Kernel Density Estimation
Abstract
The K-means clustering algorithm is classical and widely used in many fields, but it performs poorly on high-dimensional and large data sets. Kernel density estimation is a nonparametric method for estimating the density function of an unknown distribution, and it can effectively capture how a data set is distributed. Sampling is a common technique for data mining on large data sets. Density biased sampling improves on simple random sampling, which tends to lose important information when applied to skewed data sets. A method is proposed that uses the result of kernel density estimation in two ways: it chooses sample points from the neighborhoods of the peaks of the data set's density function as the initial centers for K-means, and it performs density biased sampling on the data set. K-means clustering is then run on the sample set. The experimental results show that using kernel density estimation to select the initial parameters and to drive density biased sampling can effectively accelerate the K-means clustering process.
Key words
K-means clustering/density bias sampling/kernel density estimation/data mining
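The abstract describes the pipeline but not its parameters, so the sketch below is only one possible reading under stated assumptions, not the authors' implementation. Assumed choices: a Gaussian kernel with a fixed bandwidth `h`; densities evaluated only at the data points, with "peaks" approximated greedily (scan points from densest to sparsest and accept one as a center only if it is farther than a hypothetical radius `r` from every center already accepted); and a common form of density biased sampling in which point *i* is kept with probability proportional to f(x_i)^a, where `a < 0` over-samples sparse regions. All function names and the parameters `h`, `r`, `m`, `a` are hypothetical.

```python
import math
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points (tuples)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kde(points, h):
    """Gaussian KDE evaluated at every data point with bandwidth h.
    The kernel's normalising constant is dropped: only relative densities matter here."""
    n = len(points)
    return [sum(math.exp(-sq_dist(p, q) / (2 * h * h)) for q in points) / n
            for p in points]


def peak_centers(points, dens, k, r):
    """Assumed peak rule: take points in order of decreasing density and accept one
    only if it lies farther than r from every accepted center. May return < k centers."""
    order = sorted(range(len(points)), key=lambda i: dens[i], reverse=True)
    centers = []
    for i in order:
        if all(sq_dist(points[i], c) > r * r for c in centers):
            centers.append(points[i])
            if len(centers) == k:
                break
    return centers


def density_biased_sample(points, dens, m, a, rng):
    """Keep point i with probability min(1, m * dens[i]**a / sum_j dens[j]**a),
    giving an expected sample size of about m. a < 0 favours sparse regions."""
    w = [d ** a for d in dens]
    total = sum(w)
    return [p for p, wi in zip(points, w) if rng.random() < min(1.0, m * wi / total)]


def kmeans(points, centers, iters=100):
    """Plain Lloyd iterations starting from the given initial centers."""
    centers = list(centers)
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            clusters[j].append(p)
        # Recompute each center as its cluster mean; keep the old center if empty.
        new = [tuple(sum(x) / len(cl) for x in zip(*cl)) if cl else centers[j]
               for j, cl in enumerate(clusters)]
        if new == centers:
            break
        centers = new
    return centers


# Usage on a synthetic skewed data set: one large and one small Gaussian blob.
rng = random.Random(0)
data = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(300)]
data += [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(30)]
dens = kde(data, h=0.5)
init = peak_centers(data, dens, k=2, r=2.0)          # initial centers from density peaks
sample = density_biased_sample(data, dens, m=80, a=-0.5, rng=rng)
centers = kmeans(sample, init)                        # cluster only the sample
```

Running K-means on the roughly `m`-point sample rather than on all points is where the speed-up would come from; the O(n²) density pass here is the naive form and is kept only for clarity.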
Classification
Information Technology and Security Science
Cite this article
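Xiong Kailing, Peng Junjie, Yang Xiaofei, Huang Jun. K-means clustering optimization based on kernel density estimation[J]. Computer Technology and Development, 2017, 27(2): 1-5, 5.
Funding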
National Natural Science Foundation of China (61201446)