Abstract
Objective To address data privacy risks by proposing a differential privacy technique that combines a Gaussian kernel function with a clustering algorithm, and to provide a solution that safeguards the privacy and security of medical data during processing.
Methods The problem of healthcare data privacy exposure during machine learning, the principles of differential privacy, and the construction of the differential privacy fuzzy C-means algorithm (DPFCM) and the differential privacy fuzzy C-means algorithm based on a Gaussian kernel function (DPFCM_GF) were introduced. The maximum distance method was used to determine the initial centroids, and the Gaussian values of the clustering centroids were used to calculate the privacy budget allocation ratio. Laplace noise was added to achieve differential privacy protection. Finally, publicly available data on heart disease, breast cancer, thyroid disease, and diabetes were collected to validate the algorithms.
Results As the privacy budget increased, the clustering performance of both DPFCM_GF and DPFCM gradually improved. The privacy budget thresholds for DPFCM_GF on the four datasets were 1.31, 0.85, 0.66, and 1.75, respectively, which were 41.78%, 50.29%, 53.52%, and 38.38% lower than those of DPFCM. DPFCM_GF also converged in fewer iterations, and the difference was statistically significant (P<0.05).
Conclusion In medical data analysis, DPFCM_GF can protect the privacy of medical data to a certain extent while still producing highly accurate clustering results. It has promising application prospects and market value.
Key words
Gaussian kernel function/differential privacy technology/clustering algorithm/fuzzy C-means clustering algorithm/privacy budget
Classification
Medicine and health
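
The Methods summary names two steps that are easy to illustrate: choosing initial centroids by the maximum distance method and perturbing values with Laplace noise. The following is a minimal sketch under stated assumptions, not the authors' implementation. The function names (`max_distance_init`, `laplace_noise`, `perturb_centroid`), the choice of the first centroid as the point farthest from the data mean, and the `sensitivity` parameter are all hypothetical. The Gaussian-value-based budget allocation is not reproduced here because the abstract does not specify it.

```python
import math
import random


def max_distance_init(points, k):
    """Pick k initial centroids by farthest-first selection.

    Assumption: the first centroid is the point farthest from the data
    mean; each later centroid is the point farthest from its nearest
    already-chosen centroid.
    """
    dim = len(points[0])
    mean = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    first = max(range(len(points)), key=lambda i: math.dist(points[i], mean))
    chosen = [first]
    # Distance from every point to its nearest chosen centroid so far.
    nearest = [math.dist(p, points[first]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=lambda i: nearest[i])
        chosen.append(nxt)
        nearest = [min(d, math.dist(p, points[nxt]))
                   for d, p in zip(nearest, points)]
    return [points[i] for i in chosen]


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample.

    The difference of two independent exponential variables with
    rate 1/scale follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb_centroid(centroid, sensitivity, epsilon, rng):
    """Add Laplace noise with scale sensitivity/epsilon to each coordinate.

    `sensitivity` and `epsilon` (the privacy budget spent on this release)
    are inputs the caller must supply; the abstract gives no values for them.
    """
    scale = sensitivity / epsilon
    return [c + laplace_noise(scale, rng) for c in centroid]
```

A larger privacy budget `epsilon` gives a smaller noise scale, which is consistent with the reported improvement in clustering performance as the budget increases.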