Application of a Differential Privacy Technique Based on the Gaussian Kernel Function Combined with a Clustering Algorithm in Medical Data Security
Objective To address the risk of data privacy leakage, a differential privacy technique based on the Gaussian kernel function is combined with a clustering algorithm, with the aim of providing a solution that keeps medical data private and secure while it is processed. Methods The paper introduces the problem of privacy exposure of medical data during machine learning, the principles of differential privacy, and the construction of the differential privacy fuzzy C-means algorithm (DPFCM) and the differential privacy fuzzy C-means algorithm based on the Gaussian kernel function (DPFCM_GF). The maximum distance method was used to determine the initial cluster centers, the Gaussian values of the cluster centers were used to calculate the privacy budget allocation ratio, and Laplace noise was added to provide differential privacy protection. Publicly available data on heart disease, breast cancer, thyroid disease, and diabetes were collected to validate the algorithms. Results The clustering performance of both DPFCM_GF and DPFCM on the different datasets improved gradually as the privacy budget increased. The threshold privacy budgets of DPFCM_GF on the four datasets were 1.31, 0.85, 0.66, and 1.75, respectively, which were 41.78%, 50.29%, 53.52%, and 38.38% lower than those of DPFCM. DPFCM_GF also converged faster, and the difference in improvement was statistically significant (P<0.05). Conclusion In medical data analysis, DPFCM_GF can protect the privacy of medical data to a certain extent while providing clustering results of relatively high accuracy, and it has potential application prospects and market value.
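The three building blocks named in the Methods (maximum distance initialization, a Gaussian-based budget allocation ratio, and Laplace noise) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no formulas, so the allocation rule in `gaussian_budget_shares`, the `sigma` and `sensitivity` parameters, and all function names are hypothetical and chosen only for illustration. The sensitivity is assumed to be the L1 sensitivity of each released center and must be supplied by the caller.

```python
import math
import random


def farthest_point_init(points, k):
    """Maximum distance initialization: start from the first point, then
    repeatedly add the point farthest from its nearest chosen center."""
    centers = [points[0]]
    while len(centers) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(nxt)
    return centers


def gaussian_budget_shares(centers, sigma=1.0):
    """HYPOTHETICAL allocation rule (not taken from the paper): weight each
    center by a Gaussian kernel of its distance to the mean of all centers,
    then normalize the weights so that they sum to 1."""
    dim = len(centers[0])
    mean = [sum(c[d] for c in centers) / len(centers) for d in range(dim)]
    exponents = [-math.dist(c, mean) ** 2 / (2 * sigma ** 2) for c in centers]
    top = max(exponents)  # shift exponents to avoid underflow to all zeros
    weights = [math.exp(e - top) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def perturb_center(center, epsilon, sensitivity, rng):
    """Laplace mechanism: add noise of scale sensitivity / epsilon
    to every coordinate of the center."""
    scale = sensitivity / epsilon
    return [x + laplace_noise(scale, rng) for x in center]


# Toy data, purely illustrative.
rng = random.Random(0)
points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (4.9, 4.8), (0.0, 5.0)]
centers = farthest_point_init(points, 3)
shares = gaussian_budget_shares(centers, sigma=5.0)

total_epsilon = 1.0  # assumed overall privacy budget
noisy = [
    perturb_center(c, total_epsilon * s, sensitivity=1.0, rng=rng)
    for c, s in zip(centers, shares)
]
```

Because the shares sum to 1, the per-center budgets add up to `total_epsilon`. A smaller share means a larger noise scale for that center, which is the trade-off any budget allocation ratio has to manage.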
Cao Zixiong (曹自雄); Chen Yuxian (陈宇鲜); Jiang Xiumei (蒋秀梅)
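Information Statistics Center, Huai'an Second People's Hospital, Huai'an, Jiangsu 223003; Archives Office, Huai'an Fifth People's Hospital, Huai'an, Jiangsu 223003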
Preventive Medicine
Keywords: Gaussian kernel function; differential privacy technology; clustering algorithm; fuzzy C-means algorithm; privacy budget
China Medical Devices (《中国医疗设备》), 2024, No. 7
Pages 28-35 (8 pages)