Computer Engineering, 2024, Vol. 50, Issue (5): 167-181, 15. DOI: 10.19678/j.issn.1000-3428.0067967
Research on Differential Privacy High Dimensional Data Publishing Technology Based on Bayesian Networks
Abstract
Improving data availability while preserving privacy is challenging in high-dimensional structured data publishing; the classic PrivBayes algorithm addresses this problem. To further reduce computational cost and improve data availability, a differential privacy data-publishing algorithm based on Bayesian networks, ELPrivBayes, is proposed. The theoretical computational cost of the Bayesian network structure-learning stage is analyzed, and a correlation matrix is constructed to store the Mutual Information (MI) between attributes, which avoids redundant MI calculations during the iterations of the structure-learning algorithm and reduces time complexity. Based on the Average MI (AMI), the order in which nodes enter the Bayesian network is optimized, which increases the expected MI contribution of the exponential mechanism during the structure-learning iterations and thereby improves the statistical approximation between the generated and original datasets. The low sensitivity of the network structure quality to the choice of the first node is analyzed empirically. Experimental results on four typical datasets show that, compared with the classical PrivBayes algorithm and its improved variants, the computational cost of the structure-learning stage is reduced by 97%-99%, the MI captured by the exponential mechanism is improved by 14%-67%, the average variation distance between the generated and original datasets is reduced by 32%-40%, and the accuracy of the constructed Support Vector Machine (SVM) classifier is improved by 4%-5%. Moreover, when ε ≤ 0.8, the improvement in the availability of data generated by the ELPrivBayes algorithm is more significant.
Key words
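data publishing/Bayesian network/differential privacy/privacy protection/correlation matrix/Average Mutual Information (AMI)
Classification
Information Technology and Security Science
Cite this article
LU Xiaotian, PIAO Chunhui, YANG Xingyu, BAI Yingjie. Research on Differential Privacy High Dimensional Data Publishing Technology Based on Bayesian Networks[J]. Computer Engineering, 2024, 50(5): 167-181, 15.
Funding
Key Research and Development Program of Hebei Province (21355902D)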
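The abstract names three ingredients: a matrix that stores pairwise MI so it is computed only once, an AMI-based ordering of attributes, and selection through the exponential mechanism. The Python sketch below illustrates these ideas under simplifying assumptions and is not the authors' ELPrivBayes implementation. It assumes discrete attributes, single-parent (pairwise) MI, AMI defined as the mean MI of an attribute with all others, and a caller-supplied sensitivity. The function names and the toy data are hypothetical.

```python
import math
import random
from collections import Counter
from itertools import combinations


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete columns."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log2(p(x,y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def mi_matrix(columns):
    """Compute each pairwise MI once and store it in a symmetric matrix."""
    d = len(columns)
    m = [[0.0] * d for _ in range(d)]
    for i, j in combinations(range(d), 2):
        m[i][j] = m[j][i] = mutual_information(columns[i], columns[j])
    return m


def ami_order(m):
    """Rank attributes by average MI with all other attributes (assumed AMI)."""
    d = len(m)
    ami = [sum(m[i][j] for j in range(d) if j != i) / (d - 1) for i in range(d)]
    return sorted(range(d), key=lambda i: -ami[i]), ami


def exponential_mechanism(scores, epsilon, sensitivity, rng):
    """Sample an index with probability proportional to exp(eps*score/(2*sens))."""
    top = max(scores)  # shift by the maximum to avoid overflow in exp
    weights = [math.exp(epsilon * (s - top) / (2 * sensitivity)) for s in scores]
    return rng.choices(range(len(scores)), weights=weights, k=1)[0]


# Toy data (hypothetical): B copies A, C is independent of both.
columns = [
    [0, 0, 1, 1, 0, 0, 1, 1],  # attribute A
    [0, 0, 1, 1, 0, 0, 1, 1],  # attribute B
    [0, 1, 0, 1, 0, 1, 0, 1],  # attribute C
]
m = mi_matrix(columns)
order, ami = ami_order(m)

# Choose a partner for the first attribute by looking scores up in the stored
# matrix instead of recomputing MI on every iteration.
first = order[0]
candidates = [j for j in order if j != first]
scores = [m[first][j] for j in candidates]
pick = candidates[exponential_mechanism(scores, 1.0, 1.0, random.Random(0))]
```

In this sketch a smaller `epsilon` flattens the sampling weights, so the choice is noisier, while a larger `epsilon` concentrates probability on the highest-MI candidate. The sensitivity value of 1.0 is a placeholder, not the bound used in the paper.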