Open Access (OA) | Peking University Core Journal | CSTPCD

Research on Differential Privacy High Dimensional Data Publishing Technology Based on Bayesian Networks

Improving data utility while preserving privacy is a challenging problem in high-dimensional structured data publishing, and the classic PrivBayes algorithm offers one solution to it. To further reduce computational cost and improve data utility, this paper proposes ELPrivBayes, a differentially private data-publishing algorithm based on Bayesian networks. ELPrivBayes analyzes the theoretical computational cost of the Bayesian network structure-learning stage. It builds a correlation matrix that stores the Mutual Information (MI) between attributes, which avoids recomputing MI during the iterations of the structure-learning algorithm and lowers its time complexity. It uses Average MI (AMI) to optimize the order in which nodes enter the Bayesian network. This raises the expected MI contributed by the exponential mechanism in each structure-learning iteration and thereby improves the statistical approximation between the generated and original datasets. An empirical analysis further shows that network-structure quality has low sensitivity to the choice of the first node. Experiments on four typical datasets compare ELPrivBayes with the classic PrivBayes algorithm and its improved variants. ELPrivBayes reduces the computational cost of the structure-learning stage by 97%-99% and increases the MI captured via the exponential mechanism by 14%-67%. It also reduces the average variation distance between the generated and original datasets by 32%-40% and improves the accuracy of the constructed Support Vector Machine (SVM) classifiers by 4%-5%. The utility gain of data generated by ELPrivBayes is more pronounced when ε ≤ 0.8.
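The abstract names two ingredients of the structure-learning stage:

- a matrix that caches pairwise MI so that no attribute pair is recomputed;
- an exponential mechanism that picks which attribute joins the network next.

The Python sketch below illustrates both. It is a hedged illustration under stated assumptions, not the authors' ELPrivBayes:

- Attributes are discrete.
- The search is a PrivBayes-style greedy loop restricted to one parent per node (degree k = 1), which is all a pairwise matrix can score directly.
- The first node is chosen by the caller.
- The per-step budget `epsilon_step` and the score `sensitivity` are caller-supplied placeholders. The true MI sensitivity depends on dataset size and attribute domains and is not derived here.

All function names (`mutual_information`, `mi_matrix`, `average_mi`, `exponential_mechanism`, `learn_structure`) are hypothetical. The AMI vector is computed from the cached matrix but deliberately left out of the selection loop, because the abstract does not say how ELPrivBayes turns AMI into a node order.

```python
import math
import random
from collections import Counter
from itertools import combinations

# Hypothetical sketch, NOT the ELPrivBayes implementation: assumes discrete
# attributes, degree-1 parents, a caller-chosen first node, and an
# uncalibrated, caller-supplied sensitivity for the exponential mechanism.


def mutual_information(rows, i, j):
    """Empirical mutual information I(A_i; A_j), in bits, between two discrete columns."""
    n = len(rows)
    joint = Counter((r[i], r[j]) for r in rows)
    count_i = Counter(r[i] for r in rows)
    count_j = Counter(r[j] for r in rows)
    # Sum over observed (a, b) of p(a,b) * log2(p(a,b) / (p(a) * p(b))).
    return sum(
        (c / n) * math.log2(c * n / (count_i[a] * count_j[b]))
        for (a, b), c in joint.items()
    )


def mi_matrix(rows, d):
    """Compute each pairwise MI once and cache it in a symmetric d x d matrix (zero diagonal)."""
    m = [[0.0] * d for _ in range(d)]
    for i, j in combinations(range(d), 2):
        m[i][j] = m[j][i] = mutual_information(rows, i, j)
    return m


def average_mi(m):
    """Average MI of each attribute with the other d - 1 attributes (diagonal is zero)."""
    d = len(m)
    return [sum(row) / (d - 1) for row in m]


def exponential_mechanism(candidates, scores, epsilon, sensitivity, rng):
    """Sample a candidate with probability proportional to exp(epsilon * score / (2 * sensitivity))."""
    top = max(scores)
    # Shifting by the max score avoids overflow; the shift cancels after normalization.
    weights = [math.exp(epsilon * (s - top) / (2 * sensitivity)) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


def learn_structure(m, first, epsilon_step, sensitivity, rng):
    """Greedy degree-1 structure learning that reads every score from the cached matrix m.

    Each step adds one (child, parent) edge: the child is outside the network,
    the parent is already inside it, and the pair's score is I(child; parent).
    """
    d = len(m)
    included = [first]
    edges = []
    while len(included) < d:
        candidates = [(c, p) for c in range(d) if c not in included for p in included]
        scores = [m[c][p] for c, p in candidates]  # cache lookups, no MI recomputation
        child, parent = exponential_mechanism(candidates, scores, epsilon_step, sensitivity, rng)
        included.append(child)
        edges.append((child, parent))
    return edges


if __name__ == "__main__":
    # Toy data: column 1 copies column 0; column 2 is independent of both.
    rows = [(0, 0, 1), (0, 0, 0), (1, 1, 1), (1, 1, 0)] * 25
    m = mi_matrix(rows, 3)
    print(average_mi(m))  # [0.5, 0.5, 0.0]
    # epsilon_step and sensitivity below are placeholder values, not calibrated ones.
    print(learn_structure(m, first=0, epsilon_step=1.0, sensitivity=0.1, rng=random.Random(7)))
```

With k = 1, an uncached loop evaluates t(d - t) candidate pairs at step t. That is d(d - 1)(d + 1)/6, roughly d³/6, MI computations in total, while the cache needs only d(d - 1)/2. This is the kind of redundancy a correlation matrix removes, although the paper's own cost analysis, which also covers its actual parent-set search, may differ.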

Authors: 卢晓天 (Lu Xiaotian); 朴春慧 (Piao Chunhui); 杨兴雨 (Yang Xingyu); 白英杰 (Bai Yingjie)

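Affiliations:
School of Information Science and Technology, Shijiazhuang Tiedao University, Shijiazhuang, Hebei 050043
Hebei Key Laboratory of Electromagnetic Environmental Effects and Information Processing, Shijiazhuang, Hebei 050043
Beijing National Railway Research & Design Institute of Signal & Communication Group Co., Ltd., Beijing 100070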

Subject: Computers and Automation

Keywords: data publishing; Bayesian network; differential privacy; privacy protection; correlation matrix; Average Mutual Information (AMI)

Computer Engineering (《计算机工程》), 2024, No. 5

pp. 167-181 (15 pages)

Funding: Key Research and Development Program of Hebei Province (21355902D).

DOI: 10.19678/j.issn.1000-3428.0067967
