Research on Differential Privacy High Dimensional Data Publishing Technology Based on Bayesian Networks

Computer Engineering (计算机工程), 2024, Vol. 50, Issue 5: 167-181 (15 pages). DOI: 10.19678/j.issn.1000-3428.0067967

Lu Xiaotian (卢晓天)¹, Piao Chunhui (朴春慧)¹, Yang Xingyu (杨兴雨)¹, Bai Yingjie (白英杰)²

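Author information

  • 1. School of Information Science and Technology, Shijiazhuang Tiedao University, Shijiazhuang 050043, Hebei, China; Hebei Key Laboratory of Electromagnetic Environmental Effects and Information Processing, Shijiazhuang 050043, Hebei, China
  • 2. Hebei Key Laboratory of Electromagnetic Environmental Effects and Information Processing, Shijiazhuang 050043, Hebei, China; CRSC Research & Design Institute Group Co., Ltd., Beijing 100070, China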

Abstract

Publishing high-dimensional structured data requires protecting privacy while preserving data utility, a problem that the classic PrivBayes algorithm addresses. To further reduce computational cost and improve utility, this paper proposes ELPrivBayes, a differentially private data-publishing algorithm based on Bayesian networks. The theoretical computational cost of the Bayesian network structure-learning stage is analyzed, and a correlation matrix is constructed to store the Mutual Information (MI) between attributes, which avoids redundant MI calculations during the iterations of structure learning and reduces time complexity. The order in which nodes enter the Bayesian network is optimized based on the Average MI (AMI), which increases the expected MI contribution of the exponential mechanism during structure learning and thereby improves the statistical closeness between the generated and original datasets. The quality of the network structure is shown empirically to have low sensitivity to the choice of the first node. Experimental results on four typical datasets show that, compared with the classical PrivBayes algorithm and its improved variants, ELPrivBayes reduces the computational cost of the structure-learning stage by 97%-99%, increases the MI captured through the exponential mechanism by 14%-67%, reduces the average variation distance between the generated and original datasets by 32%-40%, and improves the accuracy of the constructed Support Vector Machine (SVM) classifier by 4%-5%. Moreover, the utility improvement of the data generated by ELPrivBayes is more pronounced when the privacy budget ε ≤ 0.8.
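The ideas named in the abstract (a cached MI matrix, an AMI-based node order, and parent selection through the exponential mechanism) can be illustrated with a minimal Python sketch. This is not the authors' ELPrivBayes implementation. The function names, the descending-AMI ordering, the restriction to a single parent per node, the even split of the privacy budget across steps, and the `sensitivity` argument are all assumptions made for illustration.

```python
import numpy as np

def mutual_information(x, y):
    """Empirical mutual information (in nats) between two discrete columns."""
    # Map arbitrary discrete labels to 0..k-1 so they can index a count table.
    _, xi = np.unique(np.asarray(x).ravel(), return_inverse=True)
    _, yi = np.unique(np.asarray(y).ravel(), return_inverse=True)
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(joint, (xi, yi), 1.0)          # joint counts
    joint /= joint.sum()                      # joint distribution
    px = joint.sum(axis=1, keepdims=True)     # marginal of x (column vector)
    py = joint.sum(axis=0, keepdims=True)     # marginal of y (row vector)
    nz = joint > 0                            # skip empty cells (0 * log 0 = 0)
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz])))

def mi_matrix(data):
    """Symmetric d x d matrix of pairwise MI, computed once and then reused."""
    d = data.shape[1]
    m = np.zeros((d, d))
    for i in range(d):
        for j in range(i + 1, d):
            m[i, j] = m[j, i] = mutual_information(data[:, i], data[:, j])
    return m

def ami_order(m):
    """Attribute indices sorted by average MI with the other attributes.

    Assumption: attributes with higher AMI enter the network first.
    """
    d = m.shape[0]
    ami = m.sum(axis=1) / (d - 1)             # diagonal is zero
    return [int(i) for i in np.argsort(-ami, kind="stable")]

def exp_mechanism(scores, epsilon, sensitivity, rng):
    """Sample an index with probability proportional to exp(eps * score / (2 * sens))."""
    scores = np.asarray(scores, dtype=float)
    logits = epsilon * scores / (2.0 * sensitivity)
    logits -= logits.max()                    # numerical stability
    p = np.exp(logits)
    p /= p.sum()
    return int(rng.choice(len(scores), p=p))

def learn_structure(m, order, epsilon, sensitivity, rng):
    """Degree-1 sketch: each node after the first gets one parent among earlier nodes.

    MI scores are read from the cached matrix m instead of being recomputed.
    Assumption: the budget epsilon is split evenly over the selection steps.
    """
    edges = []
    per_step = epsilon / max(len(order) - 1, 1)
    for pos in range(1, len(order)):
        child = order[pos]
        candidates = order[:pos]
        pick = exp_mechanism(m[child, candidates], per_step, sensitivity, rng)
        edges.append((candidates[pick], child))
    return edges
```

A caller would build `m = mi_matrix(data)` once for an integer-coded dataset, derive `order = ami_order(m)`, and pass both to `learn_structure` with a NumPy `Generator` such as `np.random.default_rng()`. The `sensitivity` value must be the sensitivity of the score function for the dataset at hand; the sketch leaves it to the caller rather than deriving it.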

Key words

data publishing; Bayesian network; differential privacy; privacy protection; correlation matrix; Average Mutual Information (AMI)

Classification

Information Technology and Security Science

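Cite this article

Lu Xiaotian, Piao Chunhui, Yang Xingyu, Bai Yingjie. Research on Differential Privacy High Dimensional Data Publishing Technology Based on Bayesian Networks[J]. Computer Engineering, 2024, 50(5): 167-181.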

Funding

Key Research and Development Program of Hebei Province (21355902D).

Computer Engineering

Open access; indexed in the Peking University Core Journals list and CSTPCD

ISSN 1000-3428
