Journal of Data Acquisition and Processing (数据采集与处理), 2025, Vol. 40, Issue 3: 659-674, 16. DOI: 10.16337/j.1004-9037.2025.03.008
Sequential Data Collection Method with Condensed Local Differential Privacy
(Original Chinese title: 基于压缩的本地差分隐私的序列数据收集方法)
Abstract
Condensed local differential privacy (CLDP) is a metric-based relaxation of local differential privacy that offers better utility and flexibility. However, existing solutions are deficient in capturing sequence patterns and in utility. To address these limitations, this paper proposes SCM-CLDP, a novel sequential data collection method based on condensed local differential privacy. SCM-CLDP takes into account important information such as the length and transitions of sequential data during collection, which enables the data collector to synthesize a privacy-preserving dataset close to the original dataset. Specifically, according to the object being perturbed, we propose two collection methods: SCM-VP, based on value perturbation, and SCM-TP, based on transition perturbation. We theoretically prove that SCM-VP and SCM-TP satisfy sequence-level condensed local differential privacy, and we conduct comparative experiments against existing solutions on two real datasets in terms of Markov chain model accuracy, synthetic dataset utility, and frequent sequence pattern mining accuracy. The results show that SCM-CLDP performs significantly better than the existing solutions, with SCM-VP outperforming SCM-TP in most cases. In the best case, SCM-CLDP reduces the error of the Markov chain model and of the synthetic dataset distribution by at least one order of magnitude compared to the existing method. Meanwhile, SCM-CLDP improves the accuracy of item frequency ranking on the synthetic dataset and the accuracy of frequent sequence pattern mining by nearly 30% compared to existing solutions.
Key words
condensed local differential privacy / sequential data / Markov chain model / data collection / privacy protection
Classification
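Computer and Automation
Cite this article
金严 (JIN Yan), 朱友文 (ZHU Youwen), 吴启晖 (WU Qihui). Sequential Data Collection Method with Condensed Local Differential Privacy [J]. Journal of Data Acquisition and Processing, 2025, 40(3): 659-674, 16.
Funding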
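Jiangsu Provincial Key Research and Development Program (Industry Foresight and Key Core Technologies) (BE2022068, BE2022068-1)
National Natural Science Foundation of China (62172216)
Fundamental Research Funds for the Central Universities (NP2024117)
Stable Support Program for Basic Research in National Defense Characteristic Disciplines (ILF240061A24)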