信息安全研究 2025, Vol. 11, Issue (1): 21-27, 7. DOI: 10.12379/j.issn.2096-1057.2025.01.04
基于可解释性的不可见后门攻击研究
Research of Invisible Backdoor Attack Based on Interpretability
Abstract
Deep learning has achieved remarkable success on a variety of critical tasks. However, recent work has shown that deep neural networks are vulnerable to backdoor attacks, in which attackers release backdoored models that behave normally on benign samples but misclassify any sample stamped with the trigger into the target label. Unlike adversarial examples, backdoor attacks are mainly implemented in the model training phase, perturbing samples with triggers and injecting the backdoor into the model. This paper proposes an invisible backdoor attack based on interpretability algorithms. Different from existing works that set the trigger mask arbitrarily, this paper carefully designs an interpretability-based method for determining the trigger mask and adopts random pixel perturbation as the trigger style, so that the triggered samples look natural, evade inspection by the human eye, and bypass defense strategies against backdoor attacks. Extensive comparative experiments on the CIFAR-10, CIFAR-100, and ImageNet datasets demonstrate the effectiveness and superiority of the attack. The SSIM index is also used to measure the similarity between the backdoor samples designed in this paper and the corresponding benign samples, yielding scores close to 0.99, which shows that the generated backdoor samples are not identifiable under visual inspection. Finally, this paper also shows that the proposed attack can evade existing backdoor defense methods.
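As a concrete illustration of the pipeline the abstract describes, the following is a minimal sketch, not the authors' exact method: it uses plain gradient saliency as a stand-in for the paper's interpretability algorithm to pick the trigger mask, stamps a fixed random pixel perturbation inside that mask, and checks near-unity SSIM against the benign sample. The function names (`saliency_mask`, `apply_trigger`) and parameter choices (`k`, `eps`) are hypothetical; PyTorch and scikit-image are assumed.

```python
# Sketch of interpretability-guided trigger placement (illustrative only):
# gradient saliency selects the mask region, a bounded random pixel
# perturbation forms the trigger, and SSIM verifies that the poisoned
# sample stays visually indistinguishable from the benign one.
import torch
import torch.nn.functional as F
from skimage.metrics import structural_similarity as ssim

def saliency_mask(model, x, y, k=64):
    """Return a {0,1} mask over the k most salient pixels of x (1,C,H,W)."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    sal = x.grad.abs().sum(dim=1, keepdim=True)   # per-pixel saliency (1,1,H,W)
    thresh = sal.flatten().topk(k).values.min()   # keep the top-k pixels
    return (sal >= thresh).float()                # broadcasts over channels

def apply_trigger(x, mask, eps=8 / 255, seed=0):
    """Stamp a fixed random pixel perturbation inside the mask region."""
    g = torch.Generator().manual_seed(seed)       # fixed seed = fixed trigger style
    delta = (torch.rand(x.shape, generator=g) * 2 - 1) * eps
    return (x + mask * delta).clamp(0, 1)

# Usage: poison one sample and measure SSIM, as evaluated in the paper.
# model, x (1,3,32,32 in [0,1]), and label y are assumed to be given.
# mask = saliency_mask(model, x, y)
# x_bd = apply_trigger(x, mask)
# score = ssim(x[0].permute(1, 2, 0).numpy(),
#              x_bd[0].permute(1, 2, 0).numpy(),
#              channel_axis=2, data_range=1.0)
```

In an actual attack, the poisoned pairs (the triggered sample together with the target label) would be mixed into the training set; that is the training-phase backdoor injection the abstract refers to.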
Keywords: deep learning; deep neural network; backdoor attack; trigger; interpretability; backdoor sample
Classification: Computer and Automation
Citation: 郑嘉熙, 陈伟, 尹萍, 张怡婷. 基于可解释性的不可见后门攻击研究[J]. 信息安全研究, 2025, 11(1): 21-27, 7.
Funding: Jiangsu Provincial Key Research and Development Program (BE2022065-5); Jiangsu Key Laboratory of Network and Information Security Project (BM2003201)