Computer Engineering (计算机工程), 2025, Vol. 51, Issue 11: 162-170, 9. DOI: 10.19678/j.issn.1000-3428.0069983
Transferability Enhancement of Adversarial Sample Directed Targeted Attack Based on Feature Fusion
Abstract
Adversarial examples crafted on surrogate models can transfer to black-box models, enabling attacks that require no knowledge of the black-box model's internal structure or parameters. Previous studies, however, have reported relatively low transferability for targeted attacks on black-box models. This study proposes a feature-fusion-based method for enhancing the transferability of targeted attacks on images. First, adversarial examples are generated via ensemble attacks. Then, taking the gradient direction of these adversarial examples as a baseline, clean features extracted from the original image are used as perturbations to fine-tune the adversarial examples and improve the transferability of targeted attacks. For model ensembling, a gradient adaptive module is introduced that accounts for the contribution of each model to the overall adversarial objective. To reduce gradient differences among models, a gradient filter is proposed that synchronously controls the gradient direction. The feature fusion module mixes in the clean features of the original image to fine-tune the gradient direction of the existing adversarial examples, mitigating overfocusing on specific features. Experiments on the ImageNet-Compatible dataset show that, compared with the Clean Feature Mixup (CFM) method, the proposed method improves the average attack success rate by 7.7 percentage points on non-robustly trained models and by 5.3 percentage points on robustly trained and Transformer models, demonstrating its effectiveness.
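The abstract names three components (adaptive gradient weighting across the ensemble, a gradient filter, and clean-feature mixing) but does not give their formulas. The following is a minimal illustrative sketch of one way such components could look, not the authors' implementation. Everything in it is an assumption made for illustration: the softmax-over-losses weighting, the sign-agreement filter, the linear mixing rule, and the names `adaptive_weights`, `filtered_ensemble_gradient`, `mix_features`, `temperature`, and `alpha`. Plain Python lists stand in for gradient and feature tensors.

```python
import math


def adaptive_weights(losses, temperature=1.0):
    """Softmax over per-model losses (ASSUMED weighting rule).

    Models with a larger loss receive a larger weight, so the ensemble
    update leans toward the surrogates that are currently hardest to fool.
    """
    scaled = [loss / temperature for loss in losses]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def _sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def filtered_ensemble_gradient(grads, weights):
    """Weighted sum of per-model gradients with a sign-agreement filter.

    ASSUMED filter: a coordinate is kept only if every model with a
    nonzero gradient there agrees on its sign; otherwise it is zeroed.
    """
    fused = []
    for coords in zip(*grads):
        signs = {_sign(g) for g in coords if g != 0}
        if len(signs) > 1:  # surrogates disagree on direction
            fused.append(0.0)
        else:
            fused.append(sum(w * g for w, g in zip(weights, coords)))
    return fused


def mix_features(adv_features, clean_features, alpha):
    """Linear blend of adversarial and clean features (ASSUMED mixing rule).

    alpha = 0 keeps the adversarial features unchanged; alpha = 1
    replaces them with the clean features.
    """
    return [(1 - alpha) * a + alpha * c
            for a, c in zip(adv_features, clean_features)]


if __name__ == "__main__":
    grads = [[1.0, -2.0, 0.5], [3.0, 2.0, 0.0]]  # two toy surrogate gradients
    weights = adaptive_weights([1.0, 1.0])        # equal losses, equal weights
    print(filtered_ensemble_gradient(grads, weights))
    print(mix_features([1.0, 0.0], [0.0, 1.0], alpha=0.25))
```

In this toy run the second coordinate is zeroed because the two surrogates push in opposite directions there, which is the kind of gradient disagreement a filter is meant to suppress. In a real attack these functions would operate on framework tensors inside an iterative update loop.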
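Keywords
deep learning / adversarial attacks / adversarial examples / directed targeted attacks / transferability
Classification
Computers and automation
Cite this article
凌海 (Ling Hai), 凌捷 (Ling Jie). Transferability Enhancement of Adversarial Sample Directed Targeted Attack Based on Feature Fusion [J]. Computer Engineering, 2025, 51(11): 162-170, 9.
Funding
Guangzhou Key-Area Research and Development Program (202007010004).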