Text adversarial attack capability enhancement method based on automatic dilution
Training with adversarial examples improves the robustness of deep neural networks, so raising the success rate of adversarial attacks is an important topic in adversarial example research. Diluting an original sample moves it closer to the model's decision boundary and thereby increases the success rate of adversarial attacks. However, existing dilution algorithms rely on manually constructed dilution pools and restrict dilution targets to a single part of speech. This paper therefore proposes a text adversarial attack enhancement method based on automatic dilution, called the Automatic Multi-positional Dilution Preprocessing (AMDP) algorithm. AMDP frees the dilution process from dependence on manual assistance and generates a different dilution pool for each dataset and target model. It also extends dilution to words of more parts of speech, enlarging the search space of the dilution operation. As an input transformation method, AMDP can be combined with other adversarial attack algorithms to further improve attack performance. Experimental results show that AMDP raises the attack success rate on BERT, WordCNN, and WordLSTM classification models by about 10% on average, while reducing the average modification rate of the original samples and the average number of queries to the target model.
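The following is a rough Python sketch of the dilution-as-preprocessing idea described in the abstract, not the paper's actual AMDP implementation: the dilution pool, the black-box confidence interface, and all names (dilute, dilution_pool, toy_confidence, max_edits) are illustrative assumptions. It greedily substitutes words whose replacements lower the classifier's confidence in the true label, which is the sense in which dilution pushes a sample toward the decision boundary before a downstream attack is applied.

```python
# Minimal sketch of dilution as an input transformation (illustrative only,
# not the paper's AMDP algorithm). Assumes a black-box classifier exposed as
# a function returning the confidence of the true label, and a pre-built
# dilution pool mapping words to candidate substitutes.
from typing import Callable, Dict, List

def dilute(tokens: List[str],
           true_label: int,
           confidence: Callable[[List[str], int], float],
           dilution_pool: Dict[str, List[str]],
           max_edits: int = 3) -> List[str]:
    """Greedily substitute words found in the dilution pool so that the
    classifier's confidence in the true label drops, moving the sample
    closer to the decision boundary before a downstream attack runs."""
    diluted = list(tokens)
    base = confidence(diluted, true_label)
    edits = 0
    for i, word in enumerate(diluted):
        if edits >= max_edits:
            break
        best_word, best_conf = word, base
        for cand in dilution_pool.get(word, []):
            trial = diluted[:i] + [cand] + diluted[i + 1:]
            c = confidence(trial, true_label)
            if c < best_conf:  # keep the candidate that lowers confidence most
                best_word, best_conf = cand, c
        if best_word != word:
            diluted[i] = best_word
            base = best_conf
            edits += 1
    return diluted

# Toy usage with a stand-in confidence function (illustrative only).
if __name__ == "__main__":
    pool = {"good": ["decent", "fine"], "movie": ["film", "picture"]}
    def toy_confidence(toks: List[str], label: int) -> float:
        # Pretend the classifier is very confident when it sees "good".
        return 0.95 if "good" in toks else 0.70
    sample = ["a", "good", "movie"]
    print(dilute(sample, true_label=1,
                 confidence=toy_confidence, dilution_pool=pool))
```

The diluted output can then be passed to any existing word-substitution attack; this composability is what the abstract means by using dilution as an input transformation.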
房钰深;陈振华;何琨
School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074
adversarial machine learning; adversarial samples; text dilution; classification boundaries; natural language processing
《南京大学学报(自然科学版)》 2024 (006)
Pages 900-907 (8 pages)
National Natural Science Foundation of China (62076105, U22B2017)