Cyber Security and Data Governance, 2023, Vol. 42, Issue (12): 14-19, 6. DOI: 10.19358/j.issn.2097-1788.2023.12.003
Research on gradient optimization-based backdoor identification for large language models
Abstract
With the growing popularity of large language models (LLMs) and their application in more and more fields, their security has become a pressing concern. Training an LLM generally places extremely high demands on datasets and computing resources. As a result, most users directly adopt open-source datasets and models from the Internet, which creates fertile ground for backdoor attacks. A backdoored model behaves normally on clean inputs, as if no backdoor had been injected, but produces abnormal outputs when the input contains a backdoor trigger. An effective defense against backdoor attacks is backdoor identification. Gradient-based optimization methods are currently the most common approach, but the settings of their internal impact factors strongly affect identification performance. In this paper, the effects of three such factors (token length, the number of nearest neighbors, and the noise scale) are measured experimentally and their mechanisms of action are analyzed, providing a reference for future researchers who use these methods.
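The abstract names three impact factors but does not define how they enter the optimization. The sketch below is a minimal toy illustration of one common style of gradient-guided trigger inversion. It is not the authors' method. Everything in it is a hypothetical assumption, including the names `make_toy_model`, `invert_trigger`, `token_length`, `num_neighbors` and `noise_scale`, and the linear "backdoor score" that stands in for a real LLM. Continuous trigger vectors are pushed along the score gradient with added Gaussian noise. Each position is then projected back to a real token chosen from its nearest-neighbor embeddings.

```python
import math
import random


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def make_toy_model(vocab_size=50, dim=8, trigger_id=7, seed=0):
    """Hypothetical toy 'model': unit-norm token embeddings plus a linear
    backdoor score whose direction is the planted trigger token."""
    rng = random.Random(seed)
    emb = [normalize([rng.gauss(0, 1) for _ in range(dim)])
           for _ in range(vocab_size)]
    w = emb[trigger_id][:]  # score(x) = x . w peaks at the trigger token
    return emb, w


def nearest_ids(vec, emb, k):
    """Indices of the k token embeddings closest to vec (Euclidean)."""
    dists = sorted((sum((a - b) ** 2 for a, b in zip(vec, e)), i)
                   for i, e in enumerate(emb))
    return [i for _, i in dists[:k]]


def invert_trigger(emb, w, token_length=2, num_neighbors=5, noise_scale=0.1,
                   steps=50, lr=0.5, seed=1):
    """Toy gradient-based trigger search (hypothetical, not the paper's)."""
    rng = random.Random(seed)
    dim = len(w)
    # One continuous vector per trigger position (the "token length").
    z = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(token_length)]
    for _ in range(steps):
        for v in z:
            for j in range(dim):
                # Gradient ascent on score = mean_i(z_i . w); its gradient
                # w.r.t. each z_i is w / token_length. Gaussian noise with
                # standard deviation noise_scale is added to every step.
                v[j] += lr * w[j] / token_length + noise_scale * rng.gauss(0, 1)
    # Project each position back to a real token: among its num_neighbors
    # nearest embeddings, keep the one with the highest backdoor score.
    ids = []
    for v in z:
        candidates = nearest_ids(v, emb, num_neighbors)
        ids.append(max(candidates, key=lambda i: dot(emb[i], w)))
    return ids


if __name__ == "__main__":
    emb, w = make_toy_model()
    print(invert_trigger(emb, w, token_length=3, num_neighbors=5,
                         noise_scale=0.2))
```

In this toy, setting `num_neighbors` to the vocabulary size turns the projection into a plain argmax over all tokens. Small values keep the chosen token close to the optimized continuous vector. Raising `noise_scale` trades stable convergence for wider exploration.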
Key words
large language models / backdoor attack / gradient-based backdoor identification / impact factor
Classification
Information Technology and Security Science
Cite this article
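CHEN Jiahua, CHEN Yu, CAO Qi. Research on gradient optimization-based backdoor identification of large language model[J]. Cyber Security and Data Governance, 2023, 42(12): 14-19, 6.
Funding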
National Key Research and Development Program of China (2022YFB3103700, 2022YFB3103701)