Research on Gradient Optimization-Based Backdoor Identification of Large Language Models (基于梯度优化的大语言模型后门识别探究)

网络安全与数据治理 (Cyber Security and Data Governance), 2023, Vol. 42, Issue 12: 14-19, 6. DOI: 10.19358/j.issn.2097-1788.2023.12.003

Chen Jiahua (陈佳华)¹, Chen Yu (陈宇)², Cao Qi (曹婍)³

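Author information

1. School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan 610066
2. School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876
3. Key Laboratory of Intelligent Algorithm Security, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190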

Abstract

With the growing popularity of large language models (LLMs) and their application in more fields, security concerns about them have also arisen. Training an LLM places extremely demanding requirements on datasets and computing resources, so most users rely directly on open-source datasets and models from the Internet, which provides an excellent breeding ground for backdoor attacks. In a backdoor attack, the model behaves on normal input as if no backdoor had been injected, but produces abnormal output when the input contains a backdoor trigger. An effective way to defend against backdoor attacks is backdoor identification. Gradient-based optimization methods are currently in common use, but the settings of their internal impact factors strongly affect identification performance. This paper experimentally measures the effects of the word token length, the number of nearest neighbors, and the noise scale, and analyzes the mechanisms behind them, to provide a reference for researchers who use these methods in the future.

Keywords

large language models / backdoor attack / gradient-based backdoor identification / impact factor

Classification

Information technology and security science

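Cite this article

Chen Jiahua, Chen Yu, Cao Qi. Research on gradient optimization-based backdoor identification of large language models [J]. 网络安全与数据治理 (Cyber Security and Data Governance), 2023, 42(12): 14-19, 6.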

Funding

National Key Research and Development Program of China (2022YFB3103700, 2022YFB3103701)

网络安全与数据治理 (Cyber Security and Data Governance)

ISSN: 2097-1788
