Computer Engineering and Applications, 2026, Vol. 62, Issue (6): 27-50, 24. DOI: 10.3778/j.issn.1002-8331.2509-0303
Overview of Security Research on Jailbreak Attacks Against Generative Large Models
Abstract
In recent years, generative large models (GLMs) have been widely deployed in key scenarios such as text generation, conversational interaction, and content creation. However, jailbreak attacks are emerging as a threat to these models: they bypass the models' built-in safety mechanisms and induce them to produce harmful outputs, raising security challenges such as ethical risks, privacy leakage, and model abuse. To address this threat, this paper comprehensively reviews recent research progress on jailbreak attacks against the two mainstream classes of generative models: large language models and multimodal large language models. The review focuses on three aspects: jailbreak attack types, defense strategies, and security assessment frameworks. It details the basic principles, implementation methods, and research conclusions of current jailbreak attack methods, providing valuable insights for future research. Building on this analysis, the paper further summarizes the current deficiencies in jailbreak security research for these two model classes and identifies key challenges and development opportunities for future research on the security of generative large models. This review provides guidance for researchers working on the complex applications and security of generative large models.
Key words
generative large models (GLMs) / jailbreak attack / security challenge / defense strategy / security research
Classification
Information Technology and Security Science
Cite this article
LI Yan, WANG Gang, WANG Hao. Overview of Security Research on Jailbreak Attacks Against Generative Large Models[J]. Computer Engineering and Applications, 2026, 62(6): 27-50, 24.
Fund
Special Project of the Inner Mongolia Autonomous Region University Network Security and Education Management Informatization Engineering Research Center (RZ2200000611).