Journal of Information Security Research (信息安全研究), 2026, Vol. 12, Issue 5: 410-419 (10 pages). DOI: 10.12379/j.issn.2096-1057.2026.05.03
Research on Large Model Security Assessment Technology Based on Group Polarization Nested Jailbreak Templates
Abstract
As large models demonstrate excellent performance in natural language processing tasks, their security issues have become increasingly prominent. Jailbreak attacks bypass model security mechanisms, weaken value alignment constraints, and induce models to generate harmful content. The resulting risks of model abuse, hijacking, and information leakage threaten the large language model application ecosystem. To comprehensively evaluate the security of large models, a nested jailbreak template technique based on the group polarization psychological effect is proposed, which guides models to generate complex responses through progressively nested instructions. On this basis, the NesT-HGA (nested template-hierarchical genetic algorithm) framework is constructed by integrating a hierarchical genetic algorithm. Experimental results show that the method achieves an average attack success rate of over 80% across 8 mainstream large models. Statistical tests confirm significant differences from existing methods, and ablation experiments verify the synergistic effects of its components, indicating that the method can effectively evaluate the security and robustness of large models against complex attacks.
Keywords
jailbreak attack; group polarization effect; nested instruction; hierarchical genetic algorithm; large model security assessment
Classification
Information Technology and Security Science
Cite this article
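Wang Hongjie, Sun Peiqi, Du Yanhui, Liu Nan. Research on Large Model Security Assessment Technology Based on Group Polarization Nested Jailbreak Templates [J]. Journal of Information Security Research, 2026, 12(5): 410-419.
Funding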
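Action Plan Fund of the National Engineering Research Center for Classified Protection and Safeguard Technology for Cybersecurity (C23640-XD-08)
Fundamental Research Funds for the Central Universities (2024JKF14)
Double First-Class Special Project on Enhancing Independent Innovation: Cyberspace Security Law Enforcement Technology (2023SYL07)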