基于群体极化嵌套越狱模板的大模型安全评估技术研究

王红杰, 孙培淇, 杜彦辉, 刘楠

信息安全研究, 2026, Vol. 12, Issue 5: 410-419, 10. DOI: 10.12379/j.issn.2096-1057.2026.05.03

Research on Large Model Security Assessment Technology Based on Group Polarization Nested Jailbreak Templates

王红杰¹, 孙培淇¹, 杜彦辉¹, 刘楠²

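Author information

1. School of Information and Cyber Security, People's Public Security University of China (中国人民公安大学信息网络安全学院), Beijing 100038
2. National Engineering Research Center for Classified Protection and Safeguard Technology of Cybersecurity (网络安全等级保护与安全保卫技术国家工程研究中心), Shanghai 201100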

Abstract

As large models demonstrate excellent performance on natural language processing tasks, their security issues have become increasingly prominent. Jailbreak attacks bypass a model's security mechanisms, weaken its value-alignment constraints, and induce it to generate harmful content. The resulting risks of model abuse, hijacking, and information leakage threaten the large language model application ecosystem. To evaluate large model security comprehensively, this paper proposes a nested jailbreak template technique based on the psychological effect of group polarization, which guides models to generate complex responses through progressively nested instructions. On this basis, the NesT-HGA (nested template-hierarchical genetic algorithm) framework is constructed by integrating a hierarchical genetic algorithm. Experimental results show that the method achieves an average attack success rate of over 80% across 8 mainstream large models; statistical tests confirm significant differences from existing methods, and ablation experiments verify the synergistic effects of its components. The method thus effectively evaluates the security and robustness of large models against complex attacks.

Keywords

jailbreak attack; group polarization effect; nested instruction; hierarchical genetic algorithm; large model security assessment

Classification

Information technology and security science

Cite this article

王红杰, 孙培淇, 杜彦辉, 刘楠. 基于群体极化嵌套越狱模板的大模型安全评估技术研究[J]. 信息安全研究, 2026, 12(5): 410-419, 10.

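Funding

Action Plan Fund of the National Engineering Research Center for Classified Protection and Safeguard Technology of Cybersecurity (C23640-XD-08)

Fundamental Research Funds for the Central Universities (2024JKF14)

Double First-Class Special Project on Enhancing Independent Innovation: Cyberspace Security Law Enforcement Technology (2023SYL07)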

信息安全研究

ISSN: 2096-1057
