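Research on Large Model Security Assessment Technology Based on Group Polarization Nested Jailbreak Templates
(基于群体极化嵌套越狱模板的大模型安全评估技术研究)

Wang Hongjie (王红杰) 1, Sun Peiqi (孙培淇) 1, Du Yanhui (杜彦辉) 1, Liu Nan (刘楠) 2

Journal of Information Security Research (信息安全研究), 2026, Vol. 12, Issue 5: 410-419, 10. DOI: 10.12379/j.issn.2096-1057.2026.05.03

Author information

  • 1. School of Information and Cyber Security, People's Public Security University of China (中国人民公安大学信息网络安全学院), Beijing 100038
  • 2. National Engineering Research Center for Classified Protection and Safeguard Technology for Cybersecurity (网络安全等级保护与安全保卫技术国家工程研究中心), Shanghai 201100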

Abstract

As large models demonstrate excellent performance in natural language processing tasks, their security issues have become increasingly prominent. Jailbreak attacks bypass model security mechanisms, weaken value-alignment constraints, and induce models to generate harmful content. The resulting risks of model abuse, hijacking, and information leakage threaten the large language model application ecosystem. To evaluate large model security comprehensively, a nested jailbreak template technique based on the group polarization psychological effect is proposed, which guides models to generate complex responses through progressively nested instructions. On this basis, the NesT-HGA (nested template-hierarchical genetic algorithm) framework is constructed by integrating a hierarchical genetic algorithm. Experimental results show that the method achieves an average attack success rate of over 80% across 8 mainstream large models. Statistical tests confirm significant differences from existing methods, and ablation experiments verify the synergistic effects of its components, indicating that the framework can effectively evaluate the security and robustness of large models against complex attacks.

Key words

jailbreak attack / group polarization effect / nested instruction / hierarchical genetic algorithm / large model security assessment

Classification

Information Technology and Security Science

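Cite this article

Wang Hongjie, Sun Peiqi, Du Yanhui, Liu Nan. Research on Large Model Security Assessment Technology Based on Group Polarization Nested Jailbreak Templates[J]. Journal of Information Security Research, 2026, 12(5): 410-419, 10.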

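Funding

Action Plan Fund Project of the National Engineering Research Center for Classified Protection and Safeguard Technology for Cybersecurity (C23640-XD-08)

Fundamental Research Funds for the Central Universities (2024JKF14)

Double First-Class Special Project on Enhancing Independent Innovation: Cyberspace Security Law Enforcement Technology (2023SYL07)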

Journal of Information Security Research (信息安全研究)

ISSN 2096-1057
