Research on a method and architecture for evaluating artificial intelligence backdoor defenses
To counter the risk of backdoor attacks on artificial intelligence systems, researchers have developed a range of backdoor defense strategies. Because existing defense methods are evaluated under diverse criteria, comparing them across methods is a major challenge. This paper therefore proposes a unified evaluation framework for artificial intelligence backdoor defenses. The framework aims to provide a common evaluation standard for defense strategies at different levels, including the dataset level and the model level. At the dataset level, the effectiveness of backdoor detection is assessed primarily through accuracy; at the model level, the focus is mainly on metrics such as attack success rate. With the framework, the performance of different backdoor defense methods can be compared and analyzed under the same standard, which helps identify the strengths and weaknesses of each method and supports targeted suggestions for improvement. The results show that the unified evaluation framework can effectively evaluate the performance of different defense strategies, providing an important reference for further improving the security of artificial intelligence systems.
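The abstract names two metrics, detection accuracy at the dataset level and attack success rate at the model level, but does not give their formulas. The following is a minimal sketch under commonly used definitions, which are an assumption here and may differ from the paper's own: detection accuracy is the fraction of samples whose poisoned or clean status the detector identifies correctly, and attack success rate is the fraction of trigger-stamped inputs (excluding those whose true label is already the target) that the model classifies as the attacker's target label. The function names, parameter names, and sample data are all hypothetical.

```python
from typing import Sequence


def detection_accuracy(is_poisoned: Sequence[bool], flagged: Sequence[bool]) -> float:
    """Dataset-level metric (assumed definition): share of samples whose
    poisoned/clean status was identified correctly by the detector."""
    if len(is_poisoned) != len(flagged):
        raise ValueError("inputs must have the same length")
    if len(is_poisoned) == 0:
        raise ValueError("inputs must be non-empty")
    correct = sum(truth == guess for truth, guess in zip(is_poisoned, flagged))
    return correct / len(is_poisoned)


def attack_success_rate(
    true_labels: Sequence[int],
    predictions_on_triggered: Sequence[int],
    target_label: int,
) -> float:
    """Model-level metric (assumed definition): share of trigger-stamped inputs
    classified as the target label, skipping inputs that already belong to it."""
    if len(true_labels) != len(predictions_on_triggered):
        raise ValueError("inputs must have the same length")
    # Inputs whose true label is the target cannot show a successful attack.
    eligible = [
        pred
        for label, pred in zip(true_labels, predictions_on_triggered)
        if label != target_label
    ]
    if not eligible:
        raise ValueError("no inputs outside the target class")
    return sum(pred == target_label for pred in eligible) / len(eligible)


# Hypothetical toy data, for illustration only.
truth_flags = [True, False, False, True, False]
detector_flags = [True, False, True, True, False]
print(detection_accuracy(truth_flags, detector_flags))  # 0.8

labels = [1, 2, 0, 3, 2]
preds_after_defense = [0, 0, 0, 3, 0]
print(attack_success_rate(labels, preds_after_defense, target_label=0))  # 0.75
```

Under these definitions, a dataset-level defense is better when detection accuracy is higher, while a model-level defense is better when the attack success rate measured after the defense is lower.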
Xie Tian (谢天); Li Qiang (李强); Ju Zhuoya (鞠卓亚); Han Jiaqi (韩嘉祺); Yi Ping (易平)
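School of Cyber Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240; Science and Technology Innovation Center, Unit 32178, Beijing 100012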
Computer and Automation
artificial intelligence security; backdoor attack; backdoor defense; unified evaluation
Chinese Journal of Intelligent Science and Technology (《智能科学与技术学报》), 2024 (003)
pp. 381-393 (13 pages)
The National Natural Science Foundation of China (No. 62202290)