Optimizing air and missile defense strategies with explainable hierarchical reinforcement learning

刘宇衡 杨力 黄琦龙

Acta Aeronautica et Astronautica Sinica (航空学报), 2026, Vol. 47, Issue 8: 248-263, 16. DOI: 10.7527/S1000-6893.2025.32786

刘宇衡 1, 杨力 1, 黄琦龙 1

Author information

  • 1. School of Automation, Nanjing University of Science and Technology, Nanjing 210094

Abstract

Air and Missile Defense (AMD) systems are core elements of a nation's aerospace security shield, and their target-interception capability is key to determining combat effectiveness. With the evolution of warfare, the AMD interception problem is increasingly characterized by large target scales, pronounced value heterogeneity, and stringent real-time requirements. Existing techniques typically face an interception policy space that grows exponentially with target count, poor sample efficiency under delayed rewards, and opaque decision processes, making them insufficient for operational needs. To address these challenges, this paper proposes an interception strategy framework based on Explainable Hierarchical Dueling DQN (EHD-DQN). This framework suppresses exponential policy-space growth and shortens the decision chain through a hierarchical decoupling of "upper-level ranking → lower-level interception". A temporally decayed multi-experience buffer is introduced to improve sample efficiency and convergence stability under delayed rewards. Moreover, an explainability module that combines Gradient-weighted Class Activation Mapping (Grad-CAM) and Local Interpretable Model-agnostic Explanations (LIME) is embedded to inject explanation signals into the training loop and provide traceable decision rationales. Compared with Deep Q-Network (DQN), Deep Deterministic Policy Gradient (DDPG), Proximal Policy Optimization (PPO), and three traditional optimization algorithms, namely Rolling-Horizon Mixed-Integer Linear Programming (RH-MILP), Non-dominated Sorting Genetic Algorithm II (NSGA-II), and Adaptive Large Neighborhood Search (ALNS), EHD-DQN achieves superior performance in interception count, ammunition utilization, and engagement timing for high-value targets, while furnishing transparent, staff-oriented justifications for command decision-making. The results indicate that EHD-DQN offers an efficient and explainable decision-making paradigm for AMD command-and-control systems.
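
As an illustration of two ingredients named in the abstract, the following minimal sketch shows the standard dueling aggregation Q(s, a) = V(s) + A(s, a) - mean_a A(s, a), together with one possible way of biasing replay sampling toward recent experience. The abstract does not state the form of the temporal decay, so the exponential weighting, the `decay` parameter, and all function names below are assumptions made for illustration and should not be read as the authors' implementation.

```python
import math
import random


def dueling_q_values(state_value, advantages):
    """Combine V(s) and A(s, a) using the mean-subtracted dueling rule."""
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + a - mean_adv for a in advantages]


def decayed_sampling_weights(ages, decay=0.1):
    """Hypothetical temporal decay: a smaller age (newer sample) gets a larger weight."""
    raw = [math.exp(-decay * age) for age in ages]
    total = sum(raw)
    return [w / total for w in raw]  # normalized so the weights sum to 1


def sample_batch(buffer, ages, batch_size, decay=0.1, rng=random):
    """Draw a batch with replacement, biased toward recent transitions."""
    weights = decayed_sampling_weights(ages, decay)
    return rng.choices(buffer, weights=weights, k=batch_size)


# Toy usage: one state value, three candidate actions, three stored transitions.
q = dueling_q_values(2.0, [1.0, 0.0, -1.0])
weights = decayed_sampling_weights([0, 5, 10])
```

Subtracting the mean advantage keeps the value and advantage streams identifiable, which is the usual motivation for this form of the dueling head; the decay rate in the sampling sketch would need tuning for any real delayed-reward task.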

Key words

hierarchical reinforcement learning / explainable artificial intelligence / air defense and anti-missile decision-making / dueling DQN / collaborative optimization

Classification

Aerospace

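Cite this article

刘宇衡, 杨力, 黄琦龙. Optimizing air and missile defense strategies with explainable hierarchical reinforcement learning[J]. Acta Aeronautica et Astronautica Sinica, 2026, 47(8): 248-263, 16.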

Funding

National Natural Science Foundation of China (U21B2003)

Acta Aeronautica et Astronautica Sinica (航空学报), ISSN 1000-6893
