Acta Automatica Sinica (自动化学报), 2026, Vol. 52, Issue 3: 510-524, 15. DOI: 10.16383/j.aas.c250334
深度强化学习驱动的超视距空战自主决策方法
An Autonomous Decision-making Method for Beyond Visual Range Air Combat Driven by Deep Reinforcement Learning
Abstract
With the rapid development of airborne sensor technologies and medium-to-long-range air-to-air missile technologies, beyond visual range air combat has become the dominant form of modern air warfare. In such a complex and dynamic operational environment, intelligent technologies capable of real-time battlefield situation awareness and rational maneuver decision-making have become a research hotspot in military technology. First, a high-fidelity simulation environment is constructed, encompassing a six-degree-of-freedom aircraft dynamics model, a missile guidance system model, and a radar sensor system. Subsequently, by integrating imitation learning and self-play methods, an opponent-learning-based air combat decision-making framework is proposed to address the poor adaptability and generalization of deep reinforcement learning in aerial combat, thereby enhancing the agent's ability to rapidly adapt and optimize strategies in complex and variable battlefield environments. Finally, ten expert systems with significant tactical differences are developed to engage in game-based confrontations with the agent within the high-fidelity air combat simulation platform. The results demonstrate that the proposed decision-making framework significantly outperforms traditional deep reinforcement learning strategies in key metrics such as convergence speed and winning rate, exhibiting strong effectiveness and generalization. This work can provide technical support for the rapid generation of reliable strategies in complex beyond visual range air combat scenarios.
Key words
deep reinforcement learning / opponent learning / beyond visual range air combat / intelligent control
Cite this article
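吕茂隆, 王金河, 韩浩然, 丁晨博, 万路军. An Autonomous Decision-making Method for Beyond Visual Range Air Combat Driven by Deep Reinforcement Learning [J]. Acta Automatica Sinica, 2026, 52(3): 510-524, 15.
Funding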
Supported by National Natural Science Foundation of China (62303489, GKJJ24050502), Postdoctoral General Fund (2022M723877), Special Postdoctoral Funding (2023T160790), China Postdoctoral International Exchange and Introduction Program (YJ20220347), Youth Talent Support Program for Military Science and Technology (2022-JCJQ-QT-018), and Key Project of the Natural Science Basic Research Program of Shaanxi Province (2025JC-QYCX-052)