
One-on-one air combat control method based on situation assessment and DDPG algorithm

Abstract

Existing air combat control methods do not jointly consider expert-knowledge-based situation assessment and the control of close-range combat through continuous speed changes. Based on the deep deterministic policy gradient (DDPG) reinforcement learning algorithm, a reinforcement learning environment is designed that accounts for the upper and lower flight-altitude limits, flight overload, and the upper and lower flight-speed limits, with the situation-assessment function serving as the reward function. The interaction between the DDPG algorithm and the learning environment is realized through a fully connected aircraft speed-control network and an environment reward network, and the termination conditions for a combat episode are defined by altitude and speed anomalies, missile lock-on time, and combat duration. The proposed control method is validated in simulated one-on-one air combat with respect to learning under environmental constraints, situation-assessment scores, and combat-mode learning. The results show that the proposed air combat control method is effective and can provide guidance for the further development of autonomous air combat.
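The abstract describes three ingredients that can be sketched concretely: a situation-assessment score used as the RL reward, penalties for leaving the flight envelope (altitude and speed limits), and episode termination based on envelope violations, missile lock-on time, and combat duration. The following minimal Python sketch illustrates that structure only; all numeric thresholds, the angle/distance scoring terms, and the 0.5/0.5 weights are illustrative assumptions, not values from the paper.

```python
import math
from dataclasses import dataclass

@dataclass
class FighterState:
    """Simplified own-aircraft state for the sketch (illustrative only)."""
    altitude: float   # m
    speed: float      # m/s
    angle_off: float  # rad, angle between own velocity and line of sight to opponent
    distance: float   # m, range to opponent

# Illustrative flight-envelope limits (assumed values, not from the paper)
ALT_MIN, ALT_MAX = 1000.0, 12000.0   # m
SPD_MIN, SPD_MAX = 100.0, 400.0      # m/s

def situation_reward(s: FighterState) -> float:
    """Situation-assessment score used as the RL reward.

    Combines an angle advantage (nose pointing at the opponent) with a
    distance advantage peaking at an assumed 3 km engagement range, and
    penalizes leaving the flight envelope.
    """
    angle_score = 1.0 - s.angle_off / math.pi                 # 1 nose-on, 0 tail-on
    dist_score = math.exp(-abs(s.distance - 3000.0) / 3000.0)  # 1 at 3 km, decaying away
    reward = 0.5 * angle_score + 0.5 * dist_score
    # Envelope penalties for altitude and speed outside their limits
    if not (ALT_MIN <= s.altitude <= ALT_MAX):
        reward -= 1.0
    if not (SPD_MIN <= s.speed <= SPD_MAX):
        reward -= 1.0
    return reward

def episode_done(s: FighterState, locked_steps: int, t: int,
                 lock_limit: int = 50, time_limit: int = 2000) -> bool:
    """Termination: envelope violation, sustained missile lock, or timeout."""
    out_of_envelope = not (ALT_MIN <= s.altitude <= ALT_MAX
                           and SPD_MIN <= s.speed <= SPD_MAX)
    return out_of_envelope or locked_steps >= lock_limit or t >= time_limit
```

In the paper's setup this scalar reward would be returned by the environment at each step and consumed by the DDPG critic, while the fully connected actor network outputs the continuous speed command; the sketch deliberately omits the networks themselves.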

HE Baoji; BAI Linting; WEN Pengcheng

Artificial Intelligence and Graphics & Image Research Laboratory, AVIC Xi'an Aeronautics Computing Technique Research Institute, Xi'an 710076, China

Keywords: reinforcement learning; situation assessment; deep deterministic policy gradient; air combat

《航空工程进展》 (Advances in Aeronautical Science and Engineering), 2024, No. 2

pp. 179-187 (9 pages)

DOI: 10.16615/j.cnki.1674-8190.2024.02.20
