Missile-target assignment method of naval ship based on deep reinforcement learning
Original title: 基于深度强化学习的舰船导弹目标分配方法 (open access; indexed in the Peking University Core Journals list and CSTPCD)
To address the missile-target allocation problem in shipborne air defense and anti-missile operations under adversarial conditions, this paper proposes a deep reinforcement learning algorithm that incorporates an attention mechanism. First, a multi-type missile-target allocation model for the naval ship is constructed, and the problem is formulated as a Markov decision process that reflects the multi-wave nature of target interception. Next, a reinforcement learning policy network is built on the encoder-decoder framework: targets are encoded with a multi-head attention mechanism, and the decoder combines the global (all-target) embedding with the individual target embeddings to produce reliable missile-target allocations for the ship. Finally, simulation experiments are carried out on the allocation profit, the computation time, and the training process of the policy network. The results show that the proposed method generates high-profit missile-target allocation schemes, computes large-scale decisions 10%~94% faster than the baseline algorithms, and that its policy network converges quickly and stably.
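The decoding idea in the abstract (scoring each target from both the global and the individual target embedding) can be illustrated with a minimal sketch. This is not the authors' network: the function names `softmax` and `decode_step`, the use of the mean embedding as the global context, the scaled dot-product score, and the feasibility mask are all assumptions made for illustration. In the paper the embeddings come from a multi-head attention encoder, whereas here they are simply passed in as lists of floats.

```python
import math


def softmax(scores):
    """Numerically stable softmax; entries equal to -inf get probability 0."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decode_step(target_embeddings, feasible):
    """Return a probability over targets for one missile (hypothetical sketch).

    target_embeddings: list of equal-length float vectors, one per target.
    feasible: list of bools; False masks a target out of the choice.
    """
    if not any(feasible):
        raise ValueError("at least one target must be feasible")
    n = len(target_embeddings)
    dim = len(target_embeddings[0])
    # Assumed global context: the mean of all target embeddings.
    global_emb = [sum(e[k] for e in target_embeddings) / n for k in range(dim)]
    scores = []
    for emb, ok in zip(target_embeddings, feasible):
        if not ok:
            scores.append(float("-inf"))  # masked targets can never be chosen
            continue
        # Scaled dot product between the global context and this target.
        dot = sum(g * x for g, x in zip(global_emb, emb))
        scores.append(dot / math.sqrt(dim))
    return softmax(scores)


probs = decode_step([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [True, False, True])
print(probs)
```

In a reinforcement learning setting, a target would be sampled from `probs` during training and chosen greedily at inference time; the mask is one simple way to exclude targets that a given missile type cannot engage.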
肖友刚;金升成;毛晓;伍国华;陆志沣
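School of Traffic and Transportation Engineering, Central South University, Changsha, Hunan 410018; Shanghai Electro-Mechanical Engineering Institute, Shanghai 201109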
air defense and anti-missile; missile-target allocation; weapon-target allocation; deep reinforcement learning
Control Theory & Applications (《控制理论与应用》), 2024, No. 6
pp. 990-998 (9 pages)