

Research on PPO Algorithm Design for UAV Swarm Reconnaissance and Strike Scenarios


Decision-making for UAV swarms is an important research direction in intelligent warfare. Taking a constructed, typical UAV swarm reconnaissance and strike mission scenario as an example, this paper studies task allocation and motion planning for UAV swarms under complex and uncertain conditions. The complexity of mission decision-making and the uncertainty of the battlefield environment are first described from two perspectives: the parametric design of the battlefield environment model and the typical swarm reconnaissance and strike mission. A state space, reward function, action space and policy network with strong generality are then designed. To capture diverse situational information, multiple types of features are designed and processed as the state space, and multiple types of rewards closely tied to the reconnaissance and strike task are designed. The action policy output takes a subject-verb-object form, which better expresses complex operations. The policy network uses an encoder, temporal aggregation, attention mechanism and decoder structure, which fuses the feature information and improves training. The problem is solved by deep reinforcement learning (DRL) based on proximal policy optimization (PPO). Finally, simulation experiments verify the feasibility and effectiveness of UAV swarm reconnaissance and strike decision-making under complex and uncertain conditions, and demonstrate the intelligence of the swarm's task allocation and motion planning.
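
For readers unfamiliar with PPO, the clipped surrogate objective at the core of the algorithm can be sketched as follows. This is a minimal, generic illustration and not the paper's implementation: the function name `ppo_clip_loss`, its arguments, and the clipping value `clip_eps=0.2` are assumptions chosen for the example, and the paper's state features, rewards, and network structure are not modeled here.

```python
import math


def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Return the negative mean PPO clipped surrogate objective.

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy (hypothetical inputs).
    advantages: advantage estimates for the same actions.
    clip_eps: clipping range; 0.2 is an assumed value, not from the paper.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the current and the old policy.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the pessimistic (smaller) of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    # Negate so that minimizing the loss maximizes the objective.
    return -total / len(advantages)


if __name__ == "__main__":
    # Ratio e^1 with a positive advantage is clipped to 1 + clip_eps.
    print(ppo_clip_loss([1.0], [0.0], [1.0]))
```

The clipping removes the incentive to move the policy far from the one that collected the data in a single update, which is the main reason PPO is often preferred for training in simulation environments.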


North Automatic Control Technology Institute, Taiyuan 030006, China



proximal policy optimization design; task assignment; motion planning; reconnaissance and strike; collaborative decision-making

Fire Control & Command Control (《火力与指挥控制》), 2024 (003)

Pages 25-34 (10 pages)

