
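High-Speed Target Acquisition Path Planning for Underwater Unmanned Vehicles Based on Deep Reinforcement Learning (基于深度强化学习的水下无人航行器高速目标捕获路径规划)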

PANG Zhouqi (庞舟岐), HAO Chengpeng (郝程鹏), LIN Xiaobo (林晓波), PAN Guangshuai (潘光帅)

Control Theory & Applications (控制理论与应用), 2025, Vol. 42, Issue 10: 1968-1980 (13 pp.). DOI: 10.7641/CTA.2025.40272

High speed target acquisition path planning for underwater unmanned vehicles based on deep reinforcement learning

PANG Zhouqi (庞舟岐)¹, HAO Chengpeng (郝程鹏)², LIN Xiaobo (林晓波)², PAN Guangshuai (潘光帅)²

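Author information

  • 1. Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190; University of Chinese Academy of Sciences, Beijing 100049
  • 2. Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190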

Abstract

High-speed underwater target acquisition poses several challenges. On the one hand, the changeable underwater environment makes sonar detection data delayed and uncertain, which makes high-precision acquisition difficult. On the other hand, because the target is fast, the intercepting vehicle cannot capture it from a pursuit attitude, which greatly reduces the number of feasible interception trajectories. To address these problems, this paper proposes an improved twin delayed deep deterministic policy gradient algorithm (ITD3) to improve acquisition efficiency and accuracy. First, based on the dynamics of the intercepting vehicle, a "planner-controller" cascaded simulation method is proposed, which is more accurate than pure kinematic simulation and closer to practice than the IGC model. Second, to cope with the large action space and delayed underwater sensors, an action mask mechanism and an exploration noise based on delayed messages are proposed. Third, to fit the reward function to the characteristics of the high-speed target acquisition task, a new reward function is designed that penalizes states unfavorable to capture. Finally, to improve the convergence speed and stability of the algorithm, prioritized experience replay and the softmax operator are combined with the TD3 algorithm. Simulation experiments and hardware-in-the-loop simulations show that, compared with traditional acquisition algorithms, the proposed ITD3 algorithm is feasible and achieves a shorter interception time and a lower miss rate.
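
The abstract names two of the ingredients combined with TD3, prioritized experience replay and a softmax operator, without giving their exact form. As a rough illustration only, the Python sketch below uses the commonly cited generic forms: a Boltzmann-weighted average of candidate Q-values, and proportional prioritization with importance-sampling weights. The function names and the hyperparameters `alpha`, `beta`, and `eps` are assumptions made for this example and are not taken from the paper; the authors' ITD3 implementation may differ.

```python
import math


def softmax_value(q_values, beta=1.0):
    """Boltzmann softmax operator: a softmax-weighted average of Q-values.

    beta is a hypothetical inverse-temperature parameter (an assumption).
    """
    q_max = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp(beta * (q - q_max)) for q in q_values]
    total = sum(weights)
    return sum(w * q for w, q in zip(weights, q_values)) / total


def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Proportional prioritization: P(i) = p_i^alpha / sum_k p_k^alpha,
    with priority p_i = |TD error_i| + eps."""
    priorities = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


def importance_weights(probs, beta=0.4):
    """Importance-sampling weights (N * P(i))^(-beta), scaled so the max is 1."""
    n = len(probs)
    raw = [(n * p) ** (-beta) for p in probs]
    w_max = max(raw)
    return [w / w_max for w in raw]


if __name__ == "__main__":
    q_candidates = [1.0, 2.0, 3.0]
    print(softmax_value(q_candidates, beta=0.0))   # plain mean when beta = 0
    print(softmax_value(q_candidates, beta=50.0))  # approaches the max as beta grows
    probs = per_probabilities([0.1, 1.0, 2.0])
    print(probs, importance_weights(probs))
```

In this form, transitions with larger temporal-difference error are replayed more often, while the importance weights shrink their gradient contribution to offset the sampling bias. The softmax value lies between the mean and the maximum of the candidate Q-values, which is one way to temper the underestimation that the clipped double-Q target in TD3 can introduce.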

Key words

deep reinforcement learning/deterministic policy gradient/high-speed target acquisition/underwater unmanned vehicle/Markov decision process

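Cite this article

PANG Zhouqi, HAO Chengpeng, LIN Xiaobo, PAN Guangshuai. High speed target acquisition path planning for underwater unmanned vehicles based on deep reinforcement learning [J]. Control Theory & Applications, 2025, 42(10): 1968-1980.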

Funding

Supported by the National Natural Science Foundation of China (61971412) and the Fund of a Laboratory of the Chinese Academy of Sciences (CXJJ-22S025).

Control Theory & Applications (控制理论与应用)

Open access; Peking University Core Journal (北大核心)

ISSN 1000-8152
