
Study on unmanned surface vehicle collision avoidance method based on rule and maneuverability PPO

邓明星 1, 李爽 2, 曾晨 3, 许小伟 1

Journal of Huazhong University of Science and Technology (Natural Science Edition), 2026, Vol. 54, Issue 4: 94-101, 8. DOI: 10.13245/j.hust.250367

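Author information

  • 1. School of Automobile and Traffic Engineering, Wuhan University of Science and Technology, Wuhan 430065, Hubei; Hubei Engineering Research Center for Advanced Chassis Technology of New Energy Vehicles, Wuhan 430065, Hubei
  • 2. School of Automobile and Traffic Engineering, Wuhan University of Science and Technology, Wuhan 430065, Hubei
  • 3. China Ship Development and Design Center, Wuhan 430060, Hubei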

Abstract

To address the curse of dimensionality and the challenges of dynamic adaptability in unmanned surface vehicle collision avoidance path planning under complex port environments, a deep reinforcement learning-based collision-avoidance strategy that integrated the International Regulations for Preventing Collisions at Sea (COLREGs) and ship maneuverability was proposed, namely RM-PPO. By constructing a reward function framework incorporating rule-based constraints, a dynamically optimized objective system was established for head-on, crossing, and overtaking encounter scenarios, ensuring that the collision-avoidance decisions complied with the maneuvering requirements of COLREGs under different encounter situations. Based on a three-degree-of-freedom ship dynamics model, a continuous action space was designed to achieve the coordinated optimization of propulsion power and rudder angle commands, thereby enhancing the physical realizability of the actions. The network architecture of the proximal policy optimization (PPO) algorithm was improved by introducing a gated recurrent unit (GRU) to enhance temporal decision-making capability, and a rule-flag-based policy generation mechanism was proposed to balance the exploration and stability of the policy. Simulation and experimental results show that the proposed RM-PPO algorithm is capable of achieving efficient and safe collision avoidance in both basic encounter scenarios and complex environments, with significantly superior performance in terms of path length, collision-avoidance distance, and navigation stability compared to conventional reinforcement learning algorithms.
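The abstract names PPO as the base algorithm but does not state its objective. For orientation, the following is a minimal sketch of the standard PPO clipped surrogate objective only. It does not reproduce the RM-PPO reward design, the GRU network, or the rule-flag mechanism, none of which are specified in this record. The function name `ppo_clip_objective` and the default `clip_eps=0.2` are illustrative assumptions, not values taken from the paper.

```python
import math


def ppo_clip_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective of standard PPO (to be maximized).

    All names and the default clip_eps are assumptions for illustration;
    the paper's own hyperparameters are not given in the abstract.
    """
    if not (len(new_log_probs) == len(old_log_probs) == len(advantages)):
        raise ValueError("all inputs must have the same length")
    if not advantages:
        raise ValueError("inputs must be non-empty")

    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the updated policy and the old policy.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Pessimistic bound: take the smaller of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

With a positive advantage, the clipping caps how much a single update can gain from raising an action's probability. With a negative advantage, the unclipped term is kept when it is the more pessimistic one.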

Key words

unmanned surface vehicle collision avoidance / deep reinforcement learning / collision avoidance rules / dynamic exploration / PPO algorithm

Classification

Information technology and safety science

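Cite this article

邓明星, 李爽, 曾晨, 许小伟. Study on unmanned surface vehicle collision avoidance method based on rule and maneuverability PPO [J]. Journal of Huazhong University of Science and Technology (Natural Science Edition), 2026, 54(4): 94-101, 8.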

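Funding

National Key Research and Development Program of China (2022YFE0125200)

National Natural Science Foundation of China (52575134)

Wuhan Natural Science Foundation Special Zone Program (2024040701010056)

National Defense Basic Scientific Research Program (JCKY2023206A023)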

Journal of Huazhong University of Science and Technology (Natural Science Edition), ISSN 1671-4512
