Online Motion Planning Method for Train Inspection Manipulators Based on the a-PPO Algorithm

赵舵 谢冠豪 王叶文 赵文杰 黄晨 袁昭辉

Journal of Southwest Jiaotong University, 2026, Vol. 61, Issue 1: 167-177, 11. DOI: 10.3969/j.issn.0258-2724.20240085

Online Motion Planning for Inspection Manipulator Based on Adaptive Proximal Policy Optimization Algorithm

赵舵¹, 谢冠豪¹, 王叶文¹, 赵文杰¹, 黄晨¹, 袁昭辉¹

Author information

  • 1. School of Electrical Engineering, Southwest Jiaotong University, Chengdu 611756, Sichuan, China

Abstract

To support human-robot collaboration, in which an inspection manipulator actively cooperates with a worker beneath a railroad car, and to speed up the convergence of the proximal policy optimization (PPO) algorithm, an adaptive PPO (a-PPO) algorithm was proposed and applied to the online motion planning of the inspection manipulator. First, the system model was designed to output policy actions immediately from the current environmental state. Second, geometric reinforcement learning was introduced to construct the reward function, using the agent's exploration to continuously optimize the reward distribution. Third, the clipping value was determined adaptively from the similarity between the policies before and after each update, which yields the a-PPO algorithm. Finally, the improvements of the a-PPO algorithm were compared on two-dimensional maps, and the feasibility and effectiveness of its application were verified experimentally in both simulated and real train scenarios. The results indicate that, in the two-dimensional plane simulation, the a-PPO algorithm shows certain advantages in convergence speed over other PPO algorithms. Path stability also improves: the average path-length standard deviation is 16.786% lower than that of the PPO algorithm and 66.179% lower than that of the Informed-RRT* algorithm. In the application experiments in both simulated and real train scenarios, the manipulator adjusts target points dynamically and actively avoids dynamic obstacles during motion, which reflects its adaptability to dynamic environments.
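The abstract does not give the rule that maps policy similarity to a clipping value, so the following is only a minimal sketch of the general idea under stated assumptions. It measures similarity with the mean KL divergence between the old and new action distributions and feeds the resulting clip value into the standard PPO clipped surrogate. The function names, the exponential mapping, the direction of the adaptation (more similar policies get a wider clip range), and the constants `clip_min`, `clip_max`, and `kl_scale` are all hypothetical and are not taken from the paper.

```python
import numpy as np


def kl_divergence(p_old, p_new, eps=1e-12):
    """Mean KL(p_old || p_new) over a batch of discrete action distributions."""
    p_old = np.clip(p_old, eps, 1.0)
    p_new = np.clip(p_new, eps, 1.0)
    return float(np.mean(np.sum(p_old * np.log(p_old / p_new), axis=-1)))


def adaptive_clip(kl, clip_min=0.05, clip_max=0.3, kl_scale=0.01):
    """Hypothetical rule (assumption): map KL to a clip value in [clip_min, clip_max].

    Similar policies (small KL) get a value near clip_max;
    dissimilar policies (large KL) get a value near clip_min.
    """
    similarity = np.exp(-kl / kl_scale)  # lies in (0, 1]
    return clip_min + (clip_max - clip_min) * similarity


def clipped_surrogate(ratio, advantage, clip_value):
    """Standard PPO clipped surrogate objective, averaged over the batch."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - clip_value, 1.0 + clip_value) * advantage
    return float(np.mean(np.minimum(unclipped, clipped)))


if __name__ == "__main__":
    # Two states, three discrete actions each (illustrative numbers only).
    p_old = np.array([[0.2, 0.5, 0.3], [0.6, 0.3, 0.1]])
    p_new = np.array([[0.25, 0.45, 0.3], [0.55, 0.35, 0.1]])
    clip_value = adaptive_clip(kl_divergence(p_old, p_new))
    ratio = np.array([1.5, 0.5])
    advantage = np.array([1.0, -1.0])
    print(clip_value, clipped_surrogate(ratio, advantage, clip_value))
```

In a full training loop the clip value would be recomputed after each policy update; how the paper schedules this is not stated in the abstract.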

Key words

reinforcement learning/deep learning/motion planning/manipulator/railroad car

Classification

Information Technology and Security Science

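Cite this article

赵舵, 谢冠豪, 王叶文, 赵文杰, 黄晨, 袁昭辉. Online Motion Planning for Inspection Manipulator Based on Adaptive Proximal Policy Optimization Algorithm [J]. Journal of Southwest Jiaotong University, 2026, 61(1): 167-177, 11.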

Funding

National Natural Science Foundation of China (62173279, U1934221)

Journal of Southwest Jiaotong University

ISSN 0258-2724
