
基于深度强化学习的机械臂动态避障算法设计与实验验证
Design and experimental verification of a dynamic obstacle avoidance algorithm for robot manipulators based on deep reinforcement learning

冒建亮¹, 王展¹, 周昕¹, 夏飞¹, 张传林¹

实验技术与管理 (Experimental Technology and Management), 2025, Vol. 42, Issue (4): 78-85, 8. DOI: 10.16791/j.cnki.sjg.2025.04.010

Author information

  • 1. College of Automation Engineering, Shanghai University of Electric Power, Shanghai 200090, China

Abstract

[Objective] The study addresses the challenge of dynamic obstacle avoidance for robot manipulators operating in unstructured environments. Traditional motion planning algorithms often struggle with real-time adaptability and responsiveness to dynamic changes, especially in scenarios involving non-static obstacles and targets, where the ability to adapt quickly and accurately is crucial for safe and efficient operation. Therefore, this research aims to develop an advanced algorithm based on deep reinforcement learning (DRL) that effectively balances dynamic obstacle avoidance and target tracking, ensuring the safe and efficient operation of robot manipulators in complex, unpredictable scenarios.

[Methods] To achieve this goal, a DRL framework using the soft actor-critic (SAC) algorithm was designed. The SAC algorithm, well suited to continuous control tasks, uses neural networks to handle high-dimensional tasks without requiring precise environment modeling. The robot manipulator learns optimal control strategies through trial-and-error interactions with the environment. The proposed method incorporates a comprehensive reward function that balances critical factors, including end-effector and body obstacle avoidance, self-collision prevention, precise target reaching, and motion smoothness. This reward function guides the learning process by providing clear feedback signals that encourage the agent to develop efficient and safe behaviors. The state space provides a comprehensive representation of the environment, incorporating crucial details about the robot manipulator, obstacles, and target: joint angles, joint velocities, end-effector positions and orientations, as well as key points on the manipulator's body. This holistic representation ensures that the agent has all the information necessary for accurate and efficient decisions. The action space is defined by joint accelerations, which are transformed into planned joint velocities and communicated to the manipulator for control. This control strategy effectively eliminates motion singularities, enabling smooth and continuous operation.

[Results] The algorithm is trained in a simulation environment built on Python and the PyBullet simulator, providing a realistic and efficient platform for agent training. The environment is encapsulated using the Gym framework and integrated with the Stable-Baselines3 library to facilitate smooth agent-environment interactions. Extensive simulations demonstrate the algorithm's ability to learn effective dynamic obstacle avoidance strategies, with average reward and success rate curves showing noticeable improvement and eventual stabilization. These results indicate that the model reaches a relatively stable state, capable of navigating complex, dynamic environments. The trained model is subsequently deployed on a real robot manipulator equipped with a visual servoing system. This setup comprises a RealSense D435 camera and an OnRobot gripper attached to a UR5 manipulator. The visual servoing system employs ArUco markers to detect obstacles and targets, while OpenCV handles image processing and pose estimation, enabling real-time environmental perception and precise manipulator control. Experimental results validate the algorithm's practical effectiveness: the robot successfully avoids dynamic obstacles and reliably reaches target positions regardless of the direction of obstacle motion. Quantitative analysis shows that the end-effector's position error with respect to the target converges to zero and that joint velocities remain smooth throughout the operation, confirming the algorithm's precision and reliability.

[Conclusions] This study develops and validates a DRL-based dynamic obstacle avoidance algorithm for robot manipulators. By utilizing the soft actor-critic algorithm and a well-structured reward function, the proposed method demonstrates superior performance in navigating complex, dynamic environments. Deployment of the trained model on a real robot manipulator, integrated with a visual servoing system, further validates the algorithm's practical applicability. These results highlight the potential of DRL in enhancing the autonomy and adaptability of robot manipulators, paving the way for future research in intelligent robotic systems.
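As a rough illustration of the training setup described above, the sketch below outlines a Gym-style environment whose observation stacks joint angles, joint velocities, the end-effector position and the obstacle/target positions, whose action is a vector of joint accelerations integrated into planned joint velocities, and whose reward combines target-reaching, obstacle-clearance and smoothness terms, trained with the SAC implementation from Stable-Baselines3. This is a minimal sketch, not the paper's code: the class name, joint count, time step, reward weights and the placeholder forward kinematics are all illustrative assumptions, and the real environment performs kinematics, body key-point tracking and collision checking in PyBullet.

```python
import numpy as np
import gymnasium as gym                      # older Stable-Baselines3 releases use the classic `gym` package
from gymnasium import spaces
from stable_baselines3 import SAC


class ManipulatorAvoidanceEnv(gym.Env):
    """Simplified 6-DOF sketch of the training environment.

    The paper's environment queries PyBullet for kinematics and collision
    checking; here the forward kinematics is a placeholder and the body
    key-point / self-collision terms are omitted to keep the sketch short.
    """

    N_JOINTS = 6
    DT = 0.05            # control period in seconds (illustrative)
    MAX_ACC = 2.0        # joint acceleration limit in rad/s^2 (illustrative)
    MAX_STEPS = 200      # episode length limit (illustrative)

    def __init__(self):
        super().__init__()
        # Action: normalised joint accelerations, later scaled by MAX_ACC.
        self.action_space = spaces.Box(-1.0, 1.0, shape=(self.N_JOINTS,), dtype=np.float32)
        # Observation: joint angles + joint velocities + end-effector position
        # + obstacle position + target position.
        obs_dim = 2 * self.N_JOINTS + 3 + 3 + 3
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(obs_dim,), dtype=np.float32)

    def _forward_kinematics(self, q):
        # Placeholder for the PyBullet / analytic forward kinematics of the real setup.
        return np.array([np.cos(q[0]), np.sin(q[0]), 0.5 + 0.1 * q[1]], dtype=np.float32)

    def _obs(self):
        return np.concatenate([self.q, self.dq, self._forward_kinematics(self.q),
                               self.obstacle, self.target]).astype(np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.q = np.zeros(self.N_JOINTS, dtype=np.float32)
        self.dq = np.zeros(self.N_JOINTS, dtype=np.float32)
        self.obstacle = np.array([0.5, 0.2, 0.5], dtype=np.float32)   # would move each step in the real task
        self.target = np.array([0.6, -0.3, 0.4], dtype=np.float32)
        self.steps = 0
        return self._obs(), {}

    def step(self, action):
        # Accelerations -> planned joint velocities -> joint angles; the paper sends
        # the planned joint velocities to the manipulator controller.
        ddq = self.MAX_ACC * np.asarray(action, dtype=np.float32)
        self.dq = self.dq + ddq * self.DT
        self.q = self.q + self.dq * self.DT
        self.steps += 1

        ee = self._forward_kinematics(self.q)
        dist_target = float(np.linalg.norm(ee - self.target))
        dist_obstacle = float(np.linalg.norm(ee - self.obstacle))

        # Reward terms mirror the factors named in the abstract: target reaching,
        # obstacle clearance and motion smoothness (weights are illustrative).
        reward = -dist_target - 0.1 * float(np.sum(np.square(ddq)))
        if dist_obstacle < 0.10:
            reward -= 5.0
        terminated = dist_target < 0.02
        if terminated:
            reward += 10.0
        truncated = self.steps >= self.MAX_STEPS
        return self._obs(), reward, terminated, truncated, {}


if __name__ == "__main__":
    env = ManipulatorAvoidanceEnv()
    model = SAC("MlpPolicy", env, verbose=1)
    model.learn(total_timesteps=10_000)      # the paper trains for far longer than this
```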

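The visual-servoing side can be sketched in the same spirit. The snippet below detects ArUco markers and estimates their poses with OpenCV, matching the perception pipeline the abstract describes for obstacles and targets; the camera intrinsics, marker size, dictionary choice and the webcam capture are placeholder assumptions (the real setup reads calibrated frames from a RealSense D435), and the ArucoDetector class assumes OpenCV 4.7 or later.

```python
import cv2
import numpy as np

# Camera intrinsics and marker size are illustrative placeholders; in the paper's
# setup they would come from the RealSense D435 calibration.
CAMERA_MATRIX = np.array([[615.0,   0.0, 320.0],
                          [  0.0, 615.0, 240.0],
                          [  0.0,   0.0,   1.0]])
DIST_COEFFS = np.zeros(5)
MARKER_LENGTH = 0.05                     # marker side length in metres (assumed)

# New-style ArUco API (OpenCV >= 4.7); older versions expose
# cv2.aruco.detectMarkers / estimatePoseSingleMarkers instead.
_DICT = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
_DETECTOR = cv2.aruco.ArucoDetector(_DICT, cv2.aruco.DetectorParameters())

# 3D corner coordinates of a square marker centred at its own origin.
_OBJ_POINTS = np.array([[-MARKER_LENGTH / 2,  MARKER_LENGTH / 2, 0],
                        [ MARKER_LENGTH / 2,  MARKER_LENGTH / 2, 0],
                        [ MARKER_LENGTH / 2, -MARKER_LENGTH / 2, 0],
                        [-MARKER_LENGTH / 2, -MARKER_LENGTH / 2, 0]], dtype=np.float32)


def marker_poses(frame_bgr):
    """Return {marker_id: (rvec, tvec)} for every ArUco marker found in the frame."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    corners, ids, _ = _DETECTOR.detectMarkers(gray)
    poses = {}
    if ids is not None:
        for marker_corners, marker_id in zip(corners, ids.flatten()):
            ok, rvec, tvec = cv2.solvePnP(_OBJ_POINTS, marker_corners.reshape(4, 2),
                                          CAMERA_MATRIX, DIST_COEFFS)
            if ok:
                poses[int(marker_id)] = (rvec, tvec)   # pose in the camera frame
    return poses


if __name__ == "__main__":
    # A generic webcam stands in for the RealSense D435 used in the paper.
    cap = cv2.VideoCapture(0)
    ok, frame = cap.read()
    if ok:
        for marker_id, (rvec, tvec) in marker_poses(frame).items():
            print(f"marker {marker_id}: position in camera frame = {tvec.ravel()}")
    cap.release()
```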
关键词

机械臂/深度强化学习/动态避障/轨迹规划

Key words

robot manipulator/deep reinforcement learning/dynamic obstacle avoidance/trajectory planning

Classification

Information technology and safety science

Cite this article

冒建亮, 王展, 周昕, 夏飞, 张传林. 基于深度强化学习的机械臂动态避障算法设计与实验验证[J]. 实验技术与管理, 2025, 42(4): 78-85, 8.

Funding

National Natural Science Foundation of China (62203292)

实验技术与管理 (Experimental Technology and Management)

Open access (OA) | Peking University core journal (北大核心)

ISSN 1002-4956
