Path Planning of Green Walnut Picking Robotic Arm Based on HER-TD3 Algorithm
OA; Peking University Core Journal (北大核心); CSTPCD
The disordered growth of green walnuts and of obstacles such as branches makes the picking environment of a robotic arm complex, the training workload large, and stability poor. To address these problems, a picking device in which a synchronous belt module cooperates with a robotic arm was designed, and the path of the picking arm was planned with the twin delayed deep deterministic policy gradient algorithm with hindsight experience replay (HER-TD3). HER improves the agent's exploration ability and alleviates the sparse reward problem; TD3 improves the agent's stability and reduces oscillation during training. To demonstrate the feasibility and generalization ability of HER-TD3, the TD3 and HER-DDPG algorithms were introduced for comparison, and the three deep reinforcement learning agents were trained with a dimensionality reduction training method. The HER-TD3 model reached a success rate of 98% on the path planning task, 4 percentage points higher than HER-DDPG and 19 percentage points higher than TD3. A 3D simulation environment was built in CoppeliaSim, with the initial pose and collision detection designed and YOLO v4 used to recognize green walnuts. The model guided the virtual picking arm around branch obstacles to the target position, completing collision-free path planning, with success rates of 91% without obstacles and 86% with obstacles. In picking tests on a physical prototype, the path planning task was still completed well: without obstacles the path planning success rate was 86.7% with an average motion time of 12.8 s, and with obstacles the success rate was 80.0% with an average motion time of 13.6 s. These results verify that the HER-TD3 algorithm has good adaptability and stability in complex environments.
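The two ideas the abstract combines can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the reward function, state and action definitions, or hyperparameters, so the distance tolerance, the "final" goal-relabeling strategy, the noise settings, and every function name below (`sparse_reward`, `her_relabel`, `td3_target`) are assumptions chosen for illustration. The first part shows how hindsight relabeling turns a failed episode into one with a useful reward signal; the second shows the TD3 target value, which smooths the target action with clipped noise and takes the smaller of two critic estimates. TD3's delayed policy updates are not shown.

```python
import random


def sparse_reward(achieved, goal, tol=0.05):
    """Sparse reward (assumed form): 0.0 within `tol` of the goal, else -1.0."""
    dist = sum((a - g) ** 2 for a, g in zip(achieved, goal)) ** 0.5
    return 0.0 if dist <= tol else -1.0


def her_relabel(episode, tol=0.05):
    """Relabel an episode using its final achieved position as the goal.

    `episode` is a list of dicts with keys "obs", "action", "achieved"
    (end-effector position after the action) and "goal". This is the
    "final" relabeling strategy; other strategies exist.
    """
    new_goal = episode[-1]["achieved"]
    relabeled = []
    for step in episode:
        relabeled.append({
            "obs": step["obs"],
            "action": step["action"],
            "achieved": step["achieved"],
            "goal": new_goal,
            # Recompute the reward against the substituted goal.
            "reward": sparse_reward(step["achieved"], new_goal, tol),
        })
    return relabeled


def td3_target(reward, done, next_obs, next_action, q1_target, q2_target,
               gamma=0.98, noise_std=0.2, noise_clip=0.5, act_limit=1.0,
               rng=random):
    """TD3 target value for one transition.

    `next_action` is the target actor's output for `next_obs`;
    `q1_target` and `q2_target` are hypothetical callables
    (obs, action) -> float standing in for the two target critics.
    """
    smoothed = []
    for a in next_action:
        # Target policy smoothing: add clipped Gaussian noise, then
        # clip the result to the valid action range.
        eps = max(-noise_clip, min(noise_clip, rng.gauss(0.0, noise_std)))
        smoothed.append(max(-act_limit, min(act_limit, a + eps)))
    # Clipped double-Q: use the smaller of the two critic estimates.
    q_min = min(q1_target(next_obs, smoothed), q2_target(next_obs, smoothed))
    return reward + gamma * (1.0 - done) * q_min


if __name__ == "__main__":
    # A short episode that never reaches its original goal (1.0, 1.0).
    episode = [
        {"obs": (0.0, 0.0), "action": (0.1, 0.0),
         "achieved": (0.1, 0.0), "goal": (1.0, 1.0)},
        {"obs": (0.1, 0.0), "action": (0.1, 0.1),
         "achieved": (0.2, 0.1), "goal": (1.0, 1.0)},
    ]
    for step in her_relabel(episode):
        print(step["goal"], step["reward"])
```

In this sketch the relabeled copy is stored in the replay buffer alongside the original episode, so the agent sees at least one transition with a non-negative reward even when it misses the walnut. Taking the minimum of two critics is what counters the Q-value overestimation that plain DDPG is prone to.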
杨淑华;谢晓波;邴振凯;郝建军;张秀花;袁大超
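College of Mechanical and Electrical Engineering, Hebei Agricultural University, Baoding 071001, China; Hebei Province Smart Agriculture Equipment Technology Innovation Center, Baoding 071001, China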
Agricultural engineering
green walnut; picking robot; robotic arm; HER-TD3 algorithm; path planning
Transactions of the Chinese Society for Agricultural Machinery (《农业机械学报》), 2024 (004)
113-123 / 11
Hebei Province Key Research and Development Program (21327211D) and Hebei Province Doctoral Student Innovation Ability Training Project (CXZZBS2022050)