基于奖励与策略双优化的机械臂控制算法

申珅 曾建潮 秦品乐

Journal of North University of China (Natural Science Edition), 2023, Vol. 44, Issue 6: 616-623, 8. DOI: 10.3969/j.issn.1673-3193.2023.06.005

Control Algorithm of Manipulator Based on Reward and Policy Double Optimization

申珅¹, 曾建潮², 秦品乐³

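Author information

  • 1. School of Electrical and Control Engineering, North University of China, Taiyuan 030051, Shanxi, China
  • 2. School of Electrical and Control Engineering, North University of China, Taiyuan 030051, Shanxi, China; School of Computer Science and Technology, North University of China, Taiyuan 030051, Shanxi, China
  • 3. School of Computer Science and Technology, North University of China, Taiyuan 030051, Shanxi, China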

Abstract

When reinforcement learning is applied to the intelligent control of manipulators, the controller is trained through free exploration of the environment and the reward values the environment feeds back, which gives the manipulator autonomous perception and decision-making. However, unconstrained free exploration produces ineffective actions, which leads to long training times and slow convergence. To address these problems, Hybrid Reward Generative Adversarial Imitation Learning (HR-GAIL), based on dual optimization of reward and policy, was proposed. On the reward side, a compound reward function was constructed on an improved discriminator by combining a task reward with an imitation reward. On the policy side, a binary variable loss function was constructed by combining the discriminator and the policy network, and the controller was updated by alternating reward and policy optimization. Finally, a Panda arm was built in the PyBullet environment to carry out a simulated task of grasping and moving objects and verify the proposed algorithm. The simulation results show that, on the same task, HR-GAIL completes the task in 16% less time than GAIL+SAC and achieves a 5% higher grasping success rate, and that discriminator training speed and grasping stability are also improved.
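
As an illustration of the compound reward described in the abstract, the following minimal Python sketch blends a task reward with an imitation reward derived from a discriminator output. The abstract does not give the actual formula, so everything here is an assumption: the convex weighting, the weight `lam`, the function names, and the convention that the discriminator outputs the probability that a state-action pair came from the expert, with imitation reward -log(1 - D). It should not be read as the authors' implementation.

```python
import math


def imitation_reward(d_prob: float, eps: float = 1e-8) -> float:
    """Imitation reward -log(1 - D(s, a)).

    Assumption: d_prob is the discriminator's probability that the
    state-action pair came from the expert demonstrations.
    """
    d = min(max(d_prob, 0.0), 1.0)  # clamp to a valid probability
    return -math.log(1.0 - d + eps)  # eps avoids log(0) when d == 1


def hybrid_reward(task_reward: float, d_prob: float, lam: float = 0.5) -> float:
    """Convex combination of task and imitation reward.

    `lam` is a hypothetical mixing weight, not a value from the paper.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return (1.0 - lam) * task_reward + lam * imitation_reward(d_prob)
```

With `lam=0.0` the sketch reduces to the plain task reward, and with `lam=1.0` it reduces to a purely imitation-driven reward; intermediate values trade off the two signals.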

Key words

reinforcement learning/reward function/manipulator control/algorithm optimization

Classification

Information Technology and Security Science

Cite this article

申珅, 曾建潮, 秦品乐. 基于奖励与策略双优化的机械臂控制算法 [J]. Journal of North University of China (Natural Science Edition), 2023, 44(6): 616-623, 8.

Journal of North University of China (Natural Science Edition), ISSN 1673-3193
