Journal of Huazhong University of Science and Technology (Natural Science Edition), 2025, Vol. 53, Issue (5): 9-17, 9. DOI: 10.13245/j.hust.250078
Dynamic potential based rewards for learning bipedal locomotion control
Abstract
To address insufficient exploration, low sample efficiency, and unstable gaits when legged robots learn to walk, dynamic potential is integrated into potential-based reward shaping, and a reward function based on dynamic potential reward shaping is proposed. The reward function dynamically adjusts the reward that the robot receives for its current control action during training, which improves exploration during learning. In a virtual training environment for the legged robot, the proximal policy optimization (PPO) algorithm is combined with reward calculation based on dynamic potential reward shaping to achieve fixed-speed walking control of a bipedal robot. Test results show that the proposed method effectively accelerates training and that the robot's motion is more natural and stable.
Key words
deep reinforcement learning / bipedal locomotion control / reward shaping / dynamic potential / proximal policy optimization algorithm
Classification
Computers and automation
Cite this article
Wang Quande, Wang Junhao, Liu Zihang. Dynamic potential based rewards for learning bipedal locomotion control [J]. Journal of Huazhong University of Science and Technology (Natural Science Edition), 2025, 53(5): 9-17, 9.
Funding
National Natural Science Foundation of China (62061160370).
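
The abstract does not give the reward formula, so the following is only a minimal sketch of the general idea of potential-based reward shaping with a time-varying (dynamic) potential, in the commonly used form F = gamma * phi(s', t+1) - phi(s, t). The names `shaped_reward` and `speed_potential`, the target speed, and the ramp schedule are all hypothetical and are not taken from the paper.

```python
from typing import Callable, Sequence

# Hypothetical types: a state is a vector whose first entry is forward speed.
State = Sequence[float]
Potential = Callable[[State, int], float]


def shaped_reward(
    env_reward: float,
    state: State,
    next_state: State,
    step: int,
    potential: Potential,
    gamma: float = 0.99,
) -> float:
    """Add a shaping term F = gamma * phi(s', t+1) - phi(s, t) to the reward."""
    shaping = gamma * potential(next_state, step + 1) - potential(state, step)
    return env_reward + shaping


def speed_potential(
    state: State,
    step: int,
    target_speed: float = 1.0,  # assumed fixed walking speed
    ramp_steps: int = 1000,     # assumed schedule length
) -> float:
    """Illustrative dynamic potential (an assumption, not the paper's).

    Penalises deviation from the target forward speed, with a weight
    that grows linearly with the step count and saturates at 1.
    """
    forward_speed = state[0]
    weight = min(1.0, step / ramp_steps)
    return -weight * abs(forward_speed - target_speed)


# Usage: the shaped value would replace the raw reward passed to PPO.
r = shaped_reward(0.0, (0.0,), (1.0,), step=999,
                  potential=speed_potential, gamma=1.0)
```

In this sketch the shaping term is positive when a transition moves the robot closer to the target speed, so the policy gets denser feedback than the raw environment reward alone would provide.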