电子科技 (Electronic Science and Technology), 2025, Vol. 38, Issue (8): 11-18, 8. DOI: 10.16180/j.cnki.issn1007-7820.2025.08.002
Smooth Exploration-Based Control Method for Inverted Pendulum Virtual-Reality Migration Learning
Abstract
The nonlinear and underactuated nature of the inverted pendulum makes it a benchmark test case for RL (Reinforcement Learning) algorithms. When an RL policy learned in simulation is deployed to the physical platform, the control signal exhibits abrupt mutations and oscillations, which lead to failed policy deployment as well as high power consumption, excessive system wear, and hardware damage. To solve this problem, regularization terms for smooth exploration of the RL policy are proposed in this study. To address the policy-mutation problem in the physical deployment stage, a mutation regularization term is designed to constrain policy mutations during the exploration stage. An oscillation regularization term is designed to solve the small-range oscillation problem of the policy by constraining the value functions of similar states. The smooth-exploration regularization terms are applied to the PPO (Proximal Policy Optimization) algorithm to carry out virtual-to-real transfer experiments on the inverted pendulum. The experimental results show that the training speed of the smooth-exploration PPO algorithm is increased by 40% in simulation, and the virtual-to-real transfer is successfully realized with strong smoothness and robustness.
Key words
inverted pendulum / reinforcement learning / smooth exploration / mutation regularization term / oscillation regularization term / proximal policy optimization algorithm / PPO algorithm / virtual-to-real migration
Classification
Information Technology and Security Science
Citation
皇甫嘉琪, 薛杰, 牟海明, 李清都. Smooth exploration-based control method for inverted pendulum virtual-reality migration learning [J]. 电子科技 (Electronic Science and Technology), 2025, 38(8): 11-18, 8.
Funding
National Natural Science Foundation of China (92048205)
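The abstract describes two smoothing terms added to the RL objective: a mutation term that constrains abrupt policy (action) changes between consecutive steps, and an oscillation term that constrains the value estimates of similar states. The paper's exact formulas are not reproduced here, so the following is a minimal sketch under assumed quadratic forms; the function name, the weights `w_mut` and `w_osc`, and the way "similar states" are obtained are all illustrative assumptions, not the authors' equations.

```python
import numpy as np

def smooth_exploration_penalty(a_t, a_prev, v_s, v_s_similar,
                               w_mut=0.1, w_osc=0.1):
    """Illustrative smoothing penalty added to a PPO-style loss.

    a_t, a_prev     : actions at consecutive time steps (assumed inputs)
    v_s, v_s_similar: value estimates of a state and of a nearby
                      (e.g. slightly perturbed) state (assumed inputs)
    w_mut, w_osc    : assumed penalty weights, not from the paper
    """
    # Mutation term: penalize abrupt action changes between steps,
    # discouraging control-signal jumps at physical deployment time.
    mutation = np.mean((np.asarray(a_t) - np.asarray(a_prev)) ** 2)
    # Oscillation term: pull value estimates of similar states together,
    # discouraging small-range oscillation of the learned policy.
    oscillation = np.mean((np.asarray(v_s) - np.asarray(v_s_similar)) ** 2)
    return w_mut * mutation + w_osc * oscillation
```

In training, such a penalty would be added to the PPO surrogate loss before the gradient step; the quadratic forms above are one common choice for smoothness regularizers, chosen here only to make the two terms concrete.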