Acta Automatica Sinica, 2025, Vol. 51, Issue 10: 2245-2255 (11 pages). DOI: 10.16383/j.aas.c250156
Model-free Policy Gradient-based Reinforcement Learning Algorithms for Optimal Control of Unknown Stochastic Systems
Abstract
This paper investigates the optimal control problem for a class of Markov stochastic jump systems (MSJSs) with unknown dynamics and proposes two novel model-free policy gradient (PG)-based reinforcement learning (RL) algorithms. First, for MSJSs with partially unknown model information, an analytical form of the model-free PG is derived from sampled data of the MSJSs and the solutions of coupled Lyapunov equations, and a partially model-free PG-based RL optimal control algorithm is proposed that directly minimizes the predefined performance index. Because the data needed to solve the coupled Lyapunov equations and to compute the PG can be extracted from the same trajectory of sampled system data, no additional sampling data need to be collected, which significantly reduces the sampling complexity of the algorithm. Furthermore, to completely eliminate the dependence on model information of the MSJSs, the PG is estimated by randomly perturbing the feedback gain, and a completely model-free PG-based RL algorithm is proposed to achieve optimal control of MSJSs with completely unknown dynamics. Finally, simulation results are presented to demonstrate the efficiency and superiority of the two proposed model-free PG-based RL optimal control algorithms.
Key words
Stochastic systems/optimal control/unknown dynamics/policy gradient/reinforcement learning
DU Chenglong, HAN Jie, LI Fanbiao, GUI Weihua. Model-free Policy Gradient-based Reinforcement Learning Algorithms for Optimal Control of Unknown Stochastic Systems[J]. Acta Automatica Sinica, 2025, 51(10): 2245-2255, 11.
Funding
Supported by National Natural Science Foundation of China (62303492, 62533005, 62222317, 62473383), Natural Science Foundation of Hunan Province (2025JJ40056, 2023JJ40765), and Guangdong Basic and Applied Basic Research Foundation (2024A1515240069)
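The abstract mentions estimating the PG by randomly perturbing the feedback gain. The following is a minimal sketch of that general idea, not the authors' algorithm: it uses a generic two-point zeroth-order gradient estimator on a hypothetical scalar, discrete-time, two-mode jump system. All numerical values (`A`, `B`, `P`, `Q`, `R`), the finite horizon, the step size, and the function names are assumptions made for illustration only. The learner touches the system only through sampled trajectory costs.

```python
import math
import random

# Hypothetical two-mode scalar jump system (illustrative values, not from the paper):
#   x[k+1] = A[i] * x[k] + B[i] * u[k],  u[k] = -K[i] * x[k],  i = current mode
A = [1.2, 0.8]
B = [1.0, 0.5]
P = [[0.9, 0.1], [0.3, 0.7]]  # mode transition probabilities (rows sum to 1)
Q, R = 1.0, 0.1               # stage-cost weights on state and input


def rollout_cost(K, seed, horizon=30, x0=1.0):
    """Quadratic cost of one sampled trajectory under mode-dependent gains K."""
    rng = random.Random(seed)  # the seed fixes the sampled mode sequence
    x, mode, cost = x0, 0, 0.0
    for _ in range(horizon):
        u = -K[mode] * x
        cost += Q * x * x + R * u * u
        x = A[mode] * x + B[mode] * u
        mode = 0 if rng.random() < P[mode][0] else 1
    return cost


def average_cost(K, seeds):
    """Average trajectory cost over a fixed set of mode-sequence seeds."""
    return sum(rollout_cost(K, s) for s in seeds) / len(seeds)


def estimate_gradient(K, rng, radius=0.05, n_dirs=10, n_rollouts=5):
    """Two-point zeroth-order gradient estimate from randomly perturbed gains."""
    d = len(K)
    grad = [0.0] * d
    for _ in range(n_dirs):
        # Draw a random direction on the unit sphere.
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(c * c for c in v))
        v = [c / norm for c in v]
        # Reuse the same seeds for both perturbed gains (common random numbers).
        seeds = [rng.randrange(10**9) for _ in range(n_rollouts)]
        j_plus = average_cost([k + radius * c for k, c in zip(K, v)], seeds)
        j_minus = average_cost([k - radius * c for k, c in zip(K, v)], seeds)
        scale = d * (j_plus - j_minus) / (2.0 * radius * n_dirs)
        grad = [g + scale * c for g, c in zip(grad, v)]
    return grad


def train(K0, steps=60, lr=0.02, seed=0):
    """Plain gradient descent on the gains using the estimated gradient."""
    rng = random.Random(seed)
    K = list(K0)
    for _ in range(steps):
        g = estimate_gradient(K, rng)
        K = [k - lr * gi for k, gi in zip(K, g)]
    return K
```

A typical use would be to call `train([0.5, 0.0])` and compare `average_cost` before and after on a held-out set of seeds. Reusing the same seeds for the two perturbed gains is a variance-reduction choice of this sketch, not something stated in the abstract.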