Robust Offline Actor-Critic With On-policy Regularized Policy Evaluation

Shuo Cao; Xuesong Wang; Yuhu Cheng

Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, and the School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China

Offline reinforcement learning; off-policy QL-style; on-policy SARSA-style; policy evaluation (PE); Q-value estimation

IEEE/CAA Journal of Automatica Sinica, 2024 (12)

Pages 2497-2511 (15 pages)

This work was supported in part by the National Natural Science Foundation of China (62176259, 62373364) and the Key Research and Development Program of Jiangsu Province (BE2022095).

DOI: 10.1109/JAS.2024.124494
