Robust Offline Actor-Critic With On-policy Regularized Policy Evaluation
Shuo Cao;Xuesong Wang;Yuhu Cheng
Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, and the School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
Offline reinforcement learning; off-policy QL-style; on-policy SARSA-style; policy evaluation (PE); Q-value estimation
IEEE/CAA Journal of Automatica Sinica, 2024 (12)
Pages 2497-2511 (15 pages)
This work was supported in part by the National Natural Science Foundation of China (62176259, 62373364) and the Key Research and Development Program of Jiangsu Province (BE2022095).