IEEE/CAA Journal of Automatica Sinica, 2024, Vol. 11, Issue 12: 2497-2511 (15 pages). DOI: 10.1109/JAS.2024.124494
Robust Offline Actor-Critic With On-policy Regularized Policy Evaluation
Abstract
Keywords
Offline reinforcement learning / off-policy QL-style / on-policy SARSA-style / policy evaluation (PE) / Q-value estimation
Cite this article
Shuo Cao, Xuesong Wang, Yuhu Cheng. Robust Offline Actor-Critic With On-policy Regularized Policy Evaluation[J]. IEEE/CAA Journal of Automatica Sinica, 2024, 11(12): 2497-2511.
Funding
This work was supported in part by the National Natural Science Foundation of China (62176259, 62373364) and the Key Research and Development Program of Jiangsu Province (BE2022095).