Acta Automatica Sinica, 2025, Vol. 51, Issue (6): 1218-1232. DOI: 10.16383/j.aas.c240481
An Offline Reinforcement Learning Algorithm Based on Gradient Loss
Gradient Loss for Offline Reinforcement Learning
Abstract
Offline reinforcement learning faces the core challenges of preventing distributional shift and avoiding overestimation of the value function. While the traditional TD3+BC algorithm achieves competitive performance by introducing a behavioral cloning regularization term that constrains the learned policy to stay close to the behavior policy, its policy stability during training still needs improvement. This matters especially in the real world, where policy validation can be costly, making policy stability crucial. Inspired by the concept of "flat minima" in deep learning, this study explores the flat regions of the target policy loss function in the action space to obtain a stable policy. To this end, a gradient loss function is proposed and a new offline reinforcement learning algorithm, gradient loss for offline reinforcement learning (GLO), is designed. Experimental results on the D4RL benchmark dataset show that the GLO algorithm outperforms current mainstream algorithms. Furthermore, the approach is extended to the online reinforcement learning domain, demonstrating its generality and effectiveness in online reinforcement learning environments.
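To make the idea concrete, below is a minimal sketch, not the paper's exact GLO formulation, of adding a gradient penalty to a TD3+BC-style actor loss so that the learned policy is pushed toward flat regions of the policy objective in action space. All names (actor, critic, alpha, lam) and the specific penalty form are assumptions introduced for illustration.

```python
# Hypothetical sketch: TD3+BC-style actor loss with an added gradient penalty
# on the per-action objective, approximating the "flat minima in action space"
# idea described in the abstract. Not the paper's exact definition of GLO.
import torch
import torch.nn.functional as F


def actor_loss_with_gradient_penalty(actor, critic, states, behavior_actions,
                                     alpha=2.5, lam=1.0):
    actions = actor(states)

    # TD3+BC actor objective: maximize Q while staying close to dataset actions.
    q = critic(states, actions)
    lmbda = alpha / q.abs().mean().detach()
    base_loss = -lmbda * q.mean() + F.mse_loss(actions, behavior_actions)

    # Per-sample objective as a function of the action, and its gradient with
    # respect to the actions; penalizing the gradient norm favors actions that
    # lie in flat regions of this objective.
    per_action_obj = (-lmbda * q
                      + ((actions - behavior_actions) ** 2).sum(dim=1, keepdim=True))
    grads = torch.autograd.grad(per_action_obj.sum(), actions,
                                create_graph=True)[0]
    grad_penalty = grads.pow(2).sum(dim=1).mean()

    return base_loss + lam * grad_penalty
```

In an actor-critic training loop, this loss would simply replace the standard TD3+BC actor loss during the delayed policy update; the weight lam trades off flatness against the original objective.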
Key words: Reinforcement learning / offline reinforcement learning / flat minima / gradient minimization
Citation: 陈鹏宇, 刘士荣, 段帅, 端军红, 刘扬. Gradient loss for offline reinforcement learning [J]. Acta Automatica Sinica, 2025, 51(6): 1218-1232.
Funding: Supported by National Natural Science Foundation of China (62071154, 62173340)