

自动化学报 (Acta Automatica Sinica), 2025, Vol. 51, Issue (6): 1218-1232. DOI: 10.16383/j.aas.c240481

基于梯度损失的离线强化学习算法

Gradient Loss for Offline Reinforcement Learning

陈鹏宇 1, 刘士荣 1, 段帅 1, 端军红 2, 刘扬 1

Author Information

  • 1. Faculty of Computing, Harbin Institute of Technology, Harbin 150001
  • 2. Air and Missile Defense College, Air Force Engineering University, Xi'an 710051


Abstract

Offline reinforcement learning faces the core challenges of preventing distribution shift and avoiding overestimation of the value function. While the traditional TD3+BC algorithm achieves competitive performance by introducing a behavioral cloning regularizer that constrains the learned policy to stay close to the behavior policy, its policy stability during training still needs improvement. Policy stability is especially crucial in the real world, where validating a policy can be costly. Inspired by the concept of "flat minima" in deep learning, this study explores the flat regions of the target policy loss function in the action space to obtain a stable policy. To this end, a gradient loss function is proposed, and a new offline reinforcement learning algorithm, gradient loss for offline reinforcement learning (GLO), is designed. Experimental results on the D4RL benchmark show that GLO outperforms current mainstream algorithms. Furthermore, the approach extends to the online reinforcement learning domain, demonstrating its generality and effectiveness in online environments.
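The gradient loss idea lends itself to a compact illustration. Below is a minimal PyTorch sketch of one plausible reading of the abstract, assuming a TD3+BC backbone: the usual TD3+BC actor loss is augmented with a penalty on the gradient norm of the negative Q-value with respect to the action, which pushes the policy toward flat regions of the loss surface in action space. The function name, the penalty weight `beta`, and the squared-norm form of the penalty are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def glo_style_actor_loss(actor: nn.Module, critic: nn.Module,
                         state: torch.Tensor, dataset_action: torch.Tensor,
                         alpha: float = 2.5, beta: float = 1.0) -> torch.Tensor:
    """TD3+BC actor loss plus an illustrative gradient-norm penalty (hypothetical)."""
    pi = actor(state)                                # a = pi(s)

    # TD3+BC term: maximize Q(s, pi(s)) while staying close to dataset
    # actions; lam is TD3+BC's normalization of the Q term.
    q = critic(state, pi)
    lam = alpha / q.abs().mean().detach()
    td3bc_loss = -lam * q.mean() + F.mse_loss(pi, dataset_action)

    # Gradient loss: squared norm of d(-Q)/da evaluated at a = pi(s).
    # Minimizing it seeks flat regions of the loss in action space.
    # create_graph=True keeps the graph so the penalty is differentiable
    # w.r.t. the actor's parameters (a second-order gradient through the
    # critic).
    grad_a = torch.autograd.grad((-q).sum(), pi, create_graph=True)[0]
    grad_loss = grad_a.pow(2).sum(dim=-1).mean()

    return td3bc_loss + beta * grad_loss
```

In this sketch the penalty requires a critic that is twice differentiable along the action input (standard MLP critics with ReLU or tanh activations satisfy this in practice); the paper's actual loss and weighting scheme may differ.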


Key words

Reinforcement learning / offline reinforcement learning / flat minima / gradient minimization

Cite This Article

陈鹏宇, 刘士荣, 段帅, 端军红, 刘扬. Gradient loss for offline reinforcement learning [J]. 自动化学报 (Acta Automatica Sinica), 2025, 51(6): 1218-1232.

Funding

Supported by National Natural Science Foundation of China (62071154, 62173340)
