
基于自适应分位数的离线强化学习算法

周娴玮 王宇翔 罗仕鑫 余松森

智能系统学报, 2025, Vol. 20, Issue (5): 1093-1102, 10. DOI: 10.11992/tis.202410016

Offline reinforcement learning with adaptive quantile

Author information

周娴玮 1, 王宇翔 1, 罗仕鑫 1, 余松森 1

  • 1. School of Artificial Intelligence, South China Normal University, Foshan, Guangdong 528225, China

Abstract

Offline reinforcement learning aims to reduce the high cost of environmental interaction by learning effective policies solely from pre-collected offline datasets. However, the absence of interactive feedback can cause a distribution shift between the learned policy and the offline dataset, leading to increased extrapolation errors. Most existing methods address this problem with policy constraints or imitation learning, but these often result in overly conservative policies. To address these problems, an adaptive quantile-based method is proposed. Building on dual Q-estimation, the relationship between the two Q-estimates is further analyzed, and their difference is used to assess overestimation of out-of-distribution actions. The quantile is then adaptively adjusted to correct the overestimation bias. Additionally, a quantile advantage is introduced, which serves as a weight for the policy constraint term, balancing exploration and imitation to reduce policy conservativeness. Finally, the proposed approach is validated on the D4RL dataset, where it achieves excellent performance across multiple tasks, showing its potential for broad application in various scenarios.
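The core idea the abstract describes, using the disagreement between two Q-estimates as a signal of out-of-distribution overestimation and adaptively choosing a quantile between their minimum and maximum, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function names, the linear disagreement-to-quantile mapping, and the `d_max`, `tau_max`, `beta`, and `w_max` parameters are all assumptions introduced here.

```python
import numpy as np

def adaptive_quantile_target(q1, q2, d_max=1.0, tau_max=0.5):
    """Hypothetical sketch: blend two Q-estimates with a
    disagreement-driven quantile. When the critics disagree strongly
    (a proxy for overestimation on out-of-distribution actions),
    tau shrinks toward 0 and the blend approaches min(q1, q2);
    when they agree, tau approaches tau_max and the blend moves
    toward the mean of the pair."""
    q1, q2 = np.asarray(q1, dtype=float), np.asarray(q2, dtype=float)
    d = np.abs(q1 - q2)                                   # critic disagreement
    tau = tau_max * np.clip(1.0 - d / d_max, 0.0, 1.0)    # adaptive quantile in [0, tau_max]
    q_min, q_max = np.minimum(q1, q2), np.maximum(q1, q2)
    return (1.0 - tau) * q_min + tau * q_max              # tau-quantile of the two estimates

def constraint_weight(q_adv, beta=1.0, w_max=100.0):
    """Hypothetical advantage-based weight for the policy constraint
    (imitation) term: exponentiate the temperature-scaled advantage,
    capped at w_max to keep the weighted loss stable."""
    capped = np.minimum(q_adv / beta, np.log(w_max))      # cap before exp to avoid overflow
    return float(np.minimum(np.exp(capped), w_max))
```

In this sketch, `tau = 0` recovers the familiar clipped-double-Q target `min(q1, q2)` as the conservative extreme, while `tau = tau_max = 0.5` gives the neutral average of the two critics, so the disagreement signal interpolates smoothly between pessimism and neutrality.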

Keywords

offline reinforcement learning / distribution shift / extrapolation error / policy constraint / imitation learning / double Q-estimation / overestimation / quantile

Classification

Information Technology and Security Science

Cite this article

周娴玮, 王宇翔, 罗仕鑫, 余松森. 基于自适应分位数的离线强化学习算法[J]. 智能系统学报, 2025, 20(5): 1093-1102, 10.

Funding

Major Special Project of Applied Science and Technology Research and Development of Guangdong Province (2016B020244003)

Guangdong Province Enterprise Science and Technology Commissioner Project (GDKTP2020014000)

Guangdong Basic and Applied Basic Research Foundation (2020B1515120089, 2020A1515110783)

智能系统学报 (CAAI Transactions on Intelligent Systems), ISSN 1673-4785, open access, 北大核心 (PKU Core) indexed