基于表征学习的离线强化学习方法研究综述 (Open access; indexed in Peking University Core Journals (北大核心) and CSTPCD)

A Review of Offline Reinforcement Learning Based on Representation Learning

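Abstract (translated from the Chinese)

Reinforcement learning (RL) learns an optimal policy through online interaction between an agent and its environment, and in recent years it has become an important means of solving perception and decision-making problems in complex environments. However, collecting data online can raise safety, time, or cost issues, which greatly limits the practical application of reinforcement learning. At the same time, raw data are high-dimensional and structurally complex, so handling complex high-dimensional inputs is another major challenge for reinforcement learning. Fortunately, offline reinforcement learning based on representation learning can learn a policy solely from historical experience data, without interacting with the environment. It uses representation learning techniques to represent the features in the offline dataset as low-dimensional vectors, and then uses these vectors to train an offline reinforcement learning model. This data-driven approach offers a new opportunity for achieving artificial general intelligence. To this end, this paper comprehensively reviews recent offline reinforcement learning methods based on representation learning. It first gives a formal description of offline reinforcement learning, then summarizes existing techniques at three levels: methods, benchmark datasets, and offline policy evaluation and hyperparameter selection. It further introduces research developments in offline reinforcement learning in industry, recommender systems, intelligent driving, and other fields. Finally, it summarizes the paper and discusses the key challenges and development trends that offline reinforcement learning based on representation learning will face, with the aim of providing a useful reference for subsequent research.

English abstract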

Reinforcement learning (RL), which learns an optimal policy through online interaction between an agent and its environment, has recently become an important tool for solving perception and decision-making problems in complex environments. However, online data collection may raise issues of safety, time, or cost, greatly limiting the practical applications of reinforcement learning. Meanwhile, because raw data are high-dimensional and structurally complex, handling complex high-dimensional inputs has also become a significant challenge for reinforcement learning. Fortunately, offline reinforcement learning based on representation learning can learn a policy from historical experience data alone, without interacting with the environment. It uses representation learning techniques to map the features of the offline dataset into low-dimensional vectors, which are subsequently used to train the offline reinforcement learning model. This data-driven paradigm provides a new opportunity to realize artificial general intelligence. To this end, this paper comprehensively reviews recent research on offline reinforcement learning based on representation learning. First, the problem setup of offline reinforcement learning is given. Then, the existing techniques are summarized from three aspects: methodologies, benchmarks, and offline policy evaluation and hyperparameter selection. Moreover, research trends of offline reinforcement learning in industry, recommendation systems, intelligent driving, and other fields are introduced. Finally, conclusions are drawn, and the key challenges and development trends of offline reinforcement learning based on representation learning are discussed, so as to provide a valuable reference for subsequent research.

Authors: Wang Xuesong (王雪松); Wang Rongrong (王荣荣); Cheng Yuhu (程玉虎)

Affiliation: School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116

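Keywords (translated from the Chinese): reinforcement learning; offline reinforcement learning; representation learning; historical experience data; distribution shift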

English keywords: reinforcement learning (RL); offline reinforcement learning; representation learning; historical experience data; distribution shift

Journal: Acta Automatica Sinica (《自动化学报》), 2024, No. 6

Pages / page count: 1104-1128 / 25

Funding: Supported by the National Natural Science Foundation of China (62373364, 62176259) and the Key Research and Development Program of Jiangsu Province (BE2022095)

DOI: 10.16383/j.aas.c230546
