Acta Automatica Sinica, 2024, Vol. 50, Issue 6: 1104-1128 (25 pages). DOI: 10.16383/j.aas.c230546
A Review of Offline Reinforcement Learning Based on Representation Learning
Abstract
Reinforcement learning (RL), in which an agent learns an optimal policy through online interaction with its environment, has recently become an important tool for solving perceptual decision-making problems in complex environments. However, collecting data online may raise safety, time, or cost concerns, which greatly limits the practical application of RL. Meanwhile, because raw data are complex and multifaceted, handling high-dimensional inputs has also become a significant challenge for RL. Fortunately, offline RL based on representation learning can learn a policy solely from historical experience data, without interacting with the environment. It uses representation learning techniques to map the features of the offline dataset into low-dimensional vectors, which are then used to train the offline RL model. This data-driven paradigm offers a new path toward artificial general intelligence. To this end, this paper comprehensively reviews recent research on offline RL based on representation learning. First, the problem setup of offline RL is given. Then, existing techniques are summarized from three aspects: methodologies; benchmarks; and offline policy evaluation and hyperparameter selection. Next, research trends of offline RL in industry, recommendation systems, intelligent driving, and other fields are introduced. Finally, the paper draws conclusions and discusses the key challenges and future development trends of offline RL based on representation learning, with the aim of providing a valuable reference for subsequent studies.
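The two-stage pipeline the abstract describes (learn a low-dimensional representation from the offline dataset, then train an offline RL model on those vectors) can be illustrated with a deliberately tiny sketch. Everything in it is a hypothetical assumption rather than a method from the survey: a 6-state chain task, a redundant 4-D "raw" observation, PCA (top component via power iteration) as a stand-in representation learner, and tabular Q-iteration over the learned 1-D codes as a stand-in offline RL algorithm. The environment is queried only to synthesize the logged dataset; all learning afterwards reads that dataset alone.

```python
import math

# --- Hypothetical toy task (an assumption for illustration, not from the survey) ---
N_STATES = 6          # chain states 0..5; state 5 is terminal
ACTIONS = (-1, +1)    # move left / move right
GAMMA = 0.9


def observe(s):
    """Hypothetical 4-D raw observation that redundantly encodes the 1-D state."""
    return [float(s), 2.0 * s, -float(s), 0.5 * s + 1.0]


def step(s, a):
    """Chain dynamics; used ONLY to synthesize the logged dataset below."""
    s_next = min(max(s + a, 0), N_STATES - 1)
    done = s_next == N_STATES - 1
    return s_next, (1.0 if done else 0.0), done


def collect_offline_dataset():
    """Logged (obs, action, reward, next_obs, done) tuples covering each (state, action) once."""
    data = []
    for s in range(N_STATES - 1):
        for a in ACTIONS:
            s_next, r, done = step(s, a)
            data.append((observe(s), a, r, observe(s_next), done))
    return data


def fit_pca_encoder(observations, iters=100):
    """Stand-in representation learner: project onto the top principal component."""
    dim, n = len(observations[0]), len(observations)
    mean = [sum(o[i] for o in observations) / n for i in range(dim)]
    centered = [[o[i] - mean[i] for i in range(dim)] for o in observations]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(dim)] for i in range(dim)]
    u = [1.0] * dim
    for _ in range(iters):  # power iteration converges to the top eigenvector of cov
        w = [sum(cov[i][j] * u[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        u = [x / norm for x in w]

    def encode(obs):
        return sum((obs[i] - mean[i]) * u[i] for i in range(dim))

    return encode


def offline_q_iteration(data, encode, sweeps=100):
    """Stand-in offline RL: tabular Q-iteration keyed by learned codes; reads only `data`."""
    def code(obs):
        return round(encode(obs), 6)  # rounding gives stable dictionary keys

    q = {}
    for _ in range(sweeps):
        new_q = {}
        for obs, a, r, obs_next, done in data:
            # Unlogged (code, action) pairs silently default to 0.0.
            next_v = 0.0 if done else max(q.get((code(obs_next), b), 0.0) for b in ACTIONS)
            new_q[(code(obs), a)] = r + GAMMA * next_v
        q = new_q

    def policy(obs):
        return max(ACTIONS, key=lambda b: q.get((code(obs), b), 0.0))

    return policy


data = collect_offline_dataset()
encoder = fit_pca_encoder([d[0] for d in data] + [d[3] for d in data])
policy = offline_q_iteration(data, encoder)
print([policy(observe(s)) for s in range(N_STATES - 1)])  # [1, 1, 1, 1, 1]
```

Because each (state, action) pair is logged exactly once and the toy dynamics are deterministic, assigning the Bellman target is an exact backup; with stochastic or repeated transitions one would average the targets instead. With incomplete coverage, the greedy max would query unlogged (code, action) pairs whose values are pure defaults, which is the kind of out-of-distribution query behind the distribution-shift problem listed among the key words.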
Key words
Reinforcement learning (RL) / offline reinforcement learning / representation learning / historical experience data / distribution shift
Cite this article
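Wang Xue-Song, Wang Rong-Rong, Cheng Yu-Hu. A review of offline reinforcement learning based on representation learning[J]. Acta Automatica Sinica, 2024, 50(6): 1104-1128.
Funding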
Supported by National Natural Science Foundation of China (62373364, 62176259) and Key Research and Development Program of Jiangsu Province (BE2022095)