A Reinforcement Learning Data Augmentation Method Based on Conditional Generative Adversarial Networks
Reinforcement learning has attracted growing attention for its success on sequential decision-making problems, but it still suffers from low data efficiency when high-dimensional states are used as input. One cause is that the agent has difficulty extracting effective features from a high-dimensional space. To improve data efficiency, this paper proposes cGDA (cGANs-based Data Augment), a data augmentation method for reinforcement learning tasks. cGDA uses conditional generative adversarial nets (cGANs) to model the dynamics of the environment: the state and action at the current time step serve as the conditional input of the generative model, and the model outputs the state at the next time step as augmented data. During training, the agent is trained on real and augmented data simultaneously, which helps it extract useful knowledge quickly from the different data. On the Atari100K benchmark, across 26 discrete-control environments, cGDA achieves higher performance in 16 environments than methods that use data augmentation, and in 14 environments than methods that do not use data augmentation.
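The data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `PlaceholderGenerator` is a hypothetical, untrained stand-in for the conditional generator (cGAN training is omitted entirely), the toy dimensions are assumptions, and reusing the real reward for each augmented transition is also an assumption, since the abstract only states that the generated next state is the augmented data.

```python
import random

STATE_DIM = 4      # assumed toy state size (the paper works on Atari states)
NUM_ACTIONS = 3    # assumed number of discrete actions
NOISE_DIM = 2      # assumed latent noise size


def one_hot(action, num_actions=NUM_ACTIONS):
    """Encode a discrete action index as a one-hot list."""
    return [1.0 if i == action else 0.0 for i in range(num_actions)]


class PlaceholderGenerator:
    """Hypothetical stand-in for a trained conditional generator G(z | s, a) -> s'.

    It is a single random linear map; a real cGAN generator would be a
    trained neural network. Only the input/output interface matters here.
    """

    def __init__(self, rng):
        self.rng = rng
        in_dim = STATE_DIM + NUM_ACTIONS + NOISE_DIM
        self.weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
                        for _ in range(STATE_DIM)]

    def generate(self, state, action):
        # Condition on (state, action) and add noise, as in a conditional GAN.
        noise = [self.rng.gauss(0.0, 1.0) for _ in range(NOISE_DIM)]
        x = list(state) + one_hot(action) + noise
        return [sum(w * v for w, v in zip(row, x)) for row in self.weights]


def augment(transitions, generator):
    """Copy each real (s, a, r, s') with s' replaced by a generated next state.

    Keeping the real reward r is an assumption made for this sketch.
    """
    return [(s, a, r, generator.generate(s, a))
            for (s, a, r, _real_next) in transitions]


def mixed_batch(real, generator, batch_size, rng):
    """Build a batch that is half real transitions, half augmented copies."""
    sampled = rng.sample(real, batch_size // 2)
    return sampled + augment(sampled, generator)


if __name__ == "__main__":
    rng = random.Random(0)
    gen = PlaceholderGenerator(rng)
    real = [([float(i)] * STATE_DIM, i % NUM_ACTIONS, 1.0,
             [float(i + 1)] * STATE_DIM) for i in range(10)]
    batch = mixed_batch(real, gen, 8, rng)
    print(len(batch), "transitions: 4 real followed by 4 augmented")
```

The agent's usual update would then consume `mixed_batch(...)` in place of a purely real batch; the half-and-half split is an arbitrary choice for the sketch, not a ratio taken from the paper.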
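Xiang Yu (项宇); Qin Jin (秦进); Yuan Linlin (袁琳琳)
College of Computer Science and Technology, Guizhou University, Guiyang 550025; School of Information Engineering, Guizhou Open University, Guiyang 550023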
Computers and Automation
reinforcement learning; data augmentation; data efficiency; conditional generative adversarial nets; Atari games
《计算机与数字工程》 (Computer & Digital Engineering), 2024 (006)
Pages 1739-1745 (7 pages)
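Supported by the Science and Technology Foundation of Guizhou Province (No. 黔科合基础[2020]1Y275) and the Science and Technology Plan Project of Guizhou Province (No. 黔科合基础[2019]1130号).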