太原理工大学学报 (Journal of Taiyuan University of Technology), 2024, 55(4): 712-719, 8. DOI: 10.16355/j.tyut.1007-9432.20230300
Deep Reinforcement Learning with Phasic Policy Gradient with Sample Reuse
Abstract
[Purposes] The algorithm of phasic policy gradient with sample reuse (SR-PPG) is proposed to address the problems of non-reuse of samples and low sample utilization in policy-based deep reinforcement learning algorithms. [Methods] In the proposed algorithm, offline data are introduced on the basis of the phasic policy gradient (PPG), thus reducing the time cost of training and enabling the model to converge quickly. SR-PPG combines the stability advantages of theoretically supported on-policy algorithms with the sample efficiency of off-policy algorithms, develops policy improvement guarantees applicable to the off-policy setting, and links these bounds to the clipping mechanism used by PPG. [Findings] A series of theoretical and experimental demonstrations show that the algorithm achieves better performance by effectively balancing the competing goals of stability and sample efficiency.
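The clipping mechanism the abstract refers to is the PPO-style clipped surrogate objective that PPG inherits for its policy phase: importance ratios between the updated policy and the policy that generated the samples are clipped, which bounds how far an update can move and is what makes limited sample reuse safe. A minimal sketch of that objective (function name, `eps` value, and inputs are illustrative, not taken from the paper):

```python
import numpy as np

def clipped_surrogate_loss(ratios, advantages, eps=0.2):
    """PPO/PPG-style clipped surrogate objective (to be maximized).

    ratios: pi_new(a|s) / pi_behavior(a|s) importance weights for the
    sampled (state, action) pairs; advantages: their advantage estimates.
    Clipping ratios to [1 - eps, 1 + eps] limits how far the updated
    policy may drift from the behavior policy whose samples are reused.
    """
    ratios = np.asarray(ratios, dtype=float)
    advantages = np.asarray(advantages, dtype=float)
    unclipped = ratios * advantages
    clipped = np.clip(ratios, 1.0 - eps, 1.0 + eps) * advantages
    # Pessimistic lower bound: element-wise minimum of the two terms.
    return float(np.mean(np.minimum(unclipped, clipped)))
```

With `ratios=[1.5, 0.5]` and `advantages=[1.0, -1.0]`, the first sample's ratio is clipped down to 1.2 and the second term is pushed down to -0.8, so the objective is 0.2; off-policy variants such as SR-PPG generalize how this bound is applied to samples from older policies.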
Key words
deep reinforcement learning / phasic policy gradient / sample reuse
Category
Information Technology and Security Science
Citation
李海亮, 王莉. Deep Reinforcement Learning with Phasic Policy Gradient with Sample Reuse[J]. 太原理工大学学报, 2024, 55(4): 712-719, 8.
Funding
National Natural Science Foundation of China, Regional Innovation and Development Joint Fund (U22A20167)
National Key Research and Development Program of China (2021YFB3300503)