
Deep Reinforcement Learning with Phasic Policy Gradient with Sample Reuse

Li Hailiang¹, Wang Li¹


Journal of Taiyuan University of Technology, 2024, Vol. 55, Issue 4: 712-719, 8. DOI: 10.16355/j.tyut.1007-9432.20230300



Author information

1. College of Big Data, Taiyuan University of Technology, Jinzhong, Shanxi 030600


Abstract

[Purposes] The phasic policy gradient with sample reuse (SR-PPG) algorithm is proposed to address the failure to reuse samples, and the resulting low sample utilization, in policy-based deep reinforcement learning algorithms. [Methods] The proposed algorithm introduces offline data into the phasic policy gradient (PPG) framework, which reduces the time cost of training and lets the model converge quickly. SR-PPG combines the stability advantages of theoretically supported on-policy algorithms with the sample efficiency of off-policy algorithms: it develops policy improvement guarantees applicable to the off-policy setting and links these bounds to the clipping mechanism used by PPG. [Findings] Theoretical analysis and a series of experiments show that the algorithm achieves better performance by effectively balancing the competing goals of stability and sample efficiency.
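The abstract does not state SR-PPG's objective. As a minimal sketch of what "linking off-policy improvement bounds to the clipping mechanism" can look like, the NumPy code below computes a PPO-style clipped surrogate on samples reused from an older behavior policy, with the clipping interval centered at the ratio between the current policy and the behavior policy instead of at 1. This centered form is an assumption for illustration, not the authors' implementation; the function name `reuse_clipped_surrogate`, its arguments, and the default `eps=0.2` are hypothetical.

```python
import numpy as np


def reuse_clipped_surrogate(logp_new, logp_current, logp_behavior, advantages, eps=0.2):
    """Clipped surrogate objective (to be maximized) on reused samples.

    Hypothetical sketch, not the SR-PPG paper's code. Samples (s, a) were
    collected by an older behavior policy mu:
      logp_new      = log pi_theta(a|s), the policy being optimized
      logp_current  = log pi_k(a|s), the policy at the start of this update
      logp_behavior = log mu(a|s), the policy that generated the sample
    """
    # Importance ratio of the new policy with respect to the behavior policy.
    ratio = np.exp(logp_new - logp_behavior)
    # Assumed center of the clipping interval: current policy vs. behavior policy.
    # For freshly collected (on-policy) data this equals 1, giving standard PPO clipping.
    center = np.exp(logp_current - logp_behavior)
    clipped = np.clip(ratio, center - eps, center + eps)
    # Pessimistic elementwise minimum, as in PPO, averaged over the batch.
    return float(np.mean(np.minimum(ratio * advantages, clipped * advantages)))


# On-policy batch: the center is 1, so ratios are clipped to [0.8, 1.2].
print(reuse_clipped_surrogate(np.log([1.5, 0.5]), np.zeros(2), np.zeros(2),
                              np.array([1.0, -1.0])))
# Reused batch: the center is 1.4, so a ratio of 1.5 is no longer clipped.
print(reuse_clipped_surrogate(np.log([1.5]), np.log([1.4]), np.zeros(1),
                              np.array([1.0])))
```

With fresh samples, `logp_current` equals `logp_behavior` and the expression reduces to the familiar PPO/PPG clipped objective. With reused samples the interval shifts, so older data still bound how far the new policy moves from the current one rather than from the stale behavior policy.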


Keywords

deep reinforcement learning/phasic policy gradient/sample reuse

Classification

Information Technology and Security Science

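Cite this article

Li Hailiang, Wang Li. Deep Reinforcement Learning with Phasic Policy Gradient with Sample Reuse[J]. Journal of Taiyuan University of Technology, 2024, 55(4): 712-719, 8.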

Funding

Regional Innovation and Development Joint Fund of the National Natural Science Foundation of China (U22A20167)

National Key Research and Development Program of China (2021YFB3300503)

Journal of Taiyuan University of Technology

Open access | PKU Core Journal | CSTPCD

ISSN: 1007-9432
