
Deep Reinforcement Learning with Phasic Policy Gradient with Sample Reuse

Li Hailiang, Wang Li

Journal of Taiyuan University of Technology, 2024, Vol. 55, Issue (4): 712-719, 8. DOI: 10.16355/j.tyut.1007-9432.20230300


Author information

  • 1. College of Data Science, Taiyuan University of Technology, Jinzhong, Shanxi 030600


Abstract

[Purposes] A phasic policy gradient algorithm with sample reuse (SR-PPG) is proposed to address a weakness of policy-based deep reinforcement learning algorithms: they do not reuse samples and therefore make poor use of the data they collect. [Methods] The proposed algorithm introduces offline data into the phasic policy gradient (PPG) framework, which reduces the time cost of training and allows the model to converge quickly. SR-PPG combines the stability of on-policy algorithms, which rests on theoretical guarantees, with the sample efficiency of off-policy algorithms. It develops policy improvement guarantees that apply in the off-policy setting and links these bounds to the clipping mechanism used by PPG. [Findings] Theoretical analysis and a series of experiments show that the algorithm achieves better performance by effectively balancing the competing goals of stability and sample efficiency.
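The abstract describes two ingredients: reusing samples collected by earlier policies, and tying the resulting off-policy improvement bound to PPG's clipping mechanism. This page does not give SR-PPG's exact objective. The Python sketch below shows one plausible form under stated assumptions. It borrows the idea, found in generalized PPO with sample reuse (GePPO), of re-centering the clip interval at π_k/π_b for data collected by an older behavior policy π_b.

This may differ from what SR-PPG actually does. `Sample`, `ReuseBuffer`, and `clipped_offpolicy_surrogate` are hypothetical names. Advantages are assumed to be precomputed for the latest policy π_k. Only the policy-phase surrogate is shown; PPG's auxiliary phase is omitted.

```python
import math
from collections import deque
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """One stored transition (hypothetical layout, not taken from the paper)."""
    logp_behavior: float  # log pi_b(a|s) under the policy that collected the sample
    logp_current: float   # log pi_k(a|s) under the latest policy, refreshed each iteration
    advantage: float      # advantage estimate for pi_k, assumed to be computed elsewhere


class ReuseBuffer:
    """Keeps the batches collected by the last `max_batches` policies (FIFO)."""

    def __init__(self, max_batches: int) -> None:
        self.batches: deque = deque(maxlen=max_batches)

    def add(self, batch: List[Sample]) -> None:
        # When the buffer is full, deque(maxlen=...) drops the oldest batch.
        self.batches.append(batch)

    def all_samples(self) -> List[Sample]:
        return [s for batch in self.batches for s in batch]


def clipped_offpolicy_surrogate(logp_new: List[float],
                                samples: List[Sample],
                                eps: float) -> float:
    """Mean clipped surrogate over reused samples (to be maximized).

    r = pi_theta / pi_b is the importance ratio w.r.t. the behavior policy.
    The clip interval is centered at c = pi_k / pi_b instead of 1, so the
    update is constrained around the latest policy pi_k. For on-policy data
    (pi_b == pi_k) this reduces to the standard PPO clipped objective.
    """
    if not samples:
        raise ValueError("no samples to evaluate")
    total = 0.0
    for lp, s in zip(logp_new, samples):
        r = math.exp(lp - s.logp_behavior)
        c = math.exp(s.logp_current - s.logp_behavior)
        r_clipped = min(max(r, c - eps), c + eps)
        # Pessimistic (lower) bound, as in PPO.
        total += min(r * s.advantage, r_clipped * s.advantage)
    return total / len(samples)
```

Consider a batch collected by π_k itself. There the center c is exactly 1, and the objective is the familiar PPO clip. For older batches the interval shifts, so the clip limits how far π_θ moves from π_k rather than from the stale policy that generated the data. That is what makes older samples safe to reuse in this sketch.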


Key words

deep reinforcement learning/phasic policy gradient/sample reuse

Classification

Information Technology and Security Science

Cite this article

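LI Hailiang, WANG Li. Deep reinforcement learning with phasic policy gradient with sample reuse[J]. Journal of Taiyuan University of Technology, 2024, 55(4): 712-719, 8.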

Funding

Regional Innovation and Development Joint Fund of the National Natural Science Foundation of China (U22A20167)

National Key Research and Development Program of China (2021YFB3300503)

Journal of Taiyuan University of Technology

OA / Peking University Core Journal / CSTPCD

ISSN 1007-9432
