
Policy Iteration Reinforcement Learning Based on Geodesic Gaussian Basis Defined on State-action Graph

Cheng Yuhu, Feng Huanting, Wang Xuesong

Acta Automatica Sinica, 2011, Vol. 37, Issue (1): 44-51, 8. DOI: 10.3724/SP.J.1004.2011.00044


Author information

  • 1. School of Information and Electrical Engineering, China University of Mining and Technology, Xuzhou 221116

Abstract

In policy iteration reinforcement learning, the construction of basis functions is an important factor influencing the accuracy of action-value function approximation. To construct appropriate basis functions for this approximation, a policy iteration reinforcement learning method based on geodesic Gaussian bases defined on a state-action graph is proposed. First, a state-action graph for a Markov decision process is constructed using an off-policy method. Second, geodesic Gaussian kernel functions are defined on the state-action graph, and a kernel sparsification approach based on approximate linear dependency is used to select the centers of the geodesic Gaussian kernels automatically. Finally, the geodesic Gaussian kernels on the state-action graph are used to approximate the action-value function during policy evaluation, and the policy is then improved based on the estimated action-value function. Simulation results on a 10 × 10 grid world show that the proposed method accurately approximates an action-value function that has both smooth and discontinuous regions, using fewer basis functions than policy iteration reinforcement learning methods based on either ordinary Gaussian bases or geodesic Gaussian bases defined on a state graph, which helps in obtaining an optimal policy efficiently.
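
The geodesic Gaussian kernel named in the abstract can be illustrated with a minimal Python sketch. It assumes an unweighted, undirected state-action graph, a hop-count shortest-path distance d computed by breadth-first search, and the kernel form exp(-d² / (2σ²)). The function names, the rule used to connect state-action pairs, and the three-state corridor are hypothetical and are not taken from the paper. Center selection by approximate linear dependency and the policy evaluation step are not shown.

```python
import math
from collections import deque


def build_state_action_graph(transitions):
    """Build an undirected adjacency dict from ((s, a), (s_next, a_next)) samples."""
    adjacency = {}
    for node, next_node in transitions:
        adjacency.setdefault(node, set()).add(next_node)
        adjacency.setdefault(next_node, set()).add(node)
    return adjacency


def shortest_path_lengths(adjacency, source):
    """Breadth-first search: hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def geodesic_gaussian_basis(adjacency, center, sigma):
    """Evaluate exp(-d^2 / (2 sigma^2)) at every node; unreachable nodes get 0."""
    dist = shortest_path_lengths(adjacency, center)
    return {
        node: math.exp(-dist[node] ** 2 / (2.0 * sigma ** 2)) if node in dist else 0.0
        for node in adjacency
    }


# Hypothetical toy problem: a 3-state corridor with actions "left" and "right".
states = range(3)
actions = ("left", "right")


def step(state, action):
    """Deterministic move, clipped at the corridor ends."""
    return max(0, state - 1) if action == "left" else min(2, state + 1)


# Assumed edge rule: (s, a) is linked to (s_next, a_next) for every next action.
transitions = [
    ((s, a), (step(s, a), a_next))
    for s in states for a in actions for a_next in actions
]
graph = build_state_action_graph(transitions)
phi = geodesic_gaussian_basis(graph, center=(0, "left"), sigma=1.0)
for node in sorted(phi):
    print(node, round(phi[node], 4))
```

Because the distance follows graph edges rather than straight lines in state space, a basis function of this kind decays along feasible transitions, which is the property the abstract relies on for value functions with discontinuities.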

Keywords

State-action graph / geodesic Gaussian kernel / basis function / policy iteration / reinforcement learning

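Cite this article

Cheng Yuhu, Feng Huanting, Wang Xuesong. Policy Iteration Reinforcement Learning Based on Geodesic Gaussian Basis Defined on State-action Graph [J]. Acta Automatica Sinica, 2011, 37(1): 44-51, 8.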

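Funding

Supported by the National Natural Science Foundation of China (60804022, 60974050, 61072094), the Program for New Century Excellent Talents in University of the Ministry of Education (NCET-08-0836), the Fok Ying Tung Education Foundation Fund for Young Teachers (121066), and the Natural Science Foundation of Jiangsu Province (BK2008126)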

Acta Automatica Sinica

OA / Peking University Core / CSCD / CSTPCD

ISSN 0254-4156
