Acta Automatica Sinica, 2011, Vol. 37, Issue (1): 44-51, 8. DOI: 10.3724/SP.J.1004.2011.00044
Policy Iteration Reinforcement Learning Based on Geodesic Gaussian Basis Defined on State-action Graph
Abstract
For policy iteration reinforcement learning methods, the construction of basis functions is an important factor influencing the accuracy of action-value function approximation. In order to construct appropriate basis functions for action-value function approximation, a policy iteration reinforcement learning method based on a geodesic Gaussian basis defined on a state-action graph is proposed. First, a state-action graph for a Markov decision process is constructed according to an off-policy method. Second, geodesic Gaussian kernel functions are defined on the state-action graph, and a kernel sparsification approach based on approximate linear dependency is used to automatically select the centers of the geodesic Gaussian kernels. Finally, the geodesic Gaussian kernels based on the state-action graph are used to approximate the action-value function during policy evaluation, and the policy is then improved based on the estimated action-value function. Simulation results on a 10 × 10 grid world show that, compared with policy iteration reinforcement learning methods based on either an ordinary Gaussian basis or a geodesic Gaussian basis defined on a state graph, the proposed method can accurately approximate action-value functions that have both smoothness and discontinuity properties while using fewer basis functions, which helps obtain an optimal policy effectively.
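The central idea in the abstract is that a geodesic Gaussian kernel measures distance along the graph (shortest-path distance) rather than in Euclidean space, so the kernel respects discontinuities such as walls in a grid world. The following is a minimal sketch of that idea; the toy graph, node names, unit edge weights, and σ are illustrative assumptions, not taken from the paper:

```python
import math
from collections import deque

def shortest_path_lengths(graph, source):
    """BFS hop-count distances from source over an adjacency dict
    (assumes unit-weight edges)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def geodesic_gaussian_kernel(graph, center, sigma=1.0):
    """Kernel value at every reachable node:
    exp(-SP(node, center)^2 / (2 sigma^2)),
    where SP is the shortest-path (geodesic) distance on the graph."""
    dist = shortest_path_lengths(graph, center)
    return {v: math.exp(-d * d / (2.0 * sigma ** 2)) for v, d in dist.items()}

# Hypothetical state-action graph: nodes are (state, action) pairs; an edge
# links (s, a) to (s', a') when taking a in s can lead to taking a' in s'.
graph = {
    ("s0", "right"): [("s1", "right"), ("s1", "up")],
    ("s1", "right"): [("s0", "right"), ("s2", "up")],
    ("s1", "up"):    [("s0", "right"), ("s2", "up")],
    ("s2", "up"):    [("s1", "right"), ("s1", "up")],
}
k = geodesic_gaussian_kernel(graph, center=("s0", "right"), sigma=1.0)
# k[("s0", "right")] is 1.0 at the center and decays with graph distance.
```

Because distances are computed on the graph, two nodes that are close in coordinates but separated by a missing edge (e.g., a wall) receive a small kernel value, which is what lets the basis capture discontinuities in the action-value function.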
Keywords: state-action graph / geodesic Gaussian kernel / basis function / policy iteration / reinforcement learning
Citation: Cheng Yu-Hu, Feng Huan-Ting, Wang Xue-Song. Policy Iteration Reinforcement Learning Based on Geodesic Gaussian Basis Defined on State-action Graph[J]. Acta Automatica Sinica, 2011, 37(1): 44-51, 8.
Funding: Supported by National Natural Science Foundation of China (60804022, 60974050, 61072094), Program for New Century Excellent Talents in University of the Ministry of Education (NCET-08-0836), Fok Ying Tung Education Foundation Young Teachers Fund (121066), and Natural Science Foundation of Jiangsu Province (BK2008126)
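The abstract's kernel sparsification step keeps a candidate center only if its feature representation is not approximately a linear combination of those already kept. A generic sketch of that approximate-linear-dependency test is shown below on explicit feature vectors via a Gram-Schmidt residual; the paper applies the equivalent test in the kernel-induced feature space, and the threshold `nu` and the vectors here are illustrative assumptions:

```python
import math

def ald_select(candidates, nu=1e-3):
    """Greedy approximate-linear-dependency (ALD) selection: keep a candidate
    only if the squared norm of its residual, after projecting onto the span
    of the vectors already kept, exceeds the threshold nu."""
    basis = []      # orthonormalized directions spanned by kept candidates
    selected = []   # indices of kept candidates
    for i, x in enumerate(candidates):
        r = list(x)
        for q in basis:
            c = sum(a * b for a, b in zip(q, r))          # projection coefficient
            r = [a - c * b for a, b in zip(r, q)]         # subtract projection
        sq_norm = sum(a * a for a in r)
        if sq_norm > nu:                                  # not (approximately) dependent
            norm = math.sqrt(sq_norm)
            basis.append([a / norm for a in r])
            selected.append(i)
    return selected

# The third vector is the sum of the first two, so ALD rejects it.
kept = ald_select([[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]])
# → [0, 1, 3]
```

Raising `nu` prunes the dictionary more aggressively, trading approximation accuracy for fewer basis functions, which matches the abstract's goal of approximating the action-value function with fewer bases.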