Acta Automatica Sinica (自动化学报), 2012, Vol. 38, Issue 11: 1765-1776 (12 pages). DOI: 10.3724/SP.J.1004.2012.01765
A Hybrid Transfer Algorithm for Reinforcement Learning Based on Spectral Method
Abstract
In scaling-up state-space transfer under the proto-value function framework, only the basis functions corresponding to the smaller eigenvalues are transferred effectively, which leads to an incorrect approximation of the value function in the target task. To address this problem, and exploiting the fact that the Laplacian eigenmap preserves the local topology of the state space, an improved hierarchical decomposition algorithm based on spectral graph theory is proposed, and a hybrid transfer method that combines basis function transfer with the transfer of optimal subtask policies is designed. First, the basis functions of the source task are constructed using the spectral method, and the basis functions of the target task are produced by linearly interpolating those of the source task. Second, the second basis function produced for the target task, which approximates the Fiedler eigenvector, is used to decompose the target task. The optimal policies of the subtasks are then obtained using the improved hierarchical decomposition algorithm. Finally, the obtained basis functions and optimal subtask policies are transferred to the target task. The proposed hybrid transfer method directly yields the optimal policies of some states, and it reduces both the number of iterations and the minimum number of basis functions needed to approximate the value function. The method is suited to scaling-up state-space transfer tasks with a hierarchical control structure. Grid-world simulation results verify the validity of the proposed hybrid transfer method.
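The first two steps described in the abstract (spectral basis construction on the source task, linear interpolation to a larger target task, and decomposition by the sign of the approximate Fiedler vector) can be illustrated with a minimal sketch. It is not the authors' implementation: the two-room layout, the function names, the use of the combinatorial Laplacian L = D - A, and the choice to fill wall cells with 0 before interpolating are all assumptions made for illustration.

```python
import numpy as np

def two_room_grid(rows, cols):
    """Free cells of a hypothetical two-room grid world: a wall down the
    middle column with a single doorway in the middle row."""
    wall_col, door_row = cols // 2, rows // 2
    return [(r, c) for r in range(rows) for c in range(cols)
            if c != wall_col or r == door_row]

def laplacian_basis(cells, k):
    """First k eigenvectors (smallest eigenvalues) of the combinatorial
    graph Laplacian L = D - A of the 4-neighbour state graph."""
    index = {cell: i for i, cell in enumerate(cells)}
    A = np.zeros((len(cells), len(cells)))
    for (r, c), i in index.items():
        for nb in ((r + 1, c), (r, c + 1)):
            j = index.get(nb)
            if j is not None:
                A[i, j] = A[j, i] = 1.0
    L = np.diag(A.sum(axis=1)) - A
    eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    return eigvals[:k], eigvecs[:, :k]

def interpolate_basis(vec, src_cells, src_shape, tgt_cells, tgt_shape):
    """Bilinearly interpolate one basis function from the source grid to the
    target grid. Wall cells are filled with 0 first (an assumption)."""
    field = np.zeros(src_shape)
    for (r, c), v in zip(src_cells, vec):
        field[r, c] = v
    ys, yt = np.linspace(0, 1, src_shape[0]), np.linspace(0, 1, tgt_shape[0])
    xs, xt = np.linspace(0, 1, src_shape[1]), np.linspace(0, 1, tgt_shape[1])
    tmp = np.array([np.interp(xt, xs, row) for row in field])     # along columns
    full = np.array([np.interp(yt, ys, col) for col in tmp.T]).T  # along rows
    return np.array([full[r, c] for r, c in tgt_cells])

# Source task: 5 x 11 grid. Target task: scaled-up 9 x 21 grid.
src_shape, tgt_shape = (5, 11), (9, 21)
src_cells, tgt_cells = two_room_grid(*src_shape), two_room_grid(*tgt_shape)

# Step 1: spectral basis of the source task, interpolated to the target task.
src_vals, src_basis = laplacian_basis(src_cells, k=4)
tgt_basis = np.column_stack([
    interpolate_basis(src_basis[:, i], src_cells, src_shape, tgt_cells, tgt_shape)
    for i in range(src_basis.shape[1])
])

# Step 2: the second interpolated basis function approximates the Fiedler
# vector; its sign splits the target states into two candidate subtasks.
fiedler_approx = tgt_basis[:, 1]
labels = fiedler_approx > 0
```

In this layout the two label groups coincide with the two rooms. The doorway cell sits at the zero crossing of the approximate Fiedler vector, so its label is numerically arbitrary and would need a tie-breaking rule in practice.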
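Key words
Reinforcement learning / transfer learning / spectral graph theory / proto-value functions / hierarchical decomposition
Cite this article
Zhu Meiqiang (朱美强), Cheng Yuhu (程玉虎), Li Ming (李明), Wang Xuesong (王雪松), Feng Huanting (冯涣婷). A hybrid transfer algorithm for reinforcement learning based on spectral method [J]. Acta Automatica Sinica, 2012, 38(11): 1765-1776.
Funding
Supported by the National Natural Science Foundation of China (60974050, 61072094, 61273143), the Youth Science and Technology Foundation of China University of Mining and Technology (OC080252), the Program for New Century Excellent Talents in University of the Ministry of Education (NCET-08-0836, NCET-10-0765), and the Specialized Research Fund for the Doctoral Program of Higher Education of the Ministry of Education (20110095110016).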