Acta Electronica Sinica, 2016, Vol. 44, Issue 11: 2752-2757, 6. DOI: 10.3969/j.issn.0372-2112.2016.11.026
A Bayesian Temporal Difference Algorithm Based on Random Projection
Abstract
Most reinforcement learning algorithms are built on policy evaluation. Gaussian process temporal difference is an algorithm that takes a Bayesian approach to evaluating value functions. In this method, a Gaussian process builds a probabilistic generative model relating the immediate reward to the value function through the Bellman equation and Bayes' rule. To improve the algorithm's efficiency, online kernel sparsification and least squares are used in the state space to obtain an approximate linear representation of new samples. However, the time complexity remains high. To address this problem, a Bayesian temporal difference algorithm based on random projection is proposed. A hash function maps the elements of the dictionary state set to hash values. The states are then divided into groups according to their hash values, which reduces the number of comparisons between states. The experimental results show that the algorithm not only improves execution speed but also achieves a balance between execution time and the precision of the state value function.
Keywords
reinforcement learning / Markov decision process / Gaussian process / random projection / temporal difference learning
Classification
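Information Technology and Security Science
Cite this article
LIU Quan, YU Jun, WANG Hui, FU Qiming, ZHU Fei. A Bayesian Temporal Difference Algorithm Based on Random Projection [J]. Acta Electronica Sinica, 2016, 44(11): 2752-2757, 6.
Funding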
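National Natural Science Foundation of China (No.61272005, No.61303108, No.61373094, No.61472262, No.61502323, No.61502329); Natural Science Foundation of Jiangsu Province (No. BK2012616); Natural Science Research Project of Jiangsu Higher Education Institutions (No.13KJB520020); Project of the Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Jilin University (No.93K172014K04); Suzhou Applied Basic Research Program, Industrial Part ()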
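Illustrative sketch
The grouping step described in the abstract (hash the dictionary states, then compare a new state only with states in the same group) might look roughly like the Python below. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the hash function, so a sign-of-random-projection hash is assumed here, and the names `make_projections`, `hash_state`, and `HashedDictionary` are hypothetical helpers introduced only for illustration.

```python
import random
from collections import defaultdict


def make_projections(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian direction vectors of length dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_state(state, projections):
    """Assumed hash: one bit per random direction (sign of the projection)."""
    bits = 0
    for direction in projections:
        dot = sum(s * d for s, d in zip(state, direction))
        bits = (bits << 1) | (1 if dot >= 0.0 else 0)
    return bits


class HashedDictionary:
    """Dictionary states grouped by hash value (hypothetical helper)."""

    def __init__(self, dim, n_bits=4, seed=0):
        self.projections = make_projections(dim, n_bits, seed)
        self.buckets = defaultdict(list)

    def add(self, state):
        """Store a state in the group selected by its hash value."""
        key = hash_state(state, self.projections)
        self.buckets[key].append(tuple(state))

    def candidates(self, state):
        """Return only the stored states whose hash matches that of `state`."""
        return self.buckets.get(hash_state(state, self.projections), [])

    def __len__(self):
        return sum(len(group) for group in self.buckets.values())


# Usage: a new state is compared with one group instead of the whole dictionary.
d = HashedDictionary(dim=2, n_bits=4, seed=0)
for s in [(0.1, 0.2), (0.11, 0.21), (-0.5, 0.9), (0.8, -0.3)]:
    d.add(s)
query = (0.1, 0.2)
print(len(d.candidates(query)), "of", len(d), "dictionary states compared")
```

In a sparsification loop, the kernel comparison that decides whether a new state joins the dictionary would then run over `candidates(state)` rather than over every stored state. That is where the time saving would come from, at the cost of possibly missing similar states that hash to a different group.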