
A Bayesian Temporal Difference Algorithm Based on Random Projection

LIU Quan¹, YU Jun², WANG Hui³, FU Qiming¹, ZHU Fei³

Acta Electronica Sinica, 2016, Vol. 44, Issue 11: 2752-2757 (6 pages). DOI: 10.3969/j.issn.0372-2112.2016.11.026

Author affiliations

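1. School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China
2. Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
3. Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing, Jiangsu 210023, China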

Abstract

Most reinforcement learning algorithms are built on policy evaluation. Gaussian process temporal difference (GPTD) learning evaluates value functions with a Bayesian approach: through the Bellman equation and Bayes' rule, a Gaussian process defines a probabilistic generative model linking the immediate rewards to the value function. To improve efficiency, GPTD applies online kernel sparsification, using least squares over the state space to test whether each new sample is approximately linearly dependent on the samples already stored in the dictionary. Even so, its time complexity remains high. To address this problem, a Bayesian temporal difference algorithm based on random projection is proposed. A hash function maps the states in the dictionary state set to hash values, and the states are grouped by hash value, which reduces the number of comparisons between states. Experimental results show that the algorithm not only runs faster but also balances execution time against the accuracy of the state value function.
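The abstract combines two mechanisms: the approximate linear dependence (ALD) test used for online kernel sparsification, δ = k(x, x) − kᵀK⁻¹k, and a random-projection hash that restricts which stored states a new sample is tested against. The Python sketch below illustrates that combination under stated assumptions; it is not the authors' implementation, and the paper's exact hash function is not given here. Assumed for illustration: a Gaussian kernel, a p-stable projection hash h(x) = ⌊(a·x + b)/w⌋, and recomputing K⁻¹ from scratch instead of updating it recursively. The names `gaussian_gram`, `ProjectionHash`, `HashedDictionary`, `nu`, `width` and `sigma` are hypothetical.

```python
from collections import defaultdict

import numpy as np


def gaussian_gram(A, B, sigma=1.0):
    """Matrix of Gaussian kernel values k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    A = np.atleast_2d(np.asarray(A, dtype=float))
    B = np.atleast_2d(np.asarray(B, dtype=float))
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))


class ProjectionHash:
    """Hypothetical p-stable random-projection hash: h_i(x) = floor((a_i . x + b_i) / w)."""

    def __init__(self, dim, n_projections=2, width=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.a = rng.standard_normal((n_projections, dim))    # random Gaussian directions
        self.b = rng.uniform(0.0, width, size=n_projections)  # random offsets in [0, w)
        self.w = width

    def __call__(self, x):
        proj = (self.a @ np.asarray(x, dtype=float) + self.b) / self.w
        return tuple(int(v) for v in np.floor(proj))           # bucket key


class HashedDictionary:
    """Sparse state dictionary whose ALD test only compares a new state
    with the stored states that share its hash bucket."""

    def __init__(self, hasher, nu=0.1, sigma=1.0):
        self.hasher = hasher
        self.nu = nu                      # ALD threshold
        self.sigma = sigma
        self.buckets = defaultdict(list)  # hash key -> stored states
        self.comparisons = 0              # kernel evaluations k(s, x) against stored states

    def ald_delta(self, x, states):
        """delta = k(x, x) - k^T K^{-1} k: squared residual of projecting
        phi(x) onto the span of the given states in feature space."""
        if not states:
            return 1.0                    # k(x, x) = 1 for the Gaussian kernel
        S = np.array(states)
        K = gaussian_gram(S, S, self.sigma)
        k = gaussian_gram(S, x, self.sigma)[:, 0]
        self.comparisons += len(states)
        # Small jitter keeps the solve stable; a full GPTD would update K^{-1} recursively.
        a = np.linalg.solve(K + 1e-8 * np.eye(len(states)), k)
        return 1.0 - float(k @ a)

    def observe(self, x):
        """Store x if it is not approximately linearly dependent on its bucket.
        Returns True when x is added to the dictionary."""
        bucket = self.buckets[self.hasher(x)]
        if self.ald_delta(x, bucket) > self.nu:
            bucket.append(np.asarray(x, dtype=float))
            return True
        return False

    def size(self):
        return sum(len(b) for b in self.buckets.values())


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    samples = rng.uniform(-2.0, 2.0, size=(300, 2))

    hashed = HashedDictionary(ProjectionHash(dim=2, width=1.0, seed=0), nu=0.1, sigma=0.5)
    flat = HashedDictionary(lambda x: (), nu=0.1, sigma=0.5)  # one bucket: plain ALD baseline
    for x in samples:
        hashed.observe(x)
        flat.observe(x)

    print("hashed: size", hashed.size(), "comparisons", hashed.comparisons)
    print("flat:   size", flat.size(), "comparisons", flat.comparisons)
```

Passing a constant hasher (`lambda x: ()`) puts every state in one bucket, which recovers the unhashed ALD test as a baseline. Because a new state is only tested against its own bucket, a state near a bucket boundary may be admitted even when a neighbor in an adjacent bucket already represents it well, so the hashed dictionary can grow larger while performing fewer comparisons. This is in the spirit of the time/accuracy trade-off the abstract describes, but the sketch does not reproduce the paper's experiments.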

Key words

reinforcement learning/Markov decision process/Gaussian process/random projection/temporal difference learning

Classification

Information Technology and Security Science

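How to cite

LIU Quan, YU Jun, WANG Hui, FU Qiming, ZHU Fei. A Bayesian temporal difference algorithm based on random projection[J]. Acta Electronica Sinica, 2016, 44(11): 2752-2757.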

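Funding

National Natural Science Foundation of China (No. 61272005, No. 61303108, No. 61373094, No. 61472262, No. 61502323, No. 61502329); Natural Science Foundation of Jiangsu Province (No. BK2012616); Natural Science Research Project of Jiangsu Higher Education Institutions (No. 13KJB520020); Project of the Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University (No. 93K172014K04); Suzhou Applied Basic Research Program, Industrial Part ()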

Acta Electronica Sinica

OA / Peking University Core Journal / CSCD / CSTPCD

ISSN: 0372-2112
