
Bayesian Q learning method with Dyna architecture and prioritized sweeping

Yu Jun, Liu Quan, Fu Qiming, Sun Hongkun, Chen Guixing

Journal on Communications (通信学报), Issue (11): 129-139, 11. DOI: 10.3969/j.issn.1000-436x.2013.11.015

Yu Jun (1), Liu Quan (1), Fu Qiming (2), Sun Hongkun (1), Chen Guixing (1)

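Author information

  • 1. School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006
  • 2. Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012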

Abstract

To balance the trade-off between exploration and exploitation, Bayesian Q-learning uses a probability distribution to describe the uncertainty of the Q value and selects actions according to this distribution. However, slow convergence is a major problem for Bayesian Q-learning. To address this problem, a novel Bayesian Q-learning algorithm with Dyna architecture and prioritized sweeping, called Dyna-PS-BayesQL, is proposed. The algorithm consists of two parts. In the learning part, it models the transition function and the reward function from collected samples and updates the Q value function by Bayesian Q-learning. In the planning part, it updates the Q value function using prioritized sweeping and dynamic programming on the constructed model, which makes more efficient use of historical information. Applied to the chain problem and the maze navigation problem, Dyna-PS-BayesQL balances exploration and exploitation well during learning and achieves better convergence performance.
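The two-part structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it keeps point estimates of Q instead of the probability distributions used by Bayesian Q-learning, and it covers only model learning and the prioritized-sweeping planning step. The names `TabularModel` and `prioritized_sweep`, the discount `GAMMA`, the threshold `THETA`, and the toy four-state chain are all assumptions made for illustration.

```python
import heapq
from collections import defaultdict

GAMMA = 0.9    # discount factor (assumed value)
THETA = 1e-4   # minimum priority worth queueing (assumed value)


class TabularModel:
    """Empirical transition and reward model built from observed samples."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s': n}
        self.reward_sum = defaultdict(float)                 # (s, a) -> total reward
        self.predecessors = defaultdict(set)                 # s' -> {(s, a)}

    def update(self, s, a, r, s_next):
        self.counts[(s, a)][s_next] += 1
        self.reward_sum[(s, a)] += r
        self.predecessors[s_next].add((s, a))

    def expected_backup(self, q, s, a, actions):
        """One-step dynamic-programming target under the learned model.

        Only call this for (s, a) pairs that have been observed at least once.
        """
        n = sum(self.counts[(s, a)].values())
        r_mean = self.reward_sum[(s, a)] / n
        future = sum(c / n * max(q[(s2, b)] for b in actions)
                     for s2, c in self.counts[(s, a)].items())
        return r_mean + GAMMA * future


def prioritized_sweep(q, model, actions, start, max_backups=50):
    """Planning step: back up the pairs with the largest value change first."""
    queue = []

    def push(s, a):
        priority = abs(model.expected_backup(q, s, a, actions) - q[(s, a)])
        if priority > THETA:
            heapq.heappush(queue, (-priority, s, a))  # max-heap via negation

    push(*start)
    backups = 0
    while queue and backups < max_backups:
        _, s, a = heapq.heappop(queue)
        q[(s, a)] = model.expected_backup(q, s, a, actions)
        backups += 1
        # A change at s may change the targets of every pair leading into s.
        for s_prev, a_prev in model.predecessors[s]:
            push(s_prev, a_prev)
    return backups


# Toy chain 0 -> 1 -> 2 -> 3, reward 1 on reaching state 3 (hypothetical data).
actions = ("left", "right")
model = TabularModel()
for s, r in [(0, 0.0), (1, 0.0), (2, 1.0)]:
    model.update(s, "right", r, s + 1)

q = defaultdict(float)  # point-estimate Q values, all initially zero
backups = prioritized_sweep(q, model, actions, start=(2, "right"))
```

In this sketch a single sweep started at the rewarding transition propagates the reward back along the chain through the predecessor links, so states far from the goal receive useful values without further interaction with the environment. In the method described above, the learning part would additionally maintain a distribution over each Q value and use it to choose actions.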

Key words

reinforcement learning/Markov decision process/prioritized sweeping/Dyna architecture/Bayesian Q learning

Classification

Information technology and security science

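Cite this article

Yu Jun, Liu Quan, Fu Qiming, Sun Hongkun, Chen Guixing. Bayesian Q learning method with Dyna architecture and prioritized sweeping[J]. Journal on Communications, 2013, (11): 129-139, 11.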

Funding

The National Natural Science Foundation of China (61070223, 61103045, 61070122, 61272005)

The Natural Science Foundation of Jiangsu Province (BK2012616)

The Natural Science Research Foundation of Jiangsu Higher Education Institutions (09KJA520002, 09KJB520012)

The Foundation of Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University (93K172012K04)

Journal on Communications (通信学报)

Indexing: OA / Peking University Core Journals / CSCD / CSTPCD

ISSN 1000-436X
