A Novel Fast Sarsa Algorithm Based on Value Function Transfer

(Original title: 一种新的基于值函数迁移的快速Sarsa算法)

Fu Qiming (傅启明) 1, Liu Quan (刘全) 1, You Shuhua (尤树华) 2, Huang Wei (黄蔚) 1, Zhang Xiaofang (章晓芳) 1

Acta Electronica Sinica (电子学报), Issue (11): 2157-2161, 5. DOI: 10.3969/j.issn.0372-2112.2014.11.005
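Author information

  • 1. School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006
  • 2. Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012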

Abstract

Knowledge transfer has gradually become a research hot spot in machine learning. It transfers knowledge from historical tasks to the target task in order to speed up convergence and improve algorithm performance. To address the slow convergence of traditional reinforcement learning algorithms, this paper proposes transferring the value function between similar learning tasks that share the same state space and action space, which reduces the number of samples needed in the target task and speeds up convergence. Building on the on-policy Sarsa framework and combining it with this value function transfer method, the paper puts forward a novel fast Sarsa algorithm based on value function transfer, VFT-Sarsa. The algorithm first uses a bisimulation metric to measure the distance between states in the target task and states in the historical task, under the condition that the tasks have the same state space and action space; it then transfers the value function if the distance meets a given condition, and finally executes the learning algorithm. The proposed algorithm is applied to the Random Walk problem and compared with the Sarsa, Q-Learning, and QV algorithms; the results show that it achieves a better convergence rate with good performance.
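The three steps named in the abstract (measure state distances, transfer values where the distance is small, then run Sarsa) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' VFT-Sarsa: the 7-state chain, the reward values, the threshold `0.06`, the weights `c_r` and `c_t`, and every function name are hypothetical, and the distance is a bisimulation-style fixed point simplified for deterministic transitions, since the abstract gives neither the exact metric nor the transfer condition.

```python
import random

N_STATES = 7          # assumed chain 0..6; states 0 and 6 are terminal
ACTIONS = (-1, +1)    # move left / move right

def step(state, action, right_reward):
    """Deterministic move; reward only on entering the right end. Terminals absorb."""
    if state in (0, N_STATES - 1):
        return state, 0.0, True
    nxt = state + action
    reward = right_reward if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt in (0, N_STATES - 1)

def state_distance(r_src, r_tgt, c_r=0.5, c_t=0.5, iters=50):
    """Assumed stand-in for the paper's metric: iterate
    d(s,t) = max_a [c_r*|R_src - R_tgt| + c_t*d(s',t')] for deterministic tasks."""
    d = [[0.0] * N_STATES for _ in range(N_STATES)]
    for _ in range(iters):
        new = [[0.0] * N_STATES for _ in range(N_STATES)]
        for s in range(N_STATES):
            for t in range(N_STATES):
                gaps = []
                for a in ACTIONS:
                    s2, rs, _done = step(s, a, r_src)
                    t2, rt, _done = step(t, a, r_tgt)
                    gaps.append(c_r * abs(rs - rt) + c_t * d[s2][t2])
                new[s][t] = max(gaps)
        d = new
    return d

def epsilon_greedy(q, state, eps, rng):
    """Random action with probability eps, otherwise a greedy one (ties broken randomly)."""
    if rng.random() < eps:
        return rng.randrange(len(ACTIONS))
    best = max(q[state])
    return rng.choice([a for a, v in enumerate(q[state]) if v == best])

def sarsa(right_reward, q_init=None, episodes=200, alpha=0.1, gamma=0.9,
          eps=0.1, seed=0):
    """Tabular on-policy Sarsa, optionally started from a transferred Q-table."""
    rng = random.Random(seed)
    if q_init is not None:
        q = [row[:] for row in q_init]
    else:
        q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
    for _ in range(episodes):
        s = N_STATES // 2
        a = epsilon_greedy(q, s, eps, rng)
        for _step in range(1000):  # safety cap on episode length
            s2, r, done = step(s, ACTIONS[a], right_reward)
            a2 = epsilon_greedy(q, s2, eps, rng)
            target = r if done else r + gamma * q[s2][a2]
            q[s][a] += alpha * (target - q[s][a])
            s, a = s2, a2
            if done:
                break
    return q

def transfer(q_src, dist, threshold):
    """Copy source values only for states whose same-index distance is small enough."""
    q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
    for s in range(N_STATES):
        if dist[s][s] <= threshold:
            q[s] = q_src[s][:]
    return q

q_source = sarsa(right_reward=1.0)                   # historical task
dist = state_distance(1.0, 0.8)                      # distances to the target task
q_start = transfer(q_source, dist, threshold=0.06)   # transferred initial values
q_target = sarsa(right_reward=0.8, q_init=q_start)   # learn the target task
```

In this toy setting the two tasks differ only in the reward at the right end, so the state next to that end has the largest distance and keeps a zero initial value, while states further away inherit the source values. Comparing convergence against Sarsa from scratch, as the paper does, would additionally require recording per-episode returns.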

Key words

reinforcement learning / VFT-Sarsa algorithm / bisimulation metric / value function transfer

Classification

Information Technology and Security Science

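Cite this article

Fu Qiming, Liu Quan, You Shuhua, Huang Wei, Zhang Xiaofang. A Novel Fast Sarsa Algorithm Based on Value Function Transfer [J]. Acta Electronica Sinica, 2014, (11): 2157-2161, 5.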

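Funding

National Natural Science Foundation of China (No. 61103045, No. 61303108); Natural Science Foundation of Jiangsu Province (No. BK2012616); Natural Science Research Project of Jiangsu Higher Education Institutions (No. 13KJB520020); funding from the Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University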

Acta Electronica Sinica (电子学报)

OA / Peking University Core (北大核心) / CSCD / CSTPCD

ISSN 0372-2112
