Acta Electronica Sinica, 2017, Vol. 45, Issue 2: 257-266 (10 pages). DOI: 10.3969/j.issn.0372-2112.2017.02.001

A Policy Search and Transfer Approach in the Non-stationary Environment

Zhu Fei [1], Liu Quan [2], Fu Qiming [3], Chen Donghuo [1], Wang Hui [3], Fu Yuchen [1]

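Author information

  • 1. School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006
  • 2. Provincial Key Laboratory for Computer Information Processing Technology, Soochow University, Suzhou, Jiangsu 215006
  • 3. Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education (Jilin University), Changchun, Jilin 130012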

Abstract

As an online learning method, reinforcement learning obtains the optimal policy, the one with the maximum expected cumulative reward, by interacting with the environment. Most reinforcement learning algorithms assume a stationary Markov Decision Process (MDP) and cannot handle the non-stationary case: once the agent has interacted with the environment, the MDP model no longer holds, so traditional algorithms cannot learn an optimal policy directly. To address this, a novel policy search algorithm based on a formula set (FSPS) is proposed. The formula set is generated from features extracted from collected historical sample trajectories, and the algorithm adopts the formula with the best performance as the optimal policy. The algorithm also draws on transfer learning by transferring the learned policy between two similar MDP distributions; the performance of the transferred policy depends mainly on the distance between the two MDP distributions and on the performance of the learned policy in the original MDP distribution. Simulation results on the Markov Chain problem show that the algorithm handles the non-stationary case quite well.
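The selection and transfer steps named in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the 5-state chain, the `slip` parameter that defines the MDP distribution, the three hand-written candidate formulas, and all function names. The sketch only shows the general idea of scoring each formula's greedy policy on MDPs sampled from a distribution, keeping the best one, and then reusing it unchanged on a nearby distribution.

```python
import random

# Hypothetical toy setting (not from the paper): a 5-state chain, two actions.
N_STATES = 5
ACTIONS = (0, 1)  # 0 = step left, 1 = step right

def sample_mdp(rng, low, high):
    """Draw one chain MDP; only the slip probability varies (an assumption)."""
    return {"slip": rng.uniform(low, high)}

def step(mdp, state, action, rng):
    """One transition; with probability `slip` the chosen action is flipped."""
    if rng.random() < mdp["slip"]:
        action = 1 - action
    if action == 1:
        next_state = min(state + 1, N_STATES - 1)
    else:
        next_state = max(state - 1, 0)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

# Illustrative candidate formula set: each formula scores a (state, action) pair.
FORMULAS = {
    "prefer_right": lambda s, a: float(a),
    "prefer_left": lambda s, a: float(1 - a),
    "right_when_low": lambda s, a: float(a) if s < 2 else float(1 - a),
}

def greedy_policy(formula):
    """Turn a formula into a policy: pick the action with the highest score."""
    return lambda s: max(ACTIONS, key=lambda a: formula(s, a))

def evaluate(formula, mdps, horizon=20, gamma=0.95, seed=0):
    """Average discounted return of the formula's greedy policy over `mdps`."""
    rng = random.Random(seed)
    policy = greedy_policy(formula)
    total = 0.0
    for mdp in mdps:
        state, discount = 0, 1.0
        for _ in range(horizon):
            state, reward = step(mdp, state, policy(state), rng)
            total += discount * reward
            discount *= gamma
    return total / len(mdps)

def search(formulas, mdps):
    """Return the name of the best-scoring formula together with all scores."""
    scores = {name: evaluate(f, mdps) for name, f in formulas.items()}
    return max(scores, key=scores.get), scores

rng = random.Random(42)
source_mdps = [sample_mdp(rng, 0.0, 0.3) for _ in range(50)]
best_name, scores = search(FORMULAS, source_mdps)

# Transfer: reuse the selected formula, without re-searching, on a shifted
# (assumed "similar") distribution with somewhat higher slip probabilities.
target_mdps = [sample_mdp(rng, 0.1, 0.4) for _ in range(50)]
transferred_score = evaluate(FORMULAS[best_name], target_mdps)

print(best_name, scores[best_name], transferred_score)
```

In this toy setting, comparing `scores[best_name]` with `transferred_score` gives a rough feel for how the gap between the two distributions affects the reused policy; it is not a reproduction of the paper's experiments.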

Key words

reinforcement learning/policy search/policy transfer/non-stationary environment/formula set

Classification

Information Technology and Security Science

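Cite this article

Zhu Fei, Liu Quan, Fu Qiming, Chen Donghuo, Wang Hui, Fu Yuchen. A Policy Search and Transfer Approach in the Non-stationary Environment [J]. Acta Electronica Sinica, 2017, 45(2): 257-266 (10 pages).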

Funding

National Natural Science Foundation of China (No. 61303108, No. 61373094, No. 61272005, No. 61472262, No. 61502329)

Natural Science Research Foundation of Jiangsu Higher Education Institutions (No. 13KJB520020)

Foundation of the Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University (No. 93K172014K04)

Suzhou Applied Basic Research Program (No. SYG201422)

Provincial Key Laboratory Foundation of Soochow University (No. KJS1524)

China Scholarship Council (No. 201606920013)

Acta Electronica Sinica

Indexing: OA, Peking University Core Journals, CSCD, CSTPCD

ISSN 0372-2112
