Maximum Entropy Inverse Reinforcement Learning Based on Generative Adversarial Networks

计算机工程与应用 (Computer Engineering and Applications), 2019, Vol. 55, Issue 22: 119-126, 8. DOI: 10.3778/j.issn.1002-8331.1904-0238

陈建平 (1), 陈其强 (2), 傅启明 (1), 高振 (2), 吴宏杰 (1), 陆悠 (2)

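Author information

  • 1. School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009
  • 2. Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009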

Abstract

Inverse reinforcement learning (IRL) algorithms learn slowly in the early stage of training because expert samples are sparse. To address this problem, a maximum entropy IRL algorithm based on Generative Adversarial Networks (GAN) is proposed. During learning, the expert samples are used to train and optimize the GAN, which then generates virtual expert samples. Non-expert samples are generated with a stochastic policy, and a mixed sample set is constructed. The reward function is modeled with the maximum entropy probability model, and gradient descent is used to solve for the optimal reward function. Based on the optimal reward function, a forward reinforcement learning method is used to solve for the optimal policy. On this basis, further non-expert samples are generated, the mixed sample set is reconstructed, and the optimal reward function is solved iteratively. The proposed algorithm and the MaxEnt IRL algorithm are applied to the classic Object World and Mountain Car problems. Experiments show that the proposed algorithm solves for the reward function better when expert samples are sparse and has better convergence performance.
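The reward-fitting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a linear reward `theta @ f(trajectory)`, approximates the maximum entropy partition function by a sum over the mixed sample set, and takes the GAN-generated virtual expert samples as already included in `expert_feats`. All names, the toy feature data, the learning rate, and the step count are hypothetical.

```python
import numpy as np

def nll_gradient(theta, expert_feats, mixed_feats):
    """Gradient of the negative MaxEnt log-likelihood (sample-based, assumed form)."""
    scores = mixed_feats @ theta
    scores -= scores.max()              # stabilise the softmax numerically
    weights = np.exp(scores)
    weights /= weights.sum()            # p(trajectory) over the mixed sample set
    # Expected features under the model minus mean expert features.
    return weights @ mixed_feats - expert_feats.mean(axis=0)

def fit_reward(expert_feats, non_expert_feats, lr=0.1, steps=500):
    """Plain gradient descent on the negative log-likelihood."""
    mixed = np.vstack([expert_feats, non_expert_feats])  # mixed sample set
    theta = np.zeros(expert_feats.shape[1])
    for _ in range(steps):
        theta -= lr * nll_gradient(theta, expert_feats, mixed)
    return theta

# Toy trajectory features (hypothetical): a few expert samples clustered
# around one region, many non-expert samples from a random policy.
rng = np.random.default_rng(0)
expert_feats = rng.normal(loc=[1.0, 0.0], scale=0.1, size=(5, 2))
non_expert_feats = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(50, 2))

theta = fit_reward(expert_feats, non_expert_feats)
expert_reward = theta @ expert_feats.mean(axis=0)
non_expert_reward = theta @ non_expert_feats.mean(axis=0)
```

In the iterative scheme the abstract outlines, `non_expert_feats` would be regenerated from the policy solved under the current reward and `fit_reward` would be called again. That outer loop and the GAN itself are omitted here.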

Key words

Generative Adversarial Networks (GAN) / inverse reinforcement learning / maximum entropy

Classification

Information Technology and Security Science

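Cite this article

陈建平, 陈其强, 傅启明, 高振, 吴宏杰, 陆悠. Maximum Entropy Inverse Reinforcement Learning Based on Generative Adversarial Networks [J]. 计算机工程与应用 (Computer Engineering and Applications), 2019, 55(22): 119-126, 8.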

Funding

National Natural Science Foundation of China (No. 61772357, No. 61750110519, No. 61772355, No. 61702055, No. 61672371, No. 61602334)

Key Research and Development Program of Jiangsu Province (No. BE2017663)

计算机工程与应用 (Computer Engineering and Applications)

Indexing: OA, 北大核心 (Peking University Core Journals), CSCD, CSTPCD

ISSN: 1002-8331
