Computer Engineering and Applications, 2019, Vol. 55, Issue (22): 119-126, 8. DOI: 10.3778/j.issn.1002-8331.1904-0238
基于生成对抗网络的最大熵逆强化学习
Maximum Entropy Inverse Reinforcement Learning Based on Generative Adversarial Networks
Abstract
To address the slow learning of inverse reinforcement learning algorithms caused by sparse expert samples in the early stage of training, a maximum entropy inverse reinforcement learning algorithm based on Generative Adversarial Networks (GAN) is proposed. During learning, the expert samples are used to train and optimize the GAN, which then generates virtual expert samples. On this basis, non-expert samples are generated with a stochastic policy and a mixed sample set is constructed. The reward function is modeled with the maximum entropy probability model, and gradient descent is used to solve for the optimal reward function. Given the optimal reward function, a forward reinforcement learning method is used to solve for the optimal policy. On this basis, further non-expert samples are generated, the mixed sample set is rebuilt, and the optimal reward function is solved iteratively. The proposed algorithm and the MaxEnt IRL algorithm are applied to the classic Object World and Mountain Car problems. Experiments show that the proposed algorithm recovers the reward function better when expert samples are sparse and has better convergence performance.
Keywords
Generative Adversarial Networks (GAN) / inverse reinforcement learning / maximum entropy
Classification
Information Technology and Security Science
Cite this article
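陈建平, 陈其强, 傅启明, 高振, 吴宏杰, 陆悠. Maximum Entropy Inverse Reinforcement Learning Based on Generative Adversarial Networks[J]. Computer Engineering and Applications, 2019, 55(22): 119-126, 8.
Funding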
National Natural Science Foundation of China (No. 61772357, No. 61750110519, No. 61772355, No. 61702055, No. 61672371, No. 61602334)
Key Research and Development Program of Jiangsu Province (No. BE2017663)
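
Illustrative sketch
The abstract does not give the equations of the reward-learning step, so the following is only a minimal sketch of a standard sample-based maximum entropy formulation, not the authors' implementation. It assumes a linear reward over trajectory features, models P(trajectory) as proportional to exp(theta . f(trajectory)), and approximates the normalizer over the mixed sample set (expert plus non-expert trajectories). The function name `maxent_reward_fit`, the feature vectors, the learning rate, and the step count are all hypothetical choices made for illustration.

```python
import numpy as np

def maxent_reward_fit(expert_feats, mixed_feats, lr=0.1, steps=500):
    """Fit linear reward weights theta by gradient steps on the
    maximum entropy log-likelihood of the expert trajectories.

    expert_feats: feature vectors of expert (or virtual expert) trajectories.
    mixed_feats:  feature vectors of the mixed sample set (assumed to contain
                  the expert trajectories plus non-expert ones); it stands in
                  for the full trajectory space when normalizing.
    """
    expert_feats = np.asarray(expert_feats, dtype=float)
    mixed_feats = np.asarray(mixed_feats, dtype=float)
    theta = np.zeros(expert_feats.shape[1])
    expert_mean = expert_feats.mean(axis=0)

    for _ in range(steps):
        scores = mixed_feats @ theta
        scores -= scores.max()             # shift for numerical stability
        probs = np.exp(scores)
        probs /= probs.sum()               # softmax over the mixed sample set
        model_mean = probs @ mixed_feats   # expected features under the model
        grad = expert_mean - model_mean    # gradient of the mean log-likelihood
        theta += lr * grad                 # ascent on likelihood (descent on its negative)
    return theta

# Toy usage with made-up two-dimensional features.
expert = [[1.0, 0.0], [1.0, 0.0]]
non_expert = [[0.0, 1.0], [0.0, 1.0]]
theta = maxent_reward_fit(expert, expert + non_expert)
```

In this toy setup the learned weight on the first feature ends up larger than the weight on the second, so expert-like trajectories receive higher reward. In the iterative scheme the abstract describes, the non-expert part of the mixed set would be regenerated after each policy update and the fit repeated.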