Computer & Digital Engineering, 2019, Vol. 47, Issue 7: 1582-1585, 4. DOI: 10.3969/j.issn.1672-9722.2019.07.005
Convergence Analysis of Multistep Reinforcement Learning Algorithm
Abstract
Recently, a new algorithm called Q(σ) has been proposed for estimating the value function in reinforcement learning, where σ is the degree of sampling. Q(σ) interpolates between full sampling and no sampling, and it unifies Sarsa and Expected Sarsa. However, the original paper evaluates Q(σ) only experimentally. This paper gives a theoretical analysis of Q(σ) and proves that, under certain conditions, Q(σ) converges to the value function.
Keywords
reinforcement learning / value function estimation / optimization / temporal difference
Classification
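Information Technology and Security Science
Cite this article
Yang Rui (杨瑞). Convergence Analysis of Multistep Reinforcement Learning Algorithm[J]. Computer & Digital Engineering, 2019, 47(7): 1582-1585, 4.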
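
The abstract's statement that Q(σ) unifies Sarsa and Expected Sarsa can be illustrated with a one-step backup target. The sketch below is not taken from this page, which does not reproduce the update rule; it assumes the commonly presented one-step form, in which σ weights a sampled (Sarsa-style) backup against an expected (Expected Sarsa-style) backup. The function names `q_sigma_target` and `q_sigma_update` and all parameter names are hypothetical.

```python
def q_sigma_target(reward, gamma, sigma, q_next, pi_next, next_action):
    """One-step Q(sigma) target (illustrative sketch, assumed form).

    q_next      -- list of Q(S', a), one entry per action a
    pi_next     -- list of pi(a | S'), the target policy's probabilities
    next_action -- index of the action A' actually sampled in S'
    """
    # Sampled backup: uses only the action that was taken (Sarsa).
    sampled = q_next[next_action]
    # Expected backup: averages over all actions under pi (Expected Sarsa).
    expected = sum(p * q for p, q in zip(pi_next, q_next))
    # sigma = 1 gives full sampling, sigma = 0 gives no sampling.
    return reward + gamma * (sigma * sampled + (1.0 - sigma) * expected)


def q_sigma_update(q_sa, alpha, target):
    """Move Q(S, A) toward the target with step size alpha."""
    return q_sa + alpha * (target - q_sa)
```

Under these assumptions, setting `sigma=1.0` yields the Sarsa target and `sigma=0.0` yields the Expected Sarsa target; intermediate values mix the two.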