Control Theory & Applications (控制理论与应用), 2021, Vol. 38, Issue 11: 1707-1716 (10 pages). DOI: 10.7641/CTA.2021.10763
Average cost Markov decision processes with countable state spaces
Abstract
Under the long-run average criterion, a Markov decision process (MDP) with a countable state space may have no optimal (stationary) policy. In this paper, we study optimal policies that satisfy an optimality inequality in a countable-state MDP under the long-run average criterion. Unlike the vanishing discount approach, we use the discrete Dynkin's formula to derive the main results. We first present the Poisson equation of an ergodic Markov chain and two instructive examples of null recurrent Markov chains, and demonstrate the existence of optimal policies for two optimality inequalities with opposite directions. Then, using two comparison lemmas and the performance difference formula, we prove the existence of optimal policies for positive recurrent chains and multi-chains, and further extend this result to other situations. In particular, several application examples illustrate the essence of the performance sensitivity of the long-run average. Our results supplement the existing literature on the optimality inequality for average-cost MDPs with countable states.
Keywords
Markov decision process / long-run average / countable state spaces / Dynkin's formula / Poisson equation / performance sensitivity
Cite this article
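ZHANG Junyu (张俊玉), WU Yiting (吴怡婷), XIA Li (夏俐), CAO Xiren (曹希仁). Average cost Markov decision processes with countable state spaces [J]. Control Theory & Applications, 2021, 38(11): 1707-1716.
Funding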
Supported by the National Natural Science Foundation of China (61673019, 61773411, 11931018, 62073346), the Guangdong Province Key Laboratory of Computational Science at Sun Yat-sen University (2020B1212060032), and the Guangdong Basic and Applied Basic Research Foundation (2021A1515010057, 2021A1515011984).