Average cost Markov decision processes with countable state spaces

张俊玉 吴怡婷 夏俐 曹希仁

控制理论与应用 (Control Theory & Applications), 2021, Vol. 38, Issue 11: 1707-1716, 10. DOI: 10.7641/CTA.2021.10763

Average cost Markov decision processes with countable state spaces

张俊玉¹, 吴怡婷¹, 夏俐², 曹希仁³

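Author information

  • 1. School of Mathematics, Sun Yat-sen University, Guangzhou, Guangdong 510275
  • 2. School of Management, Sun Yat-sen University, Guangzhou, Guangdong 510275
  • 3. Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong, China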

Abstract

For a Markov decision process (MDP) with a countable state space under the long-run average criterion, an optimal (stationary) policy may not exist. In this paper, we study optimal policies that satisfy the optimality inequality in a countable-state MDP under the long-run average criterion. Unlike the vanishing discount approach, we use the discrete Dynkin's formula to derive the main results. We first present the Poisson equation of an ergodic Markov chain and two instructive examples of null recurrent Markov chains, and demonstrate the existence of optimal policies for two optimality inequalities with opposite directions. Then, using two comparison lemmas and the performance difference formula, we prove the existence of optimal policies for positive recurrent chains and multi-chains, and further extend this result to other situations. In particular, several application examples are provided to illustrate the essence of performance sensitivity of the long-run average. Our results supplement the existing literature on the optimality inequality of average MDPs with countable states.

Key words

Markov decision process/long-run average/countable state spaces/Dynkin's formula/Poisson equation/performance sensitivity

Cite this article

张俊玉, 吴怡婷, 夏俐, 曹希仁. Average cost Markov decision processes with countable state spaces [J]. 控制理论与应用 (Control Theory & Applications), 2021, 38(11): 1707-1716, 10.

Funding

Supported by the National Natural Science Foundation of China (61673019, 61773411, 11931018, 62073346), the Guangdong Province Key Laboratory of Computational Science at the Sun Yat-sen University (2020B1212060032) and the Guangdong Basic and Applied Basic Research Foundation (2021A1515010057, 2021A1515011984).

控制理论与应用 (Control Theory & Applications)

Open access; indexed in 北大核心 (Peking University Core Journals), CSCD, CSTPCD

ISSN 1000-8152
