
Performance Potential-based Neuro-dynamic Programming for SMDPs

Tang Hao, Yuan Jibin, Lu Yang, Cheng Wenjuan

Acta Automatica Sinica, 2005, Vol. 31, Issue (4): 642-645, 4.

Performance Potential-based Neuro-dynamic Programming for SMDPs

Tang Hao ¹, Yuan Jibin ¹, Lu Yang ¹, Cheng Wenjuan ¹

Author information

1. School of Computer and Information, Hefei University of Technology, Hefei 230009

Abstract

An alpha-uniformized Markov chain is defined, via the concept of an equivalent infinitesimal generator, for a semi-Markov decision process (SMDP) under both the average and discounted criteria. Using the relations between the performance measures and performance potentials of the SMDP and of this chain, the SMDP can be optimized by simulating the chain. For the critic model of neuro-dynamic programming (NDP), a neuro-policy iteration (NPI) algorithm is presented, and a bound on the performance error is derived for the case where each iteration step incurs both an approximation error and an improvement error. The results can be extended to Markov systems and are widely applicable. Finally, a numerical example is provided.
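The abstract does not reproduce the paper's alpha-uniformized construction, so the following is only a minimal sketch of the two textbook ingredients it builds on: uniformizing a continuous-time Markov chain generator into a transition matrix, and computing average-criterion performance potentials from the Poisson equation. The generator `A`, reward vector `f`, rate `lam`, and all function names are hypothetical and chosen for illustration; the SMDP-specific equivalent generator and the discounted case are not covered.

```python
def uniformize(A, lam):
    """Return P = I + A / lam for a generator A whose rows sum to zero.

    lam is assumed to be strictly larger than max_i |A[i][i]| so that P is
    a stochastic matrix with a positive diagonal (hence aperiodic).
    """
    n = len(A)
    return [[(1.0 if i == j else 0.0) + A[i][j] / lam for j in range(n)]
            for i in range(n)]


def stationary(P, iters=500):
    """Approximate the stationary distribution of P by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def potentials(P, f, iters=500):
    """Return (eta, g): average reward and potentials of the chain P.

    g is the limit of g <- (f - eta) + P g started from zero, which solves
    the Poisson equation (I - P) g + eta * e = f with pi . g = 0.
    """
    n = len(P)
    pi = stationary(P, iters)
    eta = sum(pi[i] * f[i] for i in range(n))
    g = [0.0] * n
    for _ in range(iters):
        g = [f[i] - eta + sum(P[i][j] * g[j] for j in range(n))
             for i in range(n)]
    return eta, g


# Hypothetical two-state example (not taken from the paper).
A = [[-2.0, 2.0], [1.0, -1.0]]   # generator of a continuous-time chain
f = [1.0, 4.0]                   # per-state reward
lam = 4.0                        # uniformization rate, > max |A[i][i]|
P = uniformize(A, lam)
eta, g = potentials(P, f)
```

In a simulation-based setting such as the one the abstract describes, the potentials would be estimated from sample paths of the chain instead of being computed from `P` directly; the exact iteration here only serves to make the quantities concrete.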

Key words

Semi-Markov decision processes / performance potentials / neuro-dynamic programming

Classification

Information Technology and Security Science

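Cite this article

Tang Hao, Yuan Jibin, Lu Yang, Cheng Wenjuan. Performance Potential-based Neuro-dynamic Programming for SMDPs [J]. Acta Automatica Sinica, 2005, 31(4): 642-645, 4.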

Funding

Supported by the National Natural Science Foundation of P. R. China (60404009, 60175011), the Natural Science Foundation of Anhui Province (050420303), and the Sustentation Project of Hefei University of Technology for the Science and Technology-innovation Groups.

Acta Automatica Sinica

Open access; indexed in the Peking University Core Journals list and CSCD

ISSN: 0254-4156
