Acta Automatica Sinica, 2005, Vol. 31, Issue (4): 642-645, 4.
Performance Potential-based Neuro-dynamic Programming for SMDPs
Abstract
An alpha-uniformized Markov chain is defined, via the concept of an equivalent infinitesimal generator, for a semi-Markov decision process (SMDP) under both average and discounted criteria. Based on the relations between their performance measures and performance potentials, the optimization of an SMDP can be realized by simulating this chain. For the critic model of neuro-dynamic programming (NDP), a neuro-policy iteration (NPI) algorithm is presented, and a performance error bound is given for the case where approximation error and improvement error are present in each iteration step. The obtained results may be extended to Markov systems and are widely applicable. Finally, a numerical example is provided.
Keywords
Semi-Markov decision processes/performance potentials/neuro-dynamic programming
Classification
Information Technology and Security Science
Cite this article
Tang Hao, Yuan Jibin, Lu Yang, Cheng Wenjuan. Performance Potential-based Neuro-dynamic Programming for SMDPs [J]. Acta Automatica Sinica, 2005, 31(4): 642-645, 4.
Funding
Supported by National Natural Science Foundation of P. R. China (60404009, 60175011), the Natural Science Foundation of Anhui Province (050420303), and the Sustentation Project of Hefei University of Technology for the Science and Technology-innovation Groups (60404009, 60175011)