Acta Automatica Sinica (自动化学报), 2004, Vol. 30, Issue 2: 229-234 (6 pages).
A Simulation Optimization Algorithm for CTMDPs Based on Randomized Stationary Policies
Abstract
Based on the theory of Markov performance potentials and the neuro-dynamic programming (NDP) methodology, we study a simulation optimization algorithm for a class of continuous-time Markov decision processes (CTMDPs) under randomized stationary policies. The proposed algorithm estimates the gradient of the average-cost performance measure with respect to the policy parameters by transforming the continuous-time Markov process into a uniformized Markov chain and simulating a single sample path of that chain. The goal is to find a suboptimal randomized stationary policy. The algorithm can be applied to the performance optimization of many complex systems with large state spaces. Finally, a numerical example for a controlled Markov process is provided.
Key words
Performance potentials / neuro-dynamic programming / simulation optimization
Classification
Information technology and safety science
Cite this article
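Tang Hao (唐昊), Xi Hongsheng (奚宏生), Yin Baoqun (殷保群). A Simulation Optimization Algorithm for CTMDPs Based on Randomized Stationary Policies [J]. Acta Automatica Sinica, 2004, 30(2): 229-234.
Funding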
Supported by the National Natural Science Foundation of P. R. China (60274012) and the Natural Science Foundation of Anhui Province (01042308)
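Illustrative sketch

The abstract's step of transforming a continuous-time Markov process into a discrete-time chain can be illustrated with a minimal sketch. It assumes a fixed policy has already produced a generator matrix Q whose rows sum to zero, and it uses the standard uniformization construction P = I + Q/lam with lam at least max_i |q_ii|. The 3-state matrix `Q`, the cost rates `f`, and the function names `uniformize` and `stationary_distribution` are hypothetical illustrations, not taken from the paper. The sketch computes the average cost exactly for this small example and does not attempt the paper's simulation-based gradient estimator.

```python
def uniformize(Q, lam=None):
    """Return (P, lam), where P = I + Q / lam is the uniformized chain."""
    n = len(Q)
    max_rate = max(-Q[i][i] for i in range(n))  # largest exit rate
    if lam is None:
        lam = max_rate
    if lam <= 0 or lam < max_rate:
        raise ValueError("lam must be positive and at least max_i |q_ii|")
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / lam for j in range(n)]
         for i in range(n)]
    return P, lam


def stationary_distribution(P, iterations=500):
    """Approximate pi with pi P = pi by power iteration (assumes ergodicity)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


# Hypothetical generator matrix under some fixed policy (rows sum to zero).
Q = [[-2.0, 1.0, 1.0],
     [3.0, -4.0, 1.0],
     [0.5, 0.5, -1.0]]
# Hypothetical cost rate per state.
f = [1.0, 2.0, 0.0]

P, lam = uniformize(Q)
pi = stationary_distribution(P)
# The uniformized chain has the same stationary distribution as the
# continuous-time process, so the average cost is the pi-weighted cost rate.
average_cost = sum(p * c for p, c in zip(pi, f))
```

For large state spaces the stationary distribution cannot be computed this way, which is the situation where estimating quantities from a single simulated sample path of the chain becomes attractive.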