
A Simulation Optimization Algorithm for CTMDPs Based on Randomized Stationary Policies

Tang Hao, Xi Hongsheng, Yin Baoqun

Acta Automatica Sinica, 2004, Vol. 30, Issue 2: 229-234, 6.

Tang Hao¹, Xi Hongsheng², Yin Baoqun¹

Author information

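1. Department of Automation, University of Science and Technology of China, Hefei 230026
2. Department of Computer Science, Hefei University of Technology, Hefei 230009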

Abstract

Based on the theory of Markov performance potentials and the neuro-dynamic programming (NDP) methodology, we study a simulation optimization algorithm for a class of continuous-time Markov decision processes (CTMDPs) under randomized stationary policies. The proposed algorithm estimates the gradient of the average-cost performance measure with respect to the policy parameters by transforming the continuous-time Markov process into a uniformized Markov chain and simulating a single sample path of that chain. The goal is to find a suboptimal randomized stationary policy. The algorithm can meet the performance-optimization needs of many difficult systems with large state spaces. Finally, a numerical example for a controlled Markov process is provided.
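The identity behind the gradient estimate in the abstract can be illustrated on a small model. The sketch below is not the authors' algorithm: it computes the gradient exactly from a known model instead of estimating it along a simulated sample path, and every name in it (`policy_probs`, `uniformize`, `potential_gradient`, `random_ctmdp`, and so on) is hypothetical. It assumes action-dependent generators Q_a, costs c(i, a), and a softmax parameterization of the randomized stationary policy (an assumption; the paper's parameterization may differ). Uniformization with a fixed rate lam >= max |q_ii(a)| gives P_a = I + Q_a / lam, whose mixture has the same stationary distribution pi as the mixed generator. The gradient then follows the standard potential-based formula d(eta)/d(theta) = pi (dP/d(theta)) g + pi (df/d(theta)), where the potential g solves the Poisson equation (I - P + e pi) g = f - eta e. A central finite difference of eta checks the result.

```python
import numpy as np


def policy_probs(theta):
    """Assumed softmax randomized stationary policy: mu[i, a] = Pr(action a | state i)."""
    z = np.exp(theta - theta.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)


def uniformize(Q_actions, lam):
    """Uniformization: P_a = I + Q_a / lam for each action a (needs lam >= max |q_ii(a)|)."""
    n = Q_actions.shape[1]
    return np.eye(n)[None, :, :] + Q_actions / lam


def mixed_chain(theta, P_actions, cost):
    """Transition matrix P(theta) and per-state cost f(theta) of the uniformized chain."""
    mu = policy_probs(theta)                    # shape (n, A)
    P = np.einsum("ia,aij->ij", mu, P_actions)  # row i mixes the action kernels
    f = (mu * cost).sum(axis=1)                 # expected one-step cost in each state
    return P, f, mu


def stationary(P):
    """Solve pi P = pi with sum(pi) = 1 (chain assumed irreducible)."""
    n = P.shape[0]
    A = np.vstack([(P - np.eye(n)).T, np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]


def average_cost(theta, P_actions, cost):
    """Average cost eta(theta) = pi(theta) f(theta)."""
    P, f, _ = mixed_chain(theta, P_actions, cost)
    return stationary(P) @ f


def potential_gradient(theta, P_actions, cost):
    """d eta / d theta = pi (dP/dtheta) g + pi (df/dtheta), g = performance potential."""
    P, f, mu = mixed_chain(theta, P_actions, cost)
    n = P.shape[0]
    pi = stationary(P)
    eta = pi @ f
    # Poisson equation (I - P + e pi) g = f - eta e; its solution satisfies pi g = 0.
    g = np.linalg.solve(np.eye(n) - P + np.outer(np.ones(n), pi), f - eta)
    # h[i, a] = c(i, a) + sum_j P_a(i, j) g(j)
    h = cost + np.einsum("aij,j->ia", P_actions, g)
    # Softmax derivative: d mu(b|i) / d theta[i, a] = mu(b|i) (1{a=b} - mu(a|i)).
    baseline = (mu * h).sum(axis=1, keepdims=True)
    return pi[:, None] * mu * (h - baseline)


def finite_difference_gradient(theta, P_actions, cost, eps=1e-6):
    """Central differences of eta, used only to check the potential-based formula."""
    grad = np.zeros_like(theta)
    for idx in np.ndindex(*theta.shape):
        d = np.zeros_like(theta)
        d[idx] = eps
        up = average_cost(theta + d, P_actions, cost)
        down = average_cost(theta - d, P_actions, cost)
        grad[idx] = (up - down) / (2 * eps)
    return grad


def random_ctmdp(n=4, n_actions=2, seed=0):
    """Hypothetical test instance: random generators Q_a (rows sum to 0) and costs c(i, a)."""
    rng = np.random.default_rng(seed)
    Q_actions = rng.uniform(0.5, 2.0, size=(n_actions, n, n))
    for Qa in Q_actions:
        np.fill_diagonal(Qa, 0.0)
        np.fill_diagonal(Qa, -Qa.sum(axis=1))
    cost = rng.uniform(0.0, 5.0, size=(n, n_actions))
    theta = rng.normal(size=(n, n_actions))
    return Q_actions, cost, theta


Q_actions, cost, theta = random_ctmdp()
# One uniformization rate shared by every action, so P_a does not depend on theta.
lam = 1.1 * max(np.max(-np.diag(Qa)) for Qa in Q_actions)
P_actions = uniformize(Q_actions, lam)
grad = potential_gradient(theta, P_actions, cost)
print(np.max(np.abs(grad - finite_difference_gradient(theta, P_actions, cost))))
```

The printed difference should be at the level of finite-difference error. Because eta does not depend on the choice of lam, the potential-based gradient is the same for any admissible uniformization rate; a simulation-based method would replace the exact pi and g here with estimates gathered along one sample path of the uniformized chain.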

Keywords

Performance potentials; neuro-dynamic programming; simulation optimization

Classification

Information Technology and Security Science

Cite this article

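Tang Hao, Xi Hongsheng, Yin Baoqun. A simulation optimization algorithm for CTMDPs based on randomized stationary policies [J]. Acta Automatica Sinica, 2004, 30(2): 229-234, 6.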

Funding

Supported by the National Natural Science Foundation of P. R. China (60274012) and the Natural Science Foundation of Anhui Province (01042308)

Acta Automatica Sinica

OA | PKU Core Journal | CSCD | CSTPCD

ISSN：0254-4156
