High Technology Letters, 2007, Vol. 13, Issue 2, pp. 173-176 (4 pages).
Online support vector regression for reinforcement learning
Abstract
The goal in reinforcement learning is to learn the values of state-action pairs in order to maximize the total reward. For the continuous states and actions found in real-world problems, the representation of the value function is critical. Moreover, samples of the value function are obtained sequentially. Therefore, an online support vector regression (OSVR) method is developed as a function approximator for estimating value functions in reinforcement learning. OSVR updates the regression function by analyzing how the support vector sets may change after new samples are inserted into the training set. To evaluate its learning ability, OSVR is applied to the mountain-car task. The simulation results indicate that OSVR has a favorable convergence speed and can solve continuous problems that are infeasible with a lookup-table representation.
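The abstract describes OSVR only at a high level, so the following is a minimal sketch, under stated assumptions, of how an online regressor can serve as the value-function approximator. It deliberately swaps in a simpler technique: a NORMA-style stochastic-gradient kernel regressor (after Kivinen, Smola and Williamson's online learning with kernels) trained on the ε-insensitive loss used by SVR. It is not the paper's OSVR, which updates the regression function by tracking how the support vector sets change after each insertion; here each new sample takes one gradient step and becomes a support vector only if it falls outside the ε-tube. The class `OnlineKernelSVR`, the helpers `q_value` and `q_learning_update`, the Gaussian kernel, the state-action encoding and all parameter values are hypothetical. The mountain-car dynamics follow the standard Sutton and Barto formulation, assumed here because the abstract does not give the paper's settings.

```python
import math
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


class OnlineKernelSVR:
    """Hypothetical stand-in for OSVR, NOT the paper's incremental algorithm.

    NORMA-style SGD: f(x) = sum_i alpha_i * k(x_i, x) with a Gaussian kernel.
    Each new sample takes one stochastic-gradient step on
    lam/2 * ||f||^2 + max(0, |y - f(x)| - eps). lam = 0 disables regularisation.
    """

    def __init__(self, eta: float = 0.1, lam: float = 0.0,
                 eps: float = 0.05, width: float = 1.0) -> None:
        self.eta, self.lam, self.eps, self.width = eta, lam, eps, width
        self.support: List[Tuple[Vector, float]] = []  # (x_i, alpha_i) pairs

    def kernel(self, x: Sequence[float], z: Sequence[float]) -> float:
        # Gaussian RBF kernel exp(-width * ||x - z||^2).
        return math.exp(-self.width * sum((a - b) ** 2 for a, b in zip(x, z)))

    def predict(self, x: Sequence[float]) -> float:
        return sum(alpha * self.kernel(xi, x) for xi, alpha in self.support)

    def update(self, x: Sequence[float], y: float) -> None:
        err = y - self.predict(x)  # residual of the current model f_t
        shrink = 1.0 - self.eta * self.lam  # gradient step of the regulariser
        self.support = [(xi, alpha * shrink) for xi, alpha in self.support]
        # Only samples outside the eps-tube become new support vectors.
        if abs(err) > self.eps:
            self.support.append((tuple(x), self.eta * math.copysign(1.0, err)))


def mountain_car_step(pos: float, vel: float, action: int):
    """Standard mountain-car dynamics (Sutton & Barto, assumed); action in {-1, 0, 1}."""
    vel = min(max(vel + 0.001 * action - 0.0025 * math.cos(3 * pos), -0.07), 0.07)
    pos = min(max(pos + vel, -1.2), 0.5)
    if pos == -1.2:
        vel = 0.0  # inelastic left wall
    return pos, vel, -1.0, pos >= 0.5  # reward -1 per step until the goal


ACTIONS = (-1, 0, 1)  # reverse, coast, forward throttle


def q_value(model: OnlineKernelSVR, state: Vector, action: int) -> float:
    # Hypothetical encoding: the state-action pair is one input vector.
    return model.predict(tuple(state) + (float(action),))


def q_learning_update(model: OnlineKernelSVR, state: Vector, action: int,
                      reward: float, next_state: Vector, done: bool,
                      discount: float = 0.99) -> float:
    """Insert the Q-learning target r + discount * max_a' Q(s', a') as a new sample."""
    target = reward
    if not done:
        target += discount * max(q_value(model, next_state, a) for a in ACTIONS)
    model.update(tuple(state) + (float(action),), target)
    return target
```

For example, after `m = OnlineKernelSVR()` and twenty calls to `m.update((0.0,), 1.0)`, only the first ten samples become support vectors, and `m.predict((0.0,))` is within floating-point error of 1.0; the ε-tube is what keeps the model sparse, mirroring SVR. With a constant step size, however, the fit moves by only `eta` per sample, so this is much cruder than an exact incremental OSVR update, and with `lam > 0` old coefficients decay but are never pruned.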
Key words
reinforcement learning/function approximation/support vector regression/online learning
Classification
Mathematical sciences
Cite this article
Yu Zhenhua, Cai Yuanli. Online support vector regression for reinforcement learning[J]. High Technology Letters, 2007, 13(2): 173-176.
Funding
Supported by the High Technology Research and Development Programme of China (No. 2003AA721070).