Radio Engineering (无线电工程), 2026, Vol. 56, Issue 1: 166-176, 11. DOI: 10.3969/j.issn.1003-3106.2026.01.018
Linear Quadratic Tracking Control Algorithm for Time-delay Systems Based on Off-policy Q-learning

Abstract
A data-driven algorithm is proposed to solve the Linear Quadratic Tracking (LQT) control problem for linear discrete-time systems with unknown model parameters, addressing the control-input time delays commonly encountered in industrial processes. By characterizing the control problem for time-delay systems, a model-based reinforcement learning framework is first constructed. On this foundation, a Smith predictor is introduced to avoid dependence on mathematical model parameters and state data, and an On-policy Q-learning linear quadratic tracking control algorithm for time-delay systems is proposed. Considering the impact of exploration noise on the learning results of the On-policy Q-learning algorithm, an Off-policy algorithm is further adopted to solve the linear quadratic tracking control problem for time-delay systems. On this basis, the Bellman equation used in the Q-learning algorithm is improved, and a data-driven Off-policy Q-learning algorithm is presented that remains unaffected by exploration noise and yields unbiased solutions. Theoretical analysis and simulation experiments demonstrate that tracking control for time-delay systems is effectively achieved without reliance on system model parameters or state data.
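The core mechanism the abstract describes, Off-policy Q-learning in which data collected under an exploratory behavior policy are reused to evaluate a target policy through a quadratic Q-function Bellman equation, can be illustrated with a minimal sketch. This is not the paper's algorithm: it omits the input time delay, the Smith predictor, and the tracking/output-feedback aspects, and uses a hypothetical plant (matrices `A`, `B` below) only to generate data, never inside the learner.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical plant, used ONLY to generate data (the learner never reads A, B).
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [0.1]])
Qc = np.eye(2)          # state cost weight
Rc = np.array([[1.0]])  # input cost weight
n, m = 2, 1

def collect(K_behavior, steps=400):
    """Run the behavior policy with exploration noise, logging (x, u, cost, x_next)."""
    data, x = [], rng.standard_normal(n)
    for k in range(steps):
        u = K_behavior @ x + 0.5 * rng.standard_normal(m)  # exploration noise
        c = x @ Qc @ x + u @ Rc @ u
        x_next = A @ x + B @ u
        data.append((x, u, c, x_next))
        x = x_next
        if (k + 1) % 50 == 0:            # occasional reset keeps the data informative
            x = rng.standard_normal(n)
    return data

def offpolicy_q_learning(data, K, iters=20):
    """Evaluate the TARGET policy K on fixed off-policy data, then improve it."""
    for _ in range(iters):
        Phi, b = [], []
        for x, u, c, xn in data:
            z = np.concatenate([x, u])            # behavior action in the data
            zn = np.concatenate([xn, K @ xn])     # target-policy action at x_next
            # Bellman equation:  z' H z - zn' H zn = c(x, u)
            Phi.append(np.kron(z, z) - np.kron(zn, zn))
            b.append(c)
        h, *_ = np.linalg.lstsq(np.array(Phi), np.array(b), rcond=None)
        H = h.reshape(n + m, n + m)
        H = 0.5 * (H + H.T)                       # Q-function kernel is symmetric
        K = -np.linalg.solve(H[n:, n:], H[n:, :n])  # greedy policy improvement
    return K, H

K0 = np.zeros((m, n))                             # stabilizing initial target policy
K, H = offpolicy_q_learning(collect(K0), K0)
print("learned gain K:", K)
print("closed-loop spectral radius:", max(abs(np.linalg.eigvals(A + B @ K))))
```

Because the target-policy action `K @ xn` is substituted into the Bellman equation while the logged cost corresponds to the actual (noisy) behavior action, the exploration noise does not bias the least-squares estimate of `H`, which is the property the abstract claims for the Off-policy formulation.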
Keywords: time-delay systems; reinforcement learning; Off-policy; data-driven; output feedback

Classification: Information Technology and Security Science
Citation: 刘文, 蔚保国, 郝菁, 王卿. Linear Quadratic Tracking Control Algorithm for Time-delay Systems Based on Off-policy Q-learning[J]. Radio Engineering, 2026, 56(1): 166-176, 11.
Funding: Innovation Capability Promotion Plan of Hebei Province (24460801D)