Transactions of China Electrotechnical Society (电工技术学报), 2024, Vol. 39, Issue 14: 4547-4556, 10. DOI: 10.19595/j.cnki.1000-6753.tces.230694
基于双延迟深度确定性策略梯度的受电弓主动控制
Active Pantograph Control of Deep Reinforcement Learning Based on Double Delay Depth Deterministic Strategy Gradient
Abstract
The stable coupling between the pantograph and the catenary is the foundation for the safe operation of high-speed trains. As speed increases, contact loss and arcing between the pantograph and the catenary degrade the current collection quality of the train. At present, the primary method for improving current collection quality is active pantograph control. The adaptivity of existing control algorithms mainly addresses the adaptive selection of algorithm parameters; few studies consider the impact of changing line conditions and external disturbances. This paper constructs a pantograph active control system based on deep reinforcement learning, which can effectively handle the complex time-varying characteristics of the pantograph-catenary system and reduce fluctuations of the pantograph-catenary contact force. First, the deep reinforcement learning algorithm is introduced. Then, a pantograph-catenary coupling model is constructed as the environment module to generate data for deep reinforcement learning training and to provide feedback on control strategies. The pantograph is represented by a three-mass model, and the catenary by nonlinear bar/cable finite elements; the two are coupled through a penalty function. The objectives and constraints of the pantograph active control system are analyzed in terms of the state space, observation space, action space, and reward function required by the deep reinforcement learning framework. The controller training and testing procedure is then described. Finally, the effectiveness and robustness of the pantograph active control system are verified.
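The penalty-function coupling mentioned above can be illustrated with a minimal sketch. It treats the contact as a one-sided spring: the force is proportional to how far the collector head presses into the contact wire, and drops to zero when the two separate (contact loss). The function name and the stiffness value are assumptions for illustration only and are not taken from the paper.

```python
def penalty_contact_force(head_height, wire_height, stiffness=50_000.0):
    """Contact force in newtons from a penalty (one-sided spring) coupling.

    head_height, wire_height: vertical positions in metres at the contact point.
    stiffness: penalty stiffness in N/m (assumed value, not from the paper).
    """
    penetration = head_height - wire_height
    # A force arises only when the collector head presses into the wire;
    # a non-positive penetration means the pantograph has lost contact.
    return stiffness * penetration if penetration > 0.0 else 0.0


if __name__ == "__main__":
    print(penalty_contact_force(5.302, 5.300))  # head 2 mm into the wire
    print(penalty_contact_force(5.299, 5.300))  # head below the wire: contact loss
```

In a full simulation this force would act upward on the catenary model and downward on the top mass of the pantograph model at every time step.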
The experimental results show that reinforcement learning active control reduces contact force fluctuations at different speeds while leaving the mean contact force almost unchanged. Compared with finite-frequency H∞ control, twin delayed deep deterministic policy gradient (TD3) control reduces the standard deviation of the contact force by 21.8%. Analysis at the span passing frequency (SPF) shows that TD3 control reduces the power spectral density (PSD) of the contact force by nearly 80%. Since the SPF component accounts for a large share of the energy in the contact force fluctuations, reducing the energy at the SPF effectively decreases the overall contact force fluctuations. At the same time, TD3 control requires a lower control force amplitude than H∞ control and therefore has a smaller impact on the airbag. In terms of the control force output frequency, TD3 control does not act on the high-frequency part, which is consistent with the slow response of the pneumatic mechanism of the pantograph airbag. Under different pantograph-catenary conditions, TD3 control reduces the standard deviation of the contact force more effectively than H∞ control, which indicates that the TD3 algorithm has good robustness.
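For readers unfamiliar with TD3, the sketch below shows the three ingredients that distinguish it from the plain deep deterministic policy gradient: target policy smoothing, a clipped double-Q target, and delayed actor updates. It uses scalar values and the Python standard library only; the function names and default hyperparameters are illustrative assumptions, not the settings used in the paper.

```python
import random


def smoothed_target_action(target_action, noise_std=0.2, noise_clip=0.5,
                           low=-1.0, high=1.0, rng=random):
    """Target policy smoothing: add clipped Gaussian noise, then clip to bounds."""
    noise = rng.gauss(0.0, noise_std)
    noise = max(-noise_clip, min(noise_clip, noise))
    return max(low, min(high, target_action + noise))


def td3_target(reward, q1_next, q2_next, done, gamma=0.99):
    """Clipped double-Q target: bootstrap from the smaller target-critic estimate."""
    not_done = 0.0 if done else 1.0
    return reward + gamma * not_done * min(q1_next, q2_next)


def should_update_actor(step, policy_delay=2):
    """Delayed policy update: refresh the actor every `policy_delay` critic updates."""
    return step % policy_delay == 0


if __name__ == "__main__":
    action = smoothed_target_action(0.9, rng=random.Random(0))
    print(action, td3_target(1.0, 5.0, 3.0, done=False, gamma=0.5))
```

Taking the minimum of the two critic estimates counteracts the value overestimation that a single critic tends to accumulate, which is the main motivation for the twin critics.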
Compared with traditional control methods: (1) The active pantograph control algorithm based on deep reinforcement learning is an end-to-end, data-driven algorithm that does not need an accurate model of the pantograph-catenary system. The control model is generated from readily available operating data and has strong adaptability. (2) The deep reinforcement learning algorithm learns the relationship among observations, actions, and the reward function through exploration and trial and error. Therefore, when environmental changes alter the observed values, the controller can quickly adjust its action to maximize the reward. (3) Under external constraints, such as those imposed by the pantograph actuators and observers, different control strategies can be achieved by adjusting the observation space and the reward function.
Key words
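Low-speed line / mixed running / twin delayed deep deterministic policy gradient (TD3) / active pantograph control
Classification
Information technology and security science
Cite this article
WU Yanbo, HAN Zhiwei, WANG Hui, LIU Zhigang, ZHANG Yujing. Active pantograph control based on twin delayed deep deterministic policy gradient [J]. Transactions of China Electrotechnical Society, 2024, 39(14): 4547-4556, 10.
Funding
National Natural Science Foundation of China (U1734202, 51977182).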