Abstract
Proton exchange membrane fuel cells (PEMFCs) are strongly nonlinear and difficult to model accurately. In addition, the radiator and the circulating water pump in the hydrothermal management system of a fuel cell system are strongly coupled, which makes it difficult for model-based control algorithms to control the fuel cell temperature accurately. This paper therefore proposes a data-driven, model-free algorithm, the classified replay twin delayed Bayesian deep deterministic policy gradient (CTDB-DDPG), to control the fuel cell temperature system.
First, a deep deterministic policy gradient approach is adopted to avoid the intricate modeling of fuel cells. Then, a classified experience replay strategy is added to the algorithm: CTDB-DDPG uses two experience buffer pools to store experience data. When the network model is constructed, the average TD error of all samples in the two pools is initialized to 0. Whenever new experience data is generated, the average TD error over all experience data is updated first. If the TD error of the new sample exceeds this mean, the sample is stored in experience buffer pool I; otherwise, it is stored in experience buffer pool II. Classifying samples by TD error helps the algorithm make better use of the experience data when training the network model. CTDB-DDPG also accounts for the uncertainty of the neural network by incorporating a Bayesian neural network, and the proposed bootstrap with random initialization yields a reasonable uncertainty estimate. At the beginning of each episode, or at fixed intervals during learning, unbiased hypotheses are drawn from the posterior distribution of the MDP parameters and estimated with a bootstrapped value function built on a multi-head shared network, which does not require additional computational resources.
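The classified replay rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's implementation: the class name `ClassifiedReplayBuffer`, the use of the absolute TD error, the running-mean update, and the `high_fraction` sampling ratio are all hypothetical choices that the abstract does not specify.

```python
import random
from collections import deque


class ClassifiedReplayBuffer:
    """Two-pool replay buffer split by TD error relative to a running mean.

    Hypothetical sketch: names and the sampling ratio are assumptions.
    """

    def __init__(self, capacity=10000, high_fraction=0.5, seed=0):
        self.pool_high = deque(maxlen=capacity)  # pool I: TD error above the mean
        self.pool_low = deque(maxlen=capacity)   # pool II: all other samples
        self.mean_td = 0.0                       # average TD error starts at 0
        self.count = 0                           # number of samples seen so far
        self.high_fraction = high_fraction       # assumed share drawn from pool I
        self.rng = random.Random(seed)

    def add(self, transition, td_error):
        td = abs(td_error)  # assumption: classify by magnitude of the TD error
        # Update the running mean over all samples first, then classify.
        self.count += 1
        self.mean_td += (td - self.mean_td) / self.count
        if td > self.mean_td:
            self.pool_high.append(transition)
        else:
            self.pool_low.append(transition)

    def sample(self, batch_size):
        # Draw from both pools; fall back to fewer samples if a pool is short.
        n_high = min(int(batch_size * self.high_fraction), len(self.pool_high))
        n_low = min(batch_size - n_high, len(self.pool_low))
        batch = self.rng.sample(list(self.pool_high), n_high)
        batch += self.rng.sample(list(self.pool_low), n_low)
        return batch
```

In a training loop, each new transition would be passed to `add` together with its TD error from the critic, and `sample` would supply the mini-batch for the next update. How the two pools are actually mixed in CTDB-DDPG is not stated in the abstract, so the fixed ratio here is a placeholder.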
Moreover, using Q-learning preserves the uncertainty of the cumulative discounted return, which is more effective in environments that require deep exploration. Randomly selecting a head network, which emulates Thompson sampling, avoids ineffective exploration by the agent under the noise strategy and accelerates the convergence of CTDB-DDPG. In addition, because the fuel cell thermal management system has large inertia, the algorithm adds OU noise to the action to improve exploration efficiency. OU noise is temporally correlated noise drawn from the Ornstein-Uhlenbeck process, and this temporal correlation helps the algorithm explore different strategies and find potentially better ones, improving its performance and efficiency. Although the added noise can degrade performance in the short term, in the long term it helps the algorithm avoid becoming trapped in a local optimum and may lead to a better strategy.
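To make the role of OU noise concrete, the following is a minimal Python sketch of a discretized Ornstein-Uhlenbeck process added to an action. It is an assumption-laden illustration rather than the paper's code: the class `OUNoise`, the helper `noisy_action`, the parameter values (`theta`, `sigma`, `dt`), and the action bounds are all hypothetical.

```python
import math
import random


class OUNoise:
    """Discretized Ornstein-Uhlenbeck process (Euler-Maruyama step).

    Hypothetical sketch: parameter defaults are assumptions, not the paper's.
    """

    def __init__(self, size, mu=0.0, theta=0.15, sigma=0.2, dt=1.0, seed=0):
        self.size = size      # number of action components
        self.mu = mu          # long-run mean the noise reverts to
        self.theta = theta    # mean-reversion rate
        self.sigma = sigma    # scale of the random disturbance
        self.dt = dt          # time step
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        # Restart the process at its mean, e.g. at the start of an episode.
        self.state = [self.mu] * self.size

    def sample(self):
        # x <- x + theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1)
        # Each value depends on the previous one, so the noise is
        # correlated in time rather than independent per step.
        self.state = [
            x + self.theta * (self.mu - x) * self.dt
            + self.sigma * math.sqrt(self.dt) * self.rng.gauss(0.0, 1.0)
            for x in self.state
        ]
        return list(self.state)


def noisy_action(action, noise, low, high):
    """Add OU noise to each action component and clip to the actuator range."""
    return [min(max(a + n, low), high) for a, n in zip(action, noise.sample())]
```

For example, `noisy_action([0.2, 0.5], OUNoise(size=2), 0.0, 1.0)` would perturb a two-component action (assumed here to be normalized radiator and water pump commands) while keeping it within bounds. Because consecutive samples are correlated, the perturbation pushes in one direction for several steps, which suits a plant with large thermal inertia better than independent noise.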
Finally, the algorithm is verified on the Simulink simulation platform and on the RT-Lab experimental platform, and the two sets of results lead to similar conclusions, confirming its effectiveness. Although the CTDB-DDPG temperature control strategy has been validated on simulation and hardware-in-the-loop test platforms, more complex real-world operating conditions, such as variations in ambient temperature and humidity and equipment aging, will be considered in future studies to test and improve the adaptability and robustness of the algorithm across a broader range of situations.
Keywords
Fuel cell / joint control / deep deterministic / Bayesian network
Key words
Fuel cell / joint control / deep reinforcement learning / Bayesian network
Classification
Information technology and safety science