Control of Fuel Cell Temperature Based on Classified Replay Twin Delayed Bayesian Deep Deterministic Policy Gradient

赵洪山¹, 潘思潮¹, 马利波¹, 吴雨晨¹, 吕廷彦¹

Transactions of China Electrotechnical Society (电工技术学报), 2024, Vol. 39, Issue (13): 4240-4256, 17. DOI: 10.19595/j.cnki.1000-6753.tces.230699

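Author information

1. Key Laboratory of Distributed Energy Storage and Microgrid of Hebei Province (North China Electric Power University), Baoding 071003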

Abstract

Proton exchange membrane fuel cells (PEMFCs) are strongly nonlinear and difficult to model accurately. In addition, the radiator and the circulating water pump in the water-thermal management system of the fuel cell system are strongly coupled, which makes it difficult for model-based control algorithms to control the fuel cell temperature accurately. This paper therefore proposes a data-driven, model-free algorithm, the classified replay twin delayed Bayesian deep deterministic policy gradient (CTDB-DDPG), to control the fuel cell temperature system. First, the deep deterministic policy gradient is adopted to avoid the intricate modeling of fuel cells. Then, a classified experience replay strategy is added to the algorithm: CTDB-DDPG uses two experience buffer pools to store experience data. When the network model is constructed, the average TD error of all samples in the two pools is initialized to 0. Whenever a new experience is generated, the average TD error over all experience data is updated first. If the new sample's TD error exceeds this mean, the sample is stored in experience buffer pool I; otherwise, it is stored in experience buffer pool II. Classifying samples by their TD error helps make better use of the experience data when training the network model. CTDB-DDPG also accounts for the uncertainty of the neural network by incorporating a Bayesian neural network, and the proposed Bootstrap with random initialization yields a reasonable uncertainty estimate. At the beginning of each episode, or at fixed intervals during learning, unbiased hypotheses are drawn from the posterior distribution of the MDP parameters and estimated with a Bootstrap value function built on a multi-head shared network, which does not require additional computational resources.
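The classified replay rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the class name `ClassifiedReplayBuffer`, the use of the absolute TD error, the running mean over every sample seen so far, and the `high_fraction` mixing ratio used when sampling are all assumptions that the abstract does not specify.

```python
import random
from collections import deque


class ClassifiedReplayBuffer:
    """Two-pool replay buffer split by TD error relative to a running mean."""

    def __init__(self, capacity_per_pool=10_000):
        self.pool_high = deque(maxlen=capacity_per_pool)  # pool I: TD error above the mean
        self.pool_low = deque(maxlen=capacity_per_pool)   # pool II: TD error at or below the mean
        self.mean_td = 0.0  # average TD error starts at 0, as stated in the abstract
        self.count = 0      # number of experiences seen so far

    def add(self, transition, td_error):
        td = abs(td_error)  # assumption: classify on the magnitude of the TD error
        # Update the running mean first, then classify the new sample against it.
        self.count += 1
        self.mean_td += (td - self.mean_td) / self.count
        if td > self.mean_td:
            self.pool_high.append(transition)
        else:
            self.pool_low.append(transition)

    def sample(self, batch_size, high_fraction=0.5, rng=random):
        # high_fraction is a hypothetical mixing ratio, not taken from the paper.
        n_high = min(int(batch_size * high_fraction), len(self.pool_high))
        n_low = min(batch_size - n_high, len(self.pool_low))
        return (rng.sample(list(self.pool_high), n_high)
                + rng.sample(list(self.pool_low), n_low))
```

In a training loop, each new transition would be passed to `add` together with its current TD error, and `sample` would then draw a mini-batch from both pools for the critic and actor updates.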
Moreover, using Q-learning preserves the uncertainty of the cumulative discounted return, which is more effective in environments that require deep exploration. Randomly selecting a head network to simulate Thompson sampling keeps the agent from exploring ineffectively under the noisy policy, which accelerates the convergence of the CTDB-DDPG algorithm. In addition, because the fuel cell thermal management system has large inertia, the algorithm adds OU noise to the action to improve exploration efficiency. OU noise is temporally correlated noise drawn from the Ornstein-Uhlenbeck process; this temporal correlation helps the algorithm explore different strategies and find potentially better ones, improving its performance and efficiency. Although adding noise can degrade the algorithm's performance in the short term, in the long term it helps the algorithm avoid falling into a local optimum and may lead to a better strategy. Finally, the algorithm is verified on the Simulink simulation platform and on the RT-Lab experimental platform, and the two sets of results lead to similar conclusions, confirming its effectiveness. Although the CTDB-DDPG temperature control strategy has been validated on simulation and hardware-in-the-loop test platforms, more complex real-world operating conditions, such as variations in ambient temperature and humidity and equipment aging, will be considered in future studies to test and improve the adaptability and robustness of the algorithm over a broader range of situations.
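As an illustration of the temporally correlated exploration noise mentioned above, the following is a minimal sketch of an Ornstein-Uhlenbeck noise generator using a standard Euler-Maruyama discretization. The class name `OUNoise`, the helper `noisy_action`, the parameter values (`theta`, `sigma`, `dt`) and the action bounds are assumptions chosen for illustration; the abstract does not give the settings used in the paper.

```python
import math
import random


class OUNoise:
    """Ornstein-Uhlenbeck noise, discretized with an Euler-Maruyama step."""

    def __init__(self, size, mu=0.0, theta=0.15, sigma=0.2, dt=0.01, seed=None):
        self.size, self.mu = size, mu
        self.theta, self.sigma, self.dt = theta, sigma, dt
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        # Restart the process at its long-run mean, e.g. at the start of an episode.
        self.state = [self.mu] * self.size

    def sample(self):
        # x <- x + theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1)
        scale = self.sigma * math.sqrt(self.dt)
        self.state = [
            x + self.theta * (self.mu - x) * self.dt + scale * self.rng.gauss(0.0, 1.0)
            for x in self.state
        ]
        return list(self.state)


def noisy_action(action, noise, low=-1.0, high=1.0):
    # Add exploration noise to a deterministic action and clip to the assumed actuator range.
    return [min(high, max(low, a + n)) for a, n in zip(action, noise.sample())]
```

Because each sample depends on the previous state, consecutive perturbations drift in a similar direction instead of cancelling out, which is the property usually cited as helpful for slow, high-inertia plants. A two-dimensional instance, `OUNoise(size=2)`, would correspond to one noise channel each for the radiator and the circulating water pump, assuming those are the two control actions.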

Key words

Fuel cell/joint control/deep reinforcement learning/Bayesian network

Classification

Information Technology and Security Science

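Cite this article

赵洪山, 潘思潮, 马利波, 吴雨晨, 吕廷彦. Control of Fuel Cell Temperature Based on Classified Replay Twin Delayed Bayesian Deep Deterministic Policy Gradient [J]. Transactions of China Electrotechnical Society (电工技术学报), 2024, 39(13): 4240-4256, 17.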

Transactions of China Electrotechnical Society (电工技术学报)

OA / Peking University Core Journals (北大核心) / CSTPCD

ISSN: 1000-6753
