Regenerative braking strategy based on deep reinforcement learning for an electric mining truck [OA] [CSTPCD]

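Taking a pure electric mining truck with a 50 t payload as the research object, this paper proposes a regenerative braking energy-recovery strategy based on deep reinforcement learning optimization algorithms. First, a mathematical model of the pure electric mining dump truck is established. Then, an energy management strategy that accounts for changes in load and road slope is proposed, based on the soft actor-critic (SAC) algorithm with automatic entropy adjustment and on the deep deterministic policy gradient (DDPG) algorithm. The state variables are vehicle speed, acceleration, vehicle mass, road slope, battery state of charge (SOC), and battery charge-discharge rate; the action variable is the transmission gear; and the reward function is built from battery SOC and battery life. Simulation results show that, compared with the rule-based control strategy, the dynamic-programming-based control strategy and the proposed SAC-based and DDPG-based optimized control strategies improve regenerative feedback efficiency by 18.15%, 17.18%, and 16.63%, respectively, and extend battery life by 57.31%, 56.87%, and 57.38%, respectively. Finally, a comparison of the reward curves of the two deep reinforcement learning strategies shows that the proposed SAC-based energy management strategy converges 166.7% faster than the DDPG-based strategy.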

With the promotion of the national "carbon neutral" and "green mine" strategies, pure electric mining vehicles are crucial to energy conservation and emission reduction in the mining industry. However, range anxiety is the primary problem limiting their promotion and application. Regenerative braking is an essential technology for improving energy efficiency and reducing the life-cycle costs of pure electric vehicles. Because of harsh driving conditions and substantial changes in load and road slope, the scale and fluctuation of energy demand vary sharply during operation, which affects the feedback efficiency and battery life of an electric mining dump truck. Designing a reasonable regenerative braking strategy for pure electric mining dump trucks is therefore crucial. This paper takes a 50-ton pure electric mining truck as the research object and proposes a regenerative braking feedback strategy based on deep reinforcement learning optimization algorithms. First, a mathematical model of the pure electric mining dump truck is established, comprising a permanent magnet synchronous motor, a power battery, a four-speed automated mechanical transmission, and a vehicle longitudinal dynamics model. The power performance of the model is then verified on the Matlab/Simulink simulation platform. Subsequently, an energy management strategy that considers load and slope changes is proposed, based on the soft actor-critic (SAC) algorithm and the deep deterministic policy gradient (DDPG) deep reinforcement learning algorithm. The state variables are vehicle speed, acceleration, vehicle mass, road slope, battery state of charge (SOC), and battery charge-discharge rate. The transmission gear is the action variable, and the reward function is based on battery SOC and battery lifetime. Furthermore, an automatic entropy adjustment mechanism is introduced to improve the adaptability of the proposed control strategy to different operating conditions. Simulation results show that, compared with the rule-based control strategy, the control strategy based on dynamic programming and the proposed optimization control strategies based on the SAC and DDPG algorithms improve energy efficiency by 18.15%, 17.18%, and 16.63%, respectively, and improve battery lifetime by 57.31%, 56.87%, and 57.38%, respectively. Finally, the reward curves of the proposed SAC-based strategy and the DDPG-based strategy are compared to further verify the former's superiority. The results demonstrate the feasibility of the proposed SAC-based control strategy, which converges 166.7% faster than the DDPG-based strategy.
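The abstract mentions an automatic entropy adjustment mechanism for SAC but does not give its update rule. As a rough illustration only, the sketch below shows one commonly used form of temperature tuning, in which the log of the entropy weight alpha is moved by gradient descent so that the policy entropy tracks a target value. Everything specific here is an assumption rather than the paper's method: treating gear selection as a categorical distribution over the four gears, the target entropy value, the learning rate, and the function names.

```python
import math

GEARS = (1, 2, 3, 4)  # four-speed AMT named in the abstract

# Hypothetical target: half the maximum entropy of a four-way choice.
TARGET_ENTROPY = 0.5 * math.log(len(GEARS))


def policy_entropy(probs):
    """Shannon entropy (in nats) of a discrete gear-selection policy."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def update_log_alpha(log_alpha, probs, target_entropy, lr=0.01):
    """One gradient-descent step on J = log_alpha * (H(pi) - target_entropy).

    J is the expectation over actions of -log_alpha * (log pi(a|s) + target),
    so its gradient with respect to log_alpha is H(pi) - target_entropy.
    """
    grad = policy_entropy(probs) - target_entropy
    return log_alpha - lr * grad


if __name__ == "__main__":
    log_alpha = 0.0  # alpha = exp(0) = 1
    cases = {
        "uniform over gears": [0.25, 0.25, 0.25, 0.25],
        "nearly deterministic": [0.97, 0.01, 0.01, 0.01],
    }
    for name, probs in cases.items():
        new_log_alpha = update_log_alpha(log_alpha, probs, TARGET_ENTROPY)
        print(name, "alpha:", math.exp(log_alpha), "->", math.exp(new_log_alpha))
```

In this form, alpha shrinks when the policy is more random than the target and grows when the policy has become too deterministic. This is one plausible reading of how entropy adjustment could help a controller adapt across operating conditions.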

YANG Weiwei (杨威威); LUO Denghao (罗登昊); ZHANG Wenming (张文明)

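Department of Vehicle Engineering, School of Mechanical Engineering, University of Science and Technology Beijing, Beijing 100083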

Metallic materials

pure electric mining truck; regenerative braking; battery life; control strategy; reinforcement learning

Chinese Journal of Engineering (《工程科学学报》), 2024 (003)

pp. 503-513 (11 pages)

Supported by the Guangdong Basic and Applied Basic Research Foundation (2021A1515110195)

10.13374/j.issn2095-9389.2023.06.01.003
