Driving strategy optimization for high-speed trains based on deep reinforcement learning

徐凯 张皓桐 张淼 张洋 吴仕勋

Journal of Railway Science and Engineering (铁道科学与工程学报), 2025, Vol. 22, Issue 1: 25-37 (13 pages). DOI: 10.19713/j.cnki.43-1423/u.T20240432

Deep reinforcement learning for operation strategies optimization in high-speed trains

徐凯¹, 张皓桐¹, 张淼², 张洋³, 吴仕勋¹

Author information

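  • 1. School of Information Science and Engineering, Chongqing Jiaotong University, Chongqing 400074
  • 2. China Academy of Railway Sciences Corporation Limited, Beijing 100081
  • 3. Chongqing CRRC Changke Rail Vehicles Co., Ltd. (重庆中车长客轨道车辆有限公司), Chongqing 401133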

Abstract

Deep reinforcement learning (DRL) is among the most promising technologies for improving the energy efficiency and operational quality of high-speed trains. However, two problems still limit its practical application: DRL performs poorly when handling the large state spaces of high-speed train environments, and fixed reward functions struggle to adapt to variations in energy efficiency across different scheduling times, so they require manual adjustment. To address these problems, this paper proposes a Hierarchical Optimization Deep Reinforcement Learning (HODRL) algorithm for intelligent high-speed train driving strategies. The algorithm comprises a hierarchical optimization layer, which leverages prior knowledge to reduce the complexity of agent exploration and adjusts the reward function to balance multiple objectives effectively, and a reinforcement learning layer, in which the Twin Delayed Deep Deterministic policy gradient (TD3) algorithm is employed to improve train control accuracy in continuous action spaces. Simulation experiments confirm the effectiveness of the HODRL algorithm. The algorithm reduces the invalid state space by 79.68% on average and allows the agent to obtain the correct reward signal. Compared with the actual performance of the agent, the mean error is 1.99 kWh and the variance is 0.91 kWh. The proposed algorithm requires only 15.26% of the training time of the TD3 algorithm to converge. With the time error within ±0.1% and passenger comfort guaranteed, it reduces energy consumption by 1.29%, 5.70%, 1.69% and 3.27% compared with the PPO, DDPG, TD3 and PMP baseline algorithms, respectively. The results can offer valuable insights for optimizing safe driving strategies for high-speed trains.
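The abstract names TD3 as the reinforcement learning layer but gives no implementation details. As a rough illustration of what distinguishes TD3 from DDPG, the sketch below computes the standard TD3 learning target for one transition: clipped noise is added to the target action (target policy smoothing), and the smaller of two target critic estimates is used (clipped double-Q). The function name, the hyperparameter defaults, the scalar action normalized to [-1, 1] (for example, a traction or braking level) and the callable actor and critics are all assumptions made for illustration; none of them is taken from the paper, and HODRL's hierarchical optimization layer is not modeled here.

```python
import random


def td3_target(reward, next_state, done,
               actor_target, critic1_target, critic2_target,
               gamma=0.99, noise_std=0.2, noise_clip=0.5,
               act_low=-1.0, act_high=1.0, rng=random):
    """Return the TD3 target value for one transition with a scalar action.

    actor_target(state) -> action and critic*_target(state, action) -> Q value
    are hypothetical callables standing in for the target networks.
    """
    # Target policy smoothing: clipped Gaussian noise on the target action.
    noise = max(-noise_clip, min(noise_clip, rng.gauss(0.0, noise_std)))
    next_action = actor_target(next_state) + noise
    # Keep the smoothed action inside the valid control range.
    next_action = max(act_low, min(act_high, next_action))
    # Clipped double-Q: use the more pessimistic of the two target critics.
    q_next = min(critic1_target(next_state, next_action),
                 critic2_target(next_state, next_action))
    # Bootstrapped target; nothing is bootstrapped past a terminal state.
    return reward + gamma * (1.0 - float(done)) * q_next
```

In a full TD3 agent both critics would be regressed toward this target at every update, while the actor and the target networks would be updated less frequently (the "delayed" part of the name).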

Key words

high-speed trains/hierarchical optimization/deep reinforcement learning/state space constraints/reward reshaping

Classification

Transportation engineering

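Cite this article

徐凯, 张皓桐, 张淼, 张洋, 吴仕勋. Deep reinforcement learning for operation strategies optimization in high-speed trains [J]. Journal of Railway Science and Engineering, 2025, 22(1): 25-37 (13 pages).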

Funding

Natural Science Foundation of Chongqing (CSTB2024NSCQ-MSX0275, cstc2021jcyj-msxmX0017)

Graduate Research and Innovation Program of Chongqing Jiaotong University (CYS23515)

Journal of Railway Science and Engineering

Open access; Peking University Core Journal

ISSN 1672-7029
