
Automatic train operation speed control based on ASP-SAC algorithm

Open access (OA). Indexed in: Peking University Core Journals (北大核心), CSTPCD, EI


With the green transformation of economic development and the rapid advance of artificial intelligence, urban rail transit has become an important mode of daily travel for residents. Beyond safety, efficiency, and punctuality, the energy efficiency and comfort of train operation are attracting growing attention. A well-designed operation strategy can achieve automatic train operation (ATO) speed control under multiple objectives, and reinforcement learning, as an intelligent decision-making method, can effectively address this control problem. First, drawing on a comprehensive analysis of technical, safety, and passenger-experience factors, the Soft Actor-Critic (SAC) algorithm was extended into the Action-State Experience Prioritized Soft Actor-Critic (ASP-SAC) method, which uses expert-experience action segmentation and state information entropy, to study ATO speed control. Second, the problem was formalized as a Markov decision process: a train operation environment was built, and the state space, action space, and an objective-based reward function were defined. Finally, the ASP-SAC method was validated on data from one section of the Beijing Subway Yizhuang Line and compared with other algorithms in the same environment. The results show that the method is feasible for ATO speed control under multi-objective requirements: algorithm efficiency improves by 22.73% relative to the unimproved algorithm and by 29.17% relative to PPO. With safety and comfort satisfied, the method outperforms the SAC, DQN, PPO, and PID algorithms in punctuality, precision, and energy efficiency, reducing energy consumption by 3.64%, 5.62%, 4.38%, and 7.35%, respectively. The method is also robust and offers certain advantages and reference value for ATO speed control.
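To make the Markov decision process formulation mentioned in the abstract more concrete, the following is a minimal sketch of what a train speed-control environment with a multi-objective reward could look like. It is not the authors' environment: the abstract does not give the state variables, action range, reward terms, or line data, so the class name `TrainEnv`, the point-mass dynamics on a level track, every constant, and every reward weight below are illustrative assumptions. The ASP-SAC experience-prioritization scheme itself is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class TrainEnv:
    """Toy point-mass train on a level track (hypothetical, not from the paper)."""
    length_m: float = 1000.0      # assumed inter-station distance, m
    target_time_s: float = 90.0   # assumed scheduled run time, s
    v_limit: float = 20.0         # assumed speed limit, m/s
    a_max: float = 1.0            # assumed max |acceleration|, m/s^2
    dt: float = 1.0               # control step, s
    # Assumed reward weights for each objective.
    w_energy: float = 0.01
    w_comfort: float = 0.1
    w_safety: float = 1.0
    w_time: float = 0.1
    w_stop: float = 0.01

    def reset(self):
        """Start at rest at the departure station; state = (position, speed, time)."""
        self.pos, self.vel, self.t, self.prev_acc = 0.0, 0.0, 0.0, 0.0
        return (self.pos, self.vel, self.t)

    def step(self, action):
        """Apply action in [-1, 1] (full braking .. full traction) for one step."""
        acc = max(-1.0, min(1.0, float(action))) * self.a_max
        new_vel = max(0.0, self.vel + acc * self.dt)
        dist = 0.5 * (self.vel + new_vel) * self.dt

        # Per-step penalties: traction work per unit mass, jerk, overspeed.
        energy = max(acc, 0.0) * dist
        jerk = abs(acc - self.prev_acc) / self.dt
        overspeed = max(new_vel - self.v_limit, 0.0)
        reward = (-self.w_energy * energy
                  - self.w_comfort * jerk
                  - self.w_safety * overspeed)

        self.pos += dist
        self.vel = new_vel
        self.t += self.dt
        self.prev_acc = acc

        # Episode ends when the train stops after moving, passes the
        # station, or exceeds twice the scheduled run time.
        stopped = self.vel == 0.0 and self.pos > 0.0
        done = (stopped or self.pos >= self.length_m
                or self.t >= 2 * self.target_time_s)
        if done:
            # Terminal penalties: punctuality error and stopping error.
            reward -= self.w_time * abs(self.t - self.target_time_s)
            reward -= self.w_stop * abs(self.pos - self.length_m)
        return (self.pos, self.vel, self.t), reward, done
```

An agent such as SAC would call `reset()` once per episode and then `step(action)` repeatedly, storing each transition in a replay buffer. Folding safety, comfort, energy, punctuality, and precision into one weighted scalar reward is only one possible design; the weighting actually used in the paper is not stated in the abstract.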

LIU Bohong (刘伯鸿); LU Tian (卢田)

School of Automation and Electrical Engineering, Lanzhou Jiaotong University, Lanzhou 730070, Gansu, China

Transportation

automatic train operation; multi-objective control; reinforcement learning; ASP-SAC algorithm; speed control

Journal of Railway Science and Engineering (铁道科学与工程学报), 2024, No. 7

pp. 2637-2648 (12 pages)

Supported by the National Natural Science Foundation of China (51967010)

10.19713/j.cnki.43-1423/u.T20231620
