
基于强化学习的连续泊位岸桥联合调度优化研究

Indexing: OA; Peking University Core Journals (北大核心); CHSSCD; CSSCI; CSTPCD

Study on Joint Scheduling Optimization of Continuous Berth and Quayside Bridge Based on Reinforcement Learning


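To speed up the solution of large-scale instances and shorten vessel turnaround at container terminals, this paper exploits the sequential nature of both berth allocation and quay crane scheduling and proposes a reinforcement learning scheduling algorithm built on a Markov decision process with states, actions, and a reward function. Building on the berth allocation and quay crane number assignment problem, it studies a dynamic scheduling method that decides berth allocation and quay crane scheduling simultaneously while accounting for quay crane movement and the assignment of specific crane numbers, and it establishes a mathematical model of continuous-berth and quay crane joint scheduling whose objective is to minimize vessels' time in port. Experimental results show that on large-scale data the reinforcement learning algorithm solves markedly faster than a genetic algorithm and CPLEX while producing solutions of relatively good quality, which demonstrates its effectiveness and superiority. Finally, to improve the algorithm, the paper analyzes how the learning rate, the action selection probability, and the discount factor affect the results.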
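One compact way to read the stated objective (minimum time in port) is sketched below. The symbols are assumptions introduced here for illustration and are not taken from the paper: $V$ is the set of vessels, $a_i$ the arrival time of vessel $i$, $s_i$ its berthing start time, and $p_i$ its handling time, which in a joint model would depend on the quay cranes assigned to it.

```latex
\min \sum_{i \in V} \left( s_i + p_i - a_i \right)
\qquad \text{subject to} \qquad s_i \ge a_i \quad \forall i \in V
```

Each term is the waiting time $s_i - a_i$ plus the handling time $p_i$; the paper's full model adds berth-position and crane constraints that the abstract does not spell out.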

The continuous Berth Allocation and Quay Crane Assignment Problem (BACAP) is a critical challenge in port operations, primarily due to the traditional separation of berth allocation and quay crane scheduling. Historically, these processes have been treated as independent entities, leading to operational inefficiency and suboptimal performance. When berth allocation decisions are made without taking into account the allocation and scheduling of quay cranes, ports may experience delays, increased turnaround times for vessels, and an overall decline in productivity. This issue becomes increasingly pronounced in contexts with high vessel traffic and complex operational demands, where the need for a cohesive strategy is paramount.

In this paper, we propose an innovative approach that builds upon the foundational framework of the continuous BACAP. Our methodology integrates both quay crane scheduling and berth allocation into a unified model. By recognizing the interdependencies between these two processes, we underscore the necessity of simultaneous decision-making, which serves to enhance port performance significantly. This integrated approach is designed to streamline operations, reduce delays, and ultimately improve the efficiency of port activities.

To address the challenges associated with large-scale instances of this problem, we focus on reframing berth-quay joint scheduling as a simultaneous decision-making process. This involves not only determining the optimal docking location for each vessel but also defining the sequence of services provided by quay cranes. This dual focus is instrumental in facilitating a more efficient operational framework, particularly in environments characterized by high levels of complexity and demand.

In our research, we transform the scheduling problem into a Markov Decision Process (MDP). This transformation allows us to develop a reinforcement learning (RL) scheduling algorithm that encapsulates essential components such as state representation, action selection, and a well-structured reward function. The RL algorithm is adept at making informed decisions regarding both berth allocation and quay crane scheduling, thereby enabling the derivation of relatively optimal solutions within a reasonable timeframe. This innovative application of reinforcement learning not only simplifies complex decision-making processes but also enhances the adaptability of the model across varying operational scenarios.

A pivotal aspect of our research is the establishment of a mathematical model specifically tailored for the continuous berth-quay-bridge joint scheduling problem. The model's primary objective is to minimize the total time vessels spend in port, a key performance indicator for evaluating port efficiency. Our experimental results indicate that the reinforcement learning algorithm significantly outperforms traditional methods, such as genetic algorithms and CPLEX, especially in scenarios involving extensive datasets. The algorithm demonstrates a considerable reduction in computational time while yielding solutions of comparable or superior quality. These findings substantiate the effectiveness and superiority of our approach in addressing the complexities inherent in port operations.

Furthermore, to enhance the performance of the reinforcement learning algorithm, we conduct a comprehensive analysis of various parameters, including the learning rate, action selection probability, and discount factor. By systematically investigating the influence of these factors on algorithm performance, we aim to fine-tune our approach, ensuring its robustness and adaptability in diverse operational contexts. This meticulous tuning process is critical for optimizing the efficiency of our RL algorithm and ensuring that it can handle the dynamic and often unpredictable nature of port operations.

Through our research, we contribute valuable insights into the integration of advanced computational techniques within maritime operations. By demonstrating the potential of a cohesive approach to berth allocation and quay crane scheduling, we pave the way for future studies that could refine and expand upon our findings. The implications of this research extend beyond mere operational efficiency; they also present opportunities for enhancing the sustainability of port operations by reducing turnaround times and minimizing the environmental impact of maritime activities.

In conclusion, our work serves as a critical step in addressing the complexities of the continuous BACAP. The integration of quay crane scheduling with berth allocation through a reinforcement learning framework not only improves operational efficiency but also provides a robust model that can adapt to various scenarios within the maritime industry. As the demand for port services continues to grow, we believe that our approach can significantly contribute to the development of smarter, more efficient, and more sustainable port operations in the future. This paper thus not only enhances our understanding of the BACAP but also offers a pathway for further research that could explore the full potential of integrated decision-making processes in maritime logistics.
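To make the MDP ingredients named in the abstract concrete (state, action, reward, learning rate, action selection probability, discount factor), here is a minimal sketch. It assumes tabular Q-learning with epsilon-greedy selection, which the abstract does not confirm as the authors' method. The instance data, the two fixed berth sections with their own cranes, and all function names are hypothetical simplifications; the paper's model uses a continuous quay and movable, individually numbered cranes.

```python
import itertools
import random
from collections import defaultdict

# Hypothetical toy instance: (arrival time, workload in crane-hours) per vessel.
VESSELS = [(0, 8), (1, 4), (2, 6), (3, 2)]
# Hypothetical quay: two berth sections, each with a fixed number of cranes.
CRANES = [2, 1]


def step(free, i, a):
    """Berth vessel i at section a; return (new free times, time in port)."""
    arrival, work = VESSELS[i]
    start = max(arrival, free[a])        # wait until the section is free
    finish = start + work / CRANES[a]    # more cranes, shorter handling
    new_free = free[:a] + (finish,) + free[a + 1:]
    return new_free, finish - arrival


def total_time(plan):
    """Total time in port for a plan (one section index per vessel)."""
    free, total = (0.0,) * len(CRANES), 0.0
    for i, a in enumerate(plan):
        free, t = step(free, i, a)
        total += t
    return total


def best_plan():
    """Brute-force baseline, feasible only for tiny instances."""
    plans = itertools.product(range(len(CRANES)), repeat=len(VESSELS))
    return min(plans, key=total_time)


def train(episodes=3000, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning; alpha, epsilon, gamma are the tuned parameters."""
    rng = random.Random(seed)
    actions = list(range(len(CRANES)))
    q = defaultdict(float)               # Q[(state, action)], default 0.0
    for _ in range(episodes):
        free = (0.0,) * len(CRANES)
        for i in range(len(VESSELS)):
            s = (i, free)                # state: next vessel + section free times
            if rng.random() < epsilon:   # explore with probability epsilon
                a = rng.choice(actions)
            else:                        # otherwise exploit current estimates
                a = max(actions, key=lambda x: q[(s, x)])
            free, t = step(free, i, a)
            s2 = (i + 1, free)
            last = i + 1 == len(VESSELS)
            future = 0.0 if last else max(q[(s2, x)] for x in actions)
            # Reward is the negative time in port, so a higher return
            # corresponds to a shorter (discounted) total time in port.
            q[(s, a)] += alpha * (-t + gamma * future - q[(s, a)])
    return q


def greedy_plan(q):
    """Read the learned schedule by always taking the highest-valued action."""
    free, plan = (0.0,) * len(CRANES), []
    for i in range(len(VESSELS)):
        s = (i, free)
        a = max(range(len(CRANES)), key=lambda x: q[(s, x)])
        plan.append(a)
        free, _ = step(free, i, a)
    return plan


if __name__ == "__main__":
    learned = greedy_plan(train())
    print("learned plan:", learned, "total time:", total_time(learned))
    print("brute-force plan:", list(best_plan()))
```

Rerunning `train` with different `alpha`, `epsilon`, and `gamma` values and comparing `total_time(greedy_plan(q))` against the brute-force baseline gives a small-scale analogue of the parameter analysis the abstract describes. Note that with `gamma < 1` the agent optimizes a discounted sum, so its plan need not match the undiscounted optimum.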

DENG Hanyi (邓涵毅); LIANG Chengji (梁承姬); SHI Jian; WANG Yu (王钰); GINO LIM

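Institute of Logistics Science and Engineering, Shanghai Maritime University, Shanghai 201306
Institute of Logistics Science and Engineering, Shanghai Maritime University, Shanghai 201306; Department of Industrial Engineering, University of Houston, Houston, Texas 77204
Department of Engineering Technology, University of Houston, Houston, Texas 77204
Institute of Logistics Science and Engineering, Shanghai Maritime University, Shanghai 201306
Department of Industrial Engineering, University of Houston, Houston, Texas 77204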

Transportation

container port; joint berth and quay crane scheduling (berths combined with quay bridges); Markov decision process; reinforcement learning

《运筹与管理》 (Operations Research and Management Science), 2024 (9)

pp. 15-21 (7 pages)

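National Key R&D Program of China (2019YFB1704403); National Natural Science Foundation of China (71972128); Shanghai "Science and Technology Innovation Action Plan" Soft Science Research Project (22692111200)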

10.12005/orms.2024.0279
