Optimization of Project Scheduling with Limited Construction Site in a Carbon Trading Scheme Based on Deep Reinforcement Learning

Liu Hao (刘浩) 1, Zhang Jingwen (张静文) 1, Chen Zhi (陈志) 1, Li Heng (李恒) 1

Operations Research and Management Science (运筹与管理), 2025, Vol. 34, Issue 12: 100-106, 7. DOI: 10.12005/orms.2025.0381

Author information

  • 1. School of Management, Northwestern Polytechnical University, Xi'an, Shaanxi 710072, China

Abstract

Heavy machinery used in construction projects generates significant carbon emissions, and carbon trading schemes aim to reduce these emissions through a market mechanism. This paper proposes a Project Scheduling Problem with a Limited Construction Site in a Carbon Trading Scheme (PSPLCS-CTS). The objective is to minimize the total project cost, including the carbon trading cost. We assume that construction machinery can operate at different speeds, leading to different carbon emissions and activity durations. Upon project completion, if actual carbon emissions exceed the allocated quota, additional quota must be purchased to cover the excess; conversely, any surplus quota can be sold.

Based on this analysis, we construct an integer programming model for PSPLCS-CTS and then transform it into a Markov Decision Process (MDP) model. We design the five key components of the MDP model according to the problem's characteristics: decision points, states, actions, state transition equations, and the reward function.

We develop a two-stage algorithm (Double DQN-LS) that combines a Double Deep Q-Network (Double DQN) with local search to solve the MDP model. In the first stage, the agent interacts with the environment to generate experiences, which are stored in a replay buffer and then randomly sampled for training. The state and action information is converted into a matrix and fed to the network, where convolutional layers automatically extract features and the Q-value of the state-action pair is estimated. In addition, to reduce overestimation of the target value, the evaluation network selects the action during learning and the target network estimates its Q-value, which improves the stability and performance of the algorithm. In the second stage, two local search algorithms are employed to enhance the quality of the schedule produced by the Double DQN.

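
The decoupled target computation described above (action chosen by the evaluation network, value read from the target network) can be sketched as follows. This is only an illustration under stated assumptions: the reward is treated as something to maximize (for example, a negative cost), and the function name `double_dqn_target`, the discount factor `gamma`, and the plain-list Q-values are hypothetical and do not come from the paper.

```python
from typing import Sequence


def double_dqn_target(
    reward: float,
    gamma: float,
    q_eval_next: Sequence[float],
    q_target_next: Sequence[float],
    done: bool,
) -> float:
    """Return the Double DQN training target for one transition.

    q_eval_next:   Q-values of the next state from the evaluation network.
    q_target_next: Q-values of the same next state from the target network.
    All names here are illustrative assumptions, not the paper's notation.
    """
    # A terminal transition has no future value to bootstrap from.
    if done or len(q_eval_next) == 0:
        return reward
    # Step 1: the evaluation network selects the greedy next action.
    best_action = max(range(len(q_eval_next)), key=lambda a: q_eval_next[a])
    # Step 2: the target network supplies the value of that action.
    return reward + gamma * q_target_next[best_action]
```

For instance, with `q_eval_next = [0.2, 0.8, 0.5]` and `q_target_next = [1.0, 2.0, 3.0]`, the evaluation network picks action 1, so the target bootstraps from 2.0 rather than from the largest target-network value 3.0. Separating selection from evaluation in this way is what tempers the overestimation of a standard DQN target.
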
Finally, extensive computational experiments are conducted to verify the effectiveness of the algorithm. For each set of instances, a sample is randomly selected for training at each level of the characteristic parameters. The trained Double DQN is then used to solve new instances, and the two local search algorithms refine the schedules it generates. The experimental results show that the proposed Double DQN-LS algorithm outperforms the Genetic Algorithm (GA) and the Estimation of Distribution Algorithm (EDA) on larger instances. Furthermore, Double DQN-LS shows a significant advantage in computational efficiency on all instances, with an average solving time of only about 6% of that of GA and 12% of that of EDA.
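
The quota settlement rule stated in the abstract (buy quota for excess emissions, sell any surplus) can be illustrated with a short sketch. It assumes a single carbon price that applies to both purchases and sales, which the abstract does not specify; the function names and parameters are hypothetical.

```python
def carbon_trading_cost(emissions: float, quota: float, price: float) -> float:
    """Settlement at project completion under a single assumed carbon price.

    A positive result is the cost of buying quota for excess emissions;
    a negative result is the revenue from selling surplus quota.
    """
    return price * (emissions - quota)


def total_project_cost(
    other_cost: float, emissions: float, quota: float, price: float
) -> float:
    """Total cost = all non-carbon costs plus the carbon trading settlement."""
    return other_cost + carbon_trading_cost(emissions, quota, price)
```

Under these assumptions, a project that emits 120 units against a quota of 100 at a price of 50 per unit pays 1000 for the excess, while one that emits 80 units earns 1000 from the surplus. Running machinery faster shortens activities but raises emissions, so the scheduler trades duration-related cost against this settlement term.
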

Key words

carbon trading/project scheduling/Markov decision processes/deep reinforcement learning

Classification

Management science

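Cite this article

Liu Hao, Zhang Jingwen, Chen Zhi, Li Heng. Optimization of Project Scheduling with Limited Construction Site in a Carbon Trading Scheme Based on Deep Reinforcement Learning [J]. Operations Research and Management Science (运筹与管理), 2025, 34(12): 100-106, 7.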

Funding

National Natural Science Foundation of China (71971173, 72201209)

Doctoral Dissertation Innovation Fund of Northwestern Polytechnical University (SOMBC202203, CX2023069)

Natural Science Basic Research Program of Shaanxi Province (2025JC-YBMS-800)

Operations Research and Management Science (运筹与管理)

Indexing: OA / PKU Core (北大核心) / CHSSCD / CSCD / CSSCI / CSTPCD

ISSN 1007-3221
