Operations Research and Management Science (运筹与管理), 2024, Vol. 33, Issue 9: 221-226 (6 pages). DOI: 10.12005/orms.2024.0309

基于深度强化学习的云订单动态接受与调度问题研究

Research on Cloud Order Dynamic Acceptance and Scheduling Based on Deep Reinforcement Learning

Ding Xianghai (丁祥海)¹, Zhang Mengchai (张梦钗)¹, Liu Chunlai (刘春来)¹, Han Jie (韩杰)¹

Author information

1. School of Management, Hangzhou Dianzi University, Hangzhou, Zhejiang 310018, China

Abstract

Cloud manufacturing is a new intelligent manufacturing model that uses networks and service platforms to provide on-demand manufacturing services tailored to customer needs. Its main characteristics can be summarized as customer-centricity, service uncertainty, and service on demand. Once a production enterprise joins a cloud manufacturing platform, it handles two types of orders: existing orders that are already established and cloud orders that arrive dynamically. In this environment, the order acceptance and scheduling (OAS) problem with a flexible flow shop as the processing environment is described as follows. After the platform dispatches a cloud order to an enterprise with surplus capacity, the enterprise must decide whether to accept the order and how to arrange its production while continuing to produce its existing orders. Order arrivals follow a Poisson distribution, and each order specifies quantity, price, delivery time, machining part number, and other information. When a cloud order is dynamically dispatched to the enterprise, the enterprise must weigh the order's production information, the current state of the workshop, and the expected arrival of future orders to determine the set of accepted orders and their production schedule, so as to maximize total profit.
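The order model described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `CloudOrder` fields mirror the attributes listed in the abstract, but the attribute ranges, the `unit_cost` parameter, and the linear tardiness penalty in `order_profit` are assumptions made only for illustration. Arrivals are sampled by accumulating exponential inter-arrival times, which is the standard way to simulate a Poisson process.

```python
import random
from dataclasses import dataclass


@dataclass
class CloudOrder:
    arrival_time: float
    quantity: int
    unit_price: float
    due_time: float
    part_id: int


def generate_cloud_orders(rate, horizon, seed=0):
    """Sample order arrivals from a Poisson process with the given rate.

    Inter-arrival times of a Poisson process are exponentially distributed,
    so arrival times are built by summing random.expovariate(rate) draws.
    All attribute ranges below are illustrative assumptions.
    """
    rng = random.Random(seed)
    orders = []
    t = rng.expovariate(rate)
    while t < horizon:
        orders.append(CloudOrder(
            arrival_time=t,
            quantity=rng.randint(10, 100),
            unit_price=rng.uniform(5.0, 20.0),
            due_time=t + rng.uniform(20.0, 60.0),
            part_id=rng.randint(1, 5),
        ))
        t += rng.expovariate(rate)
    return orders


def order_profit(order, completion_time, unit_cost, delay_penalty):
    """Assumed profit model: margin on the order minus a linear tardiness penalty."""
    tardiness = max(0.0, completion_time - order.due_time)
    return order.quantity * (order.unit_price - unit_cost) - delay_penalty * tardiness


# On average 0.1 orders per time unit over a horizon of 200 time units.
orders = generate_cloud_orders(rate=0.1, horizon=200.0)
```

Under this assumed profit model, an order is worth accepting only if its margin outweighs the delay penalty it is expected to incur given the current workshop load, which is the trade-off the acceptance decision has to learn.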
An improved DQN (deep Q-network) algorithm is proposed to solve the order acceptance and scheduling problem in a flexible flow shop under dynamic order dispatching by the cloud platform. Two agents are used: the order acceptance agent aims at maximum profit, while the scheduling agent aims at minimum delay time and minimum disturbance. Since the objective functions of the two agents are different, they are non-homogeneous agents, so each agent adopts an independent DQN algorithm, and a dynamic interaction mechanism is established between them. After a cloud order arrives, the acceptance agent chooses to accept or reject it and passes the information on accepted orders to the scheduling agent. After trying different scheduling rules, the scheduling agent finds the optimal scheduling strategy through the feedback provided by the reward function. The DQN network structure in the scheduling agent is improved: the number of rules for selecting workpieces and machines is increased to 50, and further improvement strategies are designed, including a process (operation) candidate set and a machine candidate set built on the critical path, and an earliest-start strategy for operations.
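The interaction between the two agents can be sketched as follows. This is a deliberately simplified illustration and not the paper's algorithm: tabular Q-learning stands in for each DQN, the shop is reduced to two hypothetical parallel machines, the scheduling action is a machine choice rather than one of the 50 rules, and the state encoding (`backlog_state`) and reward definitions are assumptions. What it does show is the hand-off described above: the acceptance agent decides first, accepted orders go to the scheduling agent, and the resulting tardiness feeds back into both rewards.

```python
import random


class TabularAgent:
    """Epsilon-greedy Q-learning agent (a tabular stand-in for one DQN)."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.n_actions = n_actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = {}                      # state -> list of action values
        self.rng = random.Random(seed)

    def values(self, state):
        return self.q.setdefault(state, [0.0] * self.n_actions)

    def act(self, state):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        vals = self.values(state)
        return max(range(self.n_actions), key=vals.__getitem__)

    def learn(self, state, action, reward, next_state):
        target = reward + self.gamma * max(self.values(next_state))
        vals = self.values(state)
        vals[action] += self.alpha * (target - vals[action])


def backlog_state(free_at, now, bucket=5.0, cap=3):
    """Discretise each machine's remaining workload into a small state tuple."""
    return tuple(min(int(max(0.0, f - now) // bucket), cap) for f in free_at)


def run_episode(accept_agent, sched_agent, orders, delay_penalty=2.0):
    """orders: (arrival, processing_time, due_time, revenue) tuples, sorted by arrival."""
    free_at = [0.0, 0.0]                 # when each of two machines becomes idle
    total_profit = 0.0
    for arrival, proc, due, revenue in orders:
        state = backlog_state(free_at, arrival)
        accept = accept_agent.act(state)             # 0 = reject, 1 = accept
        profit = 0.0
        next_state = state
        if accept == 1:
            machine = sched_agent.act(state)         # accepted order is handed over
            finish = max(arrival, free_at[machine]) + proc
            free_at[machine] = finish
            tardiness = max(0.0, finish - due)
            profit = revenue - delay_penalty * tardiness
            next_state = backlog_state(free_at, arrival)
            sched_agent.learn(state, machine, -tardiness, next_state)
        # The acceptance agent is rewarded with the profit the schedule achieved.
        accept_agent.learn(state, accept, profit, next_state)
        total_profit += profit
    return total_profit


# Usage: replay the same randomly generated order stream for 50 training episodes.
rng = random.Random(1)
orders, t = [], 0.0
for _ in range(200):
    t += rng.expovariate(0.2)            # Poisson arrivals
    proc = rng.uniform(3.0, 10.0)
    orders.append((t, proc, t + 2.0 * proc, rng.uniform(10.0, 30.0)))
accept_agent, sched_agent = TabularAgent(2, seed=1), TabularAgent(2, seed=2)
profits = [run_episode(accept_agent, sched_agent, orders) for _ in range(50)]
```

Keeping the two learners separate reflects the point made in the abstract that the agents have different objectives; in this sketch the only coupling is the shared state and the fact that the acceptance reward depends on what the scheduling agent did with the order.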
The improved DQN algorithm is compared with heuristic rules, the Q-learning algorithm, and the standard DQN algorithm. Numerical experiments show that the improved algorithm is stable and outperforms the other algorithms in maximum and average profit under different delay penalty factors, with a higher order acceptance rate and a more balanced machine load. When the number of cloud orders increases, the worst solution of the improved algorithm is also better than the solutions of the other algorithms, which demonstrates its effectiveness. The agents' scheduling strategy can optimize the scheduling of dynamically arriving cloud orders, improve workshop resource utilization while existing orders are produced normally, and raise the enterprise's profit and order acceptance rate. The study also finds that most heuristic rules are short-sighted but perform better when combined with the DQN algorithm, and that different rules suit different scheduling objectives and production environments. When deciding whether to accept cloud orders, the DQN algorithm, after continuous learning, selects appropriate scheduling rules and applies the improvement strategies. It can adjust the order acceptance and scheduling strategies in a short time, reduce workshop disturbance, and limit the impact of delay penalty costs on profit, so that the enterprise can obtain the maximum profit.

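Keywords (translated from the Chinese record)

order acceptance / dynamic decision-making / deep reinforcement learning / flexible flow shop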

Key words

order acceptance/dynamic scheduling/reinforcement learning/flexible flow shop

Classification

Management Science

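Cite this article

Ding Xianghai, Zhang Mengchai, Liu Chunlai, Han Jie. Research on Cloud Order Dynamic Acceptance and Scheduling Based on Deep Reinforcement Learning [J]. Operations Research and Management Science (运筹与管理), 2024, 33(9): 221-226.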

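Funding

National Natural Science Foundation of China (71901084)

Zhejiang Provincial Natural Science Foundation (LQ19G020010)

Fundamental Research Funds for the Provincial Universities of Zhejiang (GK199900299012-210)

Humanities and Social Sciences Research Fund of the Ministry of Education of China (19YJC630099)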

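Operations Research and Management Science (运筹与管理)

Indexed in: OA, 北大核心 (Peking University Core Journals), CHSSCD, CSSCI, CSTPCD

ISSN: 1007-3221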
