Transactions of the Chinese Society of Agricultural Engineering (农业工程学报), 2024, Vol. 40, Issue 14: 23-32, 10. DOI: 10.11975/j.issn.1002-6819.202401145
Optimization strategy for intra-provincial cooperative scheduling of harvesters based on deep reinforcement learning
Deep reinforcement learning-based optimization strategy for the cooperative scheduling of harvesters
Abstract
Agricultural machinery dispatching operations, as an innovative model of socialized agricultural machinery services, have been widely implemented in county- or district-level administrative areas across the province. Because crops within the same province mature at similar times, the demand for agricultural machinery is concentrated in peak operation periods, leading to a supply-demand imbalance in which some machinery owners have no work while others have jobs to be done but no machinery. In the absence of scientific and rational scheduling strategies, this not only lowers agricultural production efficiency but also makes production more difficult and complex. To address this challenge, many studies have adopted traditional heuristic algorithms to optimize the scheduling of harvester cooperative operations. However, issues such as low work efficiency and high operational costs remain. In response, this study constructs a harvester co-scheduling model aimed at minimizing the transfer costs between fields, and designs an inter-regional collaborative optimization scheduling algorithm based on deep reinforcement learning (DRL-ICOSA). The study first analyzes the Markov decision process of harvester collaborative scheduling operations and constructs a deep reinforcement learning environment. Using the attention mechanism, a policy network and a value network based on the encoder-decoder architecture were designed to enable the model to automatically learn the complexity of the environment, make effective use of raw data, and improve performance and generalization capabilities. Dynamic Gaussian noise was introduced into the random sampling strategy to prevent the policy network from falling into local optima during the initial training stage while enhancing the model's performance and robustness. The model was trained using a proximal policy optimization algorithm. Finally, the trained model was validated on a farmland test set, and the optimal path was selected using a greedy action selection strategy, yielding an optimized solution for cross-county harvester scheduling.

To verify the algorithm's effectiveness, four operation scenarios were considered, combining effective operation durations of 40 and 24 h with the agricultural machinery scheduling center located at either the center or the edge of the operation area. The DRL-ICOSA algorithm, genetic algorithm (GA), particle swarm optimization (PSO), and simulated annealing (SA) were used to compute scheduling strategies for comparative analysis. The experimental results indicate that when the scheduling center is located at the center or edge of the area and the effective operation duration is 40 h, the DRL-ICOSA algorithm reduces the average scheduling cost by no less than 13.9% compared to the GA, PSO, and SA algorithms; when the effective operation duration is 24 h, the average scheduling cost reduction is no less than 11.5%. When the operation duration is 40 or 24 h and the scheduling center is located at the center of the area, the DRL-ICOSA algorithm reduces the average scheduling cost by no less than 12.3% compared to the GA, PSO, and SA algorithms; when the scheduling center is located at the edge of the area, the reduction is no less than 11.5%. Therefore, regardless of the effective operation duration or the geographical location of the scheduling center, the DRL-ICOSA algorithm consistently achieves a lower scheduling cost than the other three algorithms.

In summary, this study provides a more scientific and reasonable scheduling solution for the complex problem of collaborative harvester scheduling operations. The DRL-ICOSA algorithm is effective in reducing scheduling costs and shows significant potential and application value in addressing the optimization problem of collaborative harvester scheduling. Compared with traditional heuristic algorithms, the proposed method is more suitable for complex environments and possesses stronger generalization capabilities. It avoids the manual feature design steps of traditional methods, thereby reducing dependence on prior knowledge. The proposed approach can effectively reduce resource waste and cost expenditure.
Keywords
agricultural machinery/optimization algorithm/path planning/deep reinforcement learning/cooperative scheduling
Classification
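Agricultural science and technology
Cite this article
LI Zikang, ZHANG Fan, TENG Guifa, LI Zheng, WANG Ziyi, MA Shiji. Deep reinforcement learning-based optimization strategy for the cooperative scheduling of harvesters [J]. Transactions of the Chinese Society of Agricultural Engineering, 2024, 40(14): 23-32, 10.
Funding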
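Key Research and Development Program of Hebei Province (21327407D)
Scientific Research Project of Higher Education Institutions in Hebei Province (QN2023062)
Natural Science Foundation of Hebei Province (C2023204069)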
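
The abstract describes adding dynamic Gaussian noise to the random sampling strategy during training and switching to greedy action selection at test time. The sketch below shows one way this could look; it is not the authors' implementation. The function names (`noise_std`, `masked_softmax`, `select_field`), the linear decay schedule, the choice to perturb logits rather than probabilities, and the feasibility mask are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
import random


def noise_std(step, total_steps, sigma_start=1.0, sigma_end=0.0):
    """Noise scale at a training step (assumed: linear decay over training)."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return sigma_start + (sigma_end - sigma_start) * frac


def masked_softmax(logits, mask):
    """Softmax over feasible entries; infeasible entries get probability 0."""
    valid = [l for l, m in zip(logits, mask) if m]
    if not valid:
        raise ValueError("no feasible action")
    top = max(valid)  # subtract the max for numerical stability
    exps = [math.exp(l - top) if m else 0.0 for l, m in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def select_field(logits, mask, step, total_steps, rng, greedy=False):
    """Choose the index of the next field to visit.

    Training: add Gaussian noise to the logits, then sample from the softmax.
    Evaluation (greedy=True): take the most probable feasible field.
    """
    if greedy:
        probs = masked_softmax(logits, mask)
        return max(range(len(probs)), key=probs.__getitem__)
    sigma = noise_std(step, total_steps)
    noisy = [l + rng.gauss(0.0, sigma) for l in logits]
    probs = masked_softmax(noisy, mask)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

In this sketch, `logits` stands in for the policy network's decoder scores over candidate fields, and `mask` marks fields that are still unvisited and feasible. Early in training the larger noise scale encourages exploration; as `step` approaches `total_steps` the sampling reduces to ordinary softmax sampling.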