计算机工程2026,Vol.52Issue(4):140-162,23.DOI:10.19678/j.issn.1000-3428.0070166
面向异构多背包问题的深度强化学习算法
Deep Reinforcement Learning Algorithms for Heterogeneous Multiple Knapsack Problems
摘要
Abstract
By focusing on the traditional multi-Knapsack Problem(KP)in typical logistics system operations,this study abstracts a Heterogeneous Multiple Knapsack Problem(HMKP)and formulates an improved Deep Deterministic Policy Gradient(DDPG)algorithm to solve it.The DDPG algorithm tends to fall into a local optimum when solving the 0-1 KP.To address this issue,a Dynamic Randomization Mechanism(DRM)and Dynamic Penalty Mechanism(DPM)are adopted and embedded with an improved Transformer module to optimize the algorithm.Then,a Dynamic DDPG(TDP-DDPG)algorithm is proposed based on the improved Transformer module.First,a tabu list is added to prevent repeated searches.The TDP-DDPG algorithm demonstrates efficient search capability across several experimental algorithms,finding the ideal optimum in 39 classical algorithms in test sets 1 and 2 from low to high dimensionality,as well as in higher dimensionality test set 3 and in three out of six algorithms in large-scale test set 4.Experiments show that the TDP-DDPG algorithm has stronger optimization seeking ability after incorporating the improved strategy.Next,the BPD-DDPG algorithm is designed based on the TDP-DDPG algorithm to solve the HMKP with higher complexity and is analyzed and evaluated in high-dimensional arithmetic by combining several classical 0-1 KP examples.The results show that the BPD-DDPG algorithm is more accurate than Gurobi in three low-scale cases;however,the solution time is longer.The BPD-DDPG algorithm can efficiently solve high-dimension,large-scale HMKP at a low computational cost within an acceptable time.关键词
深度强化学习/0-1背包问题/异构多背包问题/Transformer模块/动态惩罚机制/禁忌表Key words
Deep Reinforcement Learning(DRL)/0-1 Knapsack Problem(KP)/Heterogeneous Multiple Knapsack Problem(HMKP)/Transformer module/dynamic penalty mechanism/tabu list分类
信息技术与安全科学引用本文复制引用
李斌,郭毅..面向异构多背包问题的深度强化学习算法[J].计算机工程,2026,52(4):140-162,23.基金项目
教育部人文社会科学研究规划基金(19YJA630031). (19YJA630031)