| 注册
首页|期刊导航|计算机工程|面向异构多背包问题的深度强化学习算法

面向异构多背包问题的深度强化学习算法

李斌 郭毅

计算机工程2026,Vol.52Issue(4):140-162,23.
计算机工程2026,Vol.52Issue(4):140-162,23.DOI:10.19678/j.issn.1000-3428.0070166

面向异构多背包问题的深度强化学习算法

Deep Reinforcement Learning Algorithms for Heterogeneous Multiple Knapsack Problems

李斌 1郭毅2

作者信息

  • 1. 福建理工大学机械与汽车工程学院,福建 福州 350118||福建理工大学福建省大数据挖掘与应用技术重点实验室,福建 福州 350118
  • 2. 福建理工大学福建省大数据挖掘与应用技术重点实验室,福建 福州 350118||福建理工大学计算机科学与数学学院,福建 福州 350118
  • 折叠

摘要

Abstract

By focusing on the traditional multi-Knapsack Problem(KP)in typical logistics system operations,this study abstracts a Heterogeneous Multiple Knapsack Problem(HMKP)and formulates an improved Deep Deterministic Policy Gradient(DDPG)algorithm to solve it.The DDPG algorithm tends to fall into a local optimum when solving the 0-1 KP.To address this issue,a Dynamic Randomization Mechanism(DRM)and Dynamic Penalty Mechanism(DPM)are adopted and embedded with an improved Transformer module to optimize the algorithm.Then,a Dynamic DDPG(TDP-DDPG)algorithm is proposed based on the improved Transformer module.First,a tabu list is added to prevent repeated searches.The TDP-DDPG algorithm demonstrates efficient search capability across several experimental algorithms,finding the ideal optimum in 39 classical algorithms in test sets 1 and 2 from low to high dimensionality,as well as in higher dimensionality test set 3 and in three out of six algorithms in large-scale test set 4.Experiments show that the TDP-DDPG algorithm has stronger optimization seeking ability after incorporating the improved strategy.Next,the BPD-DDPG algorithm is designed based on the TDP-DDPG algorithm to solve the HMKP with higher complexity and is analyzed and evaluated in high-dimensional arithmetic by combining several classical 0-1 KP examples.The results show that the BPD-DDPG algorithm is more accurate than Gurobi in three low-scale cases;however,the solution time is longer.The BPD-DDPG algorithm can efficiently solve high-dimension,large-scale HMKP at a low computational cost within an acceptable time.

关键词

深度强化学习/0-1背包问题/异构多背包问题/Transformer模块/动态惩罚机制/禁忌表

Key words

Deep Reinforcement Learning(DRL)/0-1 Knapsack Problem(KP)/Heterogeneous Multiple Knapsack Problem(HMKP)/Transformer module/dynamic penalty mechanism/tabu list

分类

信息技术与安全科学

引用本文复制引用

李斌,郭毅..面向异构多背包问题的深度强化学习算法[J].计算机工程,2026,52(4):140-162,23.

基金项目

教育部人文社会科学研究规划基金(19YJA630031). (19YJA630031)

计算机工程

1000-3428

访问量0
|
下载量0
段落导航相关论文