Journal of Information Engineering University, 2026, Vol. 27, Issue (1): 35-41, 7. DOI: 10.3969/j.issn.1671-0673.2026.01.005
Inventory Control Method Based on Improved Twin Delayed Deep Deterministic Policy Gradient
Abstract
To address the difficulty and high cost of inventory control under uncertain demand and supply delays, an inventory control method based on an improved twin delayed deep deterministic policy gradient (TD3) algorithm is proposed. First, inventory control is formulated as a Markov decision process with the dual objectives of maximizing service level and minimizing cost, which serves as the training environment for the TD3 algorithm. Second, a prioritized experience replay mechanism is adopted to improve the sampling efficiency of the TD3 algorithm, and a long short-term memory (LSTM) network is integrated into the multi-layer perceptron of the TD3 algorithm to optimize the network structure. Finally, the TD3 algorithm interacts with the environment to optimize both cost and service level in inventory control. Experimental results show that, once the service level threshold is reached, the inventory control cost of the proposed method is 22.2% lower than that of the original TD3 algorithm.
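The abstract does not give the details of its prioritized experience replay mechanism. As a rough illustration only, the sketch below shows the commonly used proportional variant, in which a transition is sampled with probability proportional to its priority raised to a power `alpha` and the resulting bias is corrected with importance-sampling weights controlled by `beta`. The class name, the parameter defaults, and the list-based storage are assumptions made for this example, not the authors' implementation.

```python
import random


class PrioritizedReplayBuffer:
    """Proportional prioritized experience replay (illustrative sketch)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha      # 0 gives uniform sampling, 1 gives fully proportional
        self.eps = eps          # keeps every priority strictly positive
        self.data = []          # stored transitions
        self.priorities = []    # one priority per stored transition
        self.pos = 0            # next slot to overwrite once the buffer is full

    def add(self, transition):
        # New transitions receive the current maximum priority so that
        # they are likely to be replayed before their TD error is known.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4, rng=random):
        # Sampling probability P(i) = p_i^alpha / sum_k p_k^alpha.
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        idxs = rng.choices(range(len(self.data)), weights=probs, k=batch_size)
        # Importance-sampling weights (N * P(i))^(-beta),
        # normalized here by the largest weight in the batch.
        n = len(self.data)
        weights = [(n * probs[i]) ** (-beta) for i in idxs]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        batch = [self.data[i] for i in idxs]
        return batch, idxs, weights

    def update_priorities(self, idxs, td_errors):
        # After a critic update, priority becomes |TD error| + eps.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In a TD3-style training loop, the returned weights would typically scale the per-sample critic loss, and `update_priorities` would be called with the new TD errors after each critic update. A list-based buffer costs O(N) per sample; larger buffers usually rely on a sum-tree instead.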
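Key words
inventory control/twin delayed deep deterministic policy gradient (TD3)/prioritized experience replay/long short-term memory (LSTM) network
Classification
Information Technology and Security Science
Cite this article
龚永奇, 郭基联, 张亮, 唐希浪. Inventory Control Method Based on Improved Twin Delayed Deep Deterministic Policy Gradient [J]. Journal of Information Engineering University, 2026, 27(1): 35-41, 7.
Funding
National Natural Science Foundation of China (72201276)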