Journal of Information Engineering University, 2026, Vol. 27, Issue 1: 35-41 (7 pages). DOI: 10.3969/j.issn.1671-0673.2026.01.005

Inventory Control Method Based on Improved Twin Delayed Deep Deterministic Policy Gradient

GONG Yongqi (龚永奇)¹, GUO Jilian (郭基联)¹, ZHANG Liang (张亮)¹, TANG Xilang (唐希浪)¹

Author information

1. Air Force Engineering University, Xi'an, Shaanxi 710038, China

Abstract

To address the difficulty and high cost of inventory control under uncertain demand and supply delays, an inventory control method based on an improved twin delayed deep deterministic policy gradient (TD3) algorithm is proposed. First, inventory control is formulated as a Markov decision process with the dual objectives of maximizing service level and minimizing cost, which serves as the training environment for the TD3 algorithm. Second, a prioritized experience replay mechanism is adopted to improve the sampling efficiency of the TD3 algorithm, and a long short-term memory (LSTM) network is integrated into the multi-layer perceptron of the TD3 algorithm to optimize the network structure. Finally, the TD3 algorithm interacts with the environment to optimize both cost and service level in inventory control. Experimental results show that, once the service level threshold is reached, the inventory control cost of the proposed method is 22.2% lower than that of the original TD3 algorithm.
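The abstract does not say which variant of prioritized experience replay is used, so the following is only a minimal sketch of the common proportional variant, not the authors' implementation. The class name `PrioritizedReplayBuffer`, the parameters `alpha`, `beta`, and `eps`, and their default values are illustrative assumptions. Transitions with a larger temporal-difference error are sampled more often, and importance-sampling weights compensate for the resulting bias.

```python
import numpy as np


class PrioritizedReplayBuffer:
    """Proportional prioritized replay, list-backed for clarity (hypothetical helper)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6, seed=0):
        self.capacity = capacity
        self.alpha = alpha  # assumed value; 0 would give uniform sampling
        self.eps = eps      # keeps every priority strictly positive
        self.data = []      # stored transitions
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0        # next write index (ring buffer)
        self.rng = np.random.default_rng(seed)

    def add(self, transition):
        # New transitions receive the current maximum priority,
        # so each one has a good chance of being replayed early.
        max_p = self.priorities[:len(self.data)].max() if self.data else 1.0
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def probabilities(self):
        # P(i) = p_i^alpha / sum_k p_k^alpha
        scaled = self.priorities[:len(self.data)] ** self.alpha
        return scaled / scaled.sum()

    def sample(self, batch_size, beta=0.4):
        probs = self.probabilities()
        idx = self.rng.choice(len(self.data), size=batch_size, p=probs)
        # Importance-sampling weights w_i = (N * P(i))^(-beta),
        # normalized here by the largest weight in the batch.
        weights = (len(self.data) * probs[idx]) ** (-beta)
        weights /= weights.max()
        return idx, [self.data[i] for i in idx], weights

    def update_priorities(self, idx, td_errors):
        # Priority is the absolute TD error plus a small constant.
        self.priorities[idx] = np.abs(td_errors) + self.eps


# Usage: store four dummy transitions, then raise the priority of the last one.
buf = PrioritizedReplayBuffer(capacity=4)
for t in range(4):
    buf.add((t, 0.0, 0.0, t + 1))  # (state, action, reward, next_state) placeholders
buf.update_priorities(np.arange(4), np.array([0.1, 0.1, 0.1, 2.0]))
indices, batch, is_weights = buf.sample(batch_size=2)
```

In a TD3 training loop, the critic loss for each sampled transition would typically be multiplied by its importance-sampling weight, and `update_priorities` would be called with the fresh TD errors after each update.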

Key words

inventory control/TD3/prioritized experience replay/LSTM

Classification

Information Technology and Security Science

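Cite this article

GONG Yongqi, GUO Jilian, ZHANG Liang, TANG Xilang. Inventory Control Method Based on Improved Twin Delayed Deep Deterministic Policy Gradient[J]. Journal of Information Engineering University, 2026, 27(1): 35-41.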

Funding

National Natural Science Foundation of China (72201276)

Journal of Information Engineering University

ISSN: 1671-0673
