
Probabilistic Automata-Based Method for Enhancing Performance of Deep Reinforcement Learning Systems
OA; CSTPCD; EI


Abstract

Deep reinforcement learning (DRL) has demonstrated significant potential in industrial manufacturing domains such as workshop scheduling and energy system management. However, due to the model's inherent uncertainty, rigorous validation is requisite for its application in real-world tasks. Specific tests may reveal inadequacies in the performance of pre-trained DRL models, while the "black-box" nature of DRL poses a challenge for testing model behavior. We propose a novel performance improvement framework based on probabilistic automata, which aims to proactively identify and correct critical vulnerabilities of DRL systems, so that the performance of DRL models in real tasks can be improved with minimal model modifications. First, a probabilistic automaton is constructed from the historical trajectory of the DRL system by abstracting the state to generate probabilistic decision-making units (PDMUs), and a reverse breadth-first search (BFS) method is used to identify the key PDMU-action pairs that have the greatest impact on adverse outcomes. This process relies only on the state-action sequence and final result of each trajectory. Then, under the key PDMU, we search for the new action that has the greatest impact on favorable results. Finally, the key PDMU, undesirable action and new action are encapsulated as monitors to guide the DRL system to obtain more favorable results through real-time monitoring and correction mechanisms. Evaluations in two standard reinforcement learning environments and three actual job scheduling scenarios confirmed the effectiveness of the method, providing certain guarantees for the deployment of DRL models in real-world applications.
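The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the state abstraction, the impact measure, or the monitor interface, so everything named here is an assumption made for illustration. In particular, `abstract_state` (a fixed-size grid bucketing), the `"FAIL"`/`"OK"` terminal labels, and the scoring rule in `rank_pairs` (probability along the first backward path found, with ties broken by distance to the outcome) are hypothetical stand-ins.

```python
from collections import defaultdict, deque

def abstract_state(state, cell=1.0):
    """Assumed abstraction: bucket each coordinate into a grid cell (one PDMU per cell)."""
    return tuple(int(x // cell) for x in state)

def build_automaton(trajectories, cell=1.0):
    """Count (PDMU, action) -> next-node transitions from state-action sequences.
    Each trajectory is (steps, success); the last step leads to 'OK' or 'FAIL'."""
    counts = defaultdict(lambda: defaultdict(int))
    for steps, success in trajectories:
        units = [abstract_state(s, cell) for s, _ in steps]
        targets = units[1:] + ["OK" if success else "FAIL"]
        for (_, action), unit, nxt in zip(steps, units, targets):
            counts[(unit, action)][nxt] += 1
    return counts

def rank_pairs(counts, target):
    """Reverse BFS from `target` over the predecessor graph.
    Assumed score: transition probability times the score of the node it leads to,
    taken along the first backward path found; ties go to pairs nearer the target."""
    preds = defaultdict(list)
    for (unit, action), nxts in counts.items():
        total = sum(nxts.values())
        for nxt, n in nxts.items():
            preds[nxt].append((unit, action, n / total))
    reach = {target: (1.0, 0)}          # node -> (score, depth)
    scores = {}                         # (unit, action) -> (score, depth)
    queue = deque([target])
    while queue:
        node = queue.popleft()
        prob, depth = reach[node]
        for unit, action, p in preds.get(node, []):
            cand = (p * prob, depth + 1)
            best = scores.get((unit, action))
            if best is None or cand[0] > best[0]:
                scores[(unit, action)] = cand
            if unit not in reach:       # expand each PDMU once, so cycles terminate
                reach[unit] = cand
                queue.append(unit)
    return sorted(scores, key=lambda pair: (-scores[pair][0], scores[pair][1]))

def make_monitor(key_unit, bad_action, new_action, cell=1.0):
    """Wrap the correction as a runtime check: replace the undesirable action
    only when the current state falls in the key PDMU."""
    def monitor(state, proposed_action):
        if abstract_state(state, cell) == key_unit and proposed_action == bad_action:
            return new_action
        return proposed_action
    return monitor

# Toy 1-D trajectories (made-up data): moving "right" from cell 1 fails, "left" succeeds.
trajectories = [
    ([((0.2,), "right"), ((1.5,), "right")], False),
    ([((0.3,), "right"), ((1.4,), "left")], True),
    ([((0.1,), "right"), ((1.7,), "right")], False),
]
counts = build_automaton(trajectories)
key_unit, bad_action = rank_pairs(counts, "FAIL")[0]
new_action = next(a for u, a in rank_pairs(counts, "OK")
                  if u == key_unit and a != bad_action)
monitor = make_monitor(key_unit, bad_action, new_action)
```

At deployment time, under these assumptions, the policy's proposed action would be passed through `monitor(state, action)` before being sent to the environment, so the underlying DRL model itself stays unmodified.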

Min Yang; Guanjun Liu; Ziyuan Zhou; Jiacun Wang

Department of Computer Science, Tongji University, Shanghai 201804, China; Computer Science and Software Engineering Department, Monmouth University, West Long Branch, NJ 07764 USA

Deep reinforcement learning (DRL); performance improvement framework; probabilistic automata; real-time monitoring; the key probabilistic decision-making units (PDMU)-action pair

IEEE/CAA Journal of Automatica Sinica (《自动化学报(英文版)》), 2024, Issue 11

Pages 2327-2339 (13 pages)

This work was supported by the Shanghai Science and Technology Committee (22511105500), the National Natural Science Foundation of China (62172299, 62032019), the Space Optoelectronic Measurement and Perception Laboratory, Beijing Institute of Control Engineering (LabSOMP-2023-03), and the Central Universities of China (2023-4-YB-05).

10.1109/JAS.2024.124818
