Computer Engineering, 2025, Vol. 51, Issue 4: 66-74 (9 pages). DOI: 10.19678/j.issn.1000-3428.0069097
Causal Reinforcement Learning Algorithm Based on Causal Mask
Abstract
Reinforcement Learning (RL) has become an important approach to sequential, continuous decision-making problems such as root cause localization of fault alarms; however, existing methods suffer from low sample efficiency and high exploration costs, which hinder their wide application. Studies have shown that introducing causal knowledge has great potential to improve the decision interpretability and sample efficiency of RL agents. However, most existing methods model causal relationships only implicitly and fail to make direct use of knowledge of the causal structure. This study therefore proposes a two-stage causal RL algorithm: the first stage explicitly models the environmental variables with a causal model learned from observational data, and the second stage constructs causal masks from the learned causal structure to augment the policy, which narrows the decision space and reduces exploration risk. Because public benchmark environments that allow direct causal reasoning are lacking, this study designs a root cause localization task in a simulated fault alarm environment and demonstrates the effectiveness and robustness of the proposed algorithm through comparative experiments in environments of different dimensions. The experimental results show that, compared with the mainstream Soft Actor-Critic (SAC) algorithm, the proposed algorithm improves cumulative reward by 13% in a low-dimensional environment and by 79% in a high-dimensional environment, and its policy converges after only a few explorations. Sample efficiency increases by 27% and 52% in the low- and high-dimensional environments, respectively.
Key words
Reinforcement Learning (RL) / causal discovery / causal RL / causal mask / policy learning
Classification
Computer and Automation
Cite this article
黄思扬, 蔡瑞初, 乔杰, 郝志峰. Causal Reinforcement Learning Algorithm Based on Causal Mask [J]. Computer Engineering, 2025, 51(4): 66-74.
Funding
National Natural Science Foundation of China (61876043, 61976052, 62206064)
Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2021ZD0111501)
National Science Fund for Excellent Young Scholars (62122022)
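
Illustrative sketch
The abstract says that the second stage builds causal masks from the learned causal structure to narrow the decision space, but it does not give the construction. The following is a minimal sketch of one way such a mask could work for a discrete root cause localization policy; it is not the authors' implementation. The function names `root_cause_mask` and `masked_policy`, the adjacency-matrix convention, and the rule "an active alarm with an active direct cause is not a root-cause candidate" are all assumptions made for illustration.

```python
import numpy as np


def root_cause_mask(adjacency: np.ndarray, active_alarms: np.ndarray) -> np.ndarray:
    """Boolean mask over candidate root-cause actions (hypothetical rule).

    adjacency[i, j] == 1 is assumed to mean "alarm i causes alarm j".
    An alarm stays a candidate only if it is active and none of its
    direct causes (parents in the graph) is currently active.
    """
    active = active_alarms.astype(bool)
    # Entry [i, j] is True when i is an active parent of j.
    active_parent_edges = adjacency.astype(bool) & active[:, None]
    has_active_parent = active_parent_edges.any(axis=0)
    return active & ~has_active_parent


def masked_policy(logits: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Softmax over the actions allowed by the mask.

    If the mask excludes every action, fall back to the unmasked
    policy (an assumption, chosen so the result is always a valid
    probability distribution).
    """
    if mask.any():
        logits = np.where(mask, logits, -np.inf)
    shifted = logits - logits.max()  # numerical stability
    weights = np.exp(shifted)        # exp(-inf) == 0 for masked actions
    return weights / weights.sum()


# Assumed toy graph: alarm 0 -> alarm 1 -> alarm 2, alarm 3 is isolated.
adjacency = np.array([
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
])
active_alarms = np.array([1, 1, 1, 0])  # alarms 0, 1 and 2 are firing

mask = root_cause_mask(adjacency, active_alarms)
probs = masked_policy(np.zeros(4), mask)
```

In this toy case only alarm 0 survives the mask, so an agent using the masked policy would not spend exploration steps on alarms 1 and 2, which the graph explains as downstream effects. This is the sense in which a mask can shrink the decision space; how the paper combines its mask with SAC is not stated in the abstract.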