Causal Reinforcement Learning Algorithm Based on Causal Mask

Huang Siyang, Cai Ruichu, Qiao Jie, Hao Zhifeng

Computer Engineering (计算机工程), 2025, Vol. 51, Issue 4: 66-74, 9. DOI: 10.19678/j.issn.1000-3428.0069097

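Huang Siyang 1, Cai Ruichu 1, Qiao Jie 1, Hao Zhifeng 2

Author information

  • 1. School of Computer Science, Guangdong University of Technology, Guangzhou, Guangdong 510006
  • 2. College of Science, Shantou University, Shantou, Guangdong 515063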

Abstract

Reinforcement Learning (RL) has become an important approach to sequential, continuous decision-making problems such as root cause localization of fault alarms; however, existing methods suffer from low sample efficiency and high exploration costs, which hinder their wide application. Studies have shown that introducing causal knowledge offers great potential for improving the decision interpretability and sample efficiency of RL agents. However, most existing methods do not explicitly model causal relationships and therefore fail to directly utilize knowledge of the causal structure. This study therefore proposes a two-stage causal RL algorithm: the first stage explicitly models the environmental variables with a causal model learned from observational data, and the second stage constructs causal masks from the learned causal structure to augment the policy, which narrows the decision space and reduces exploration risk. Considering the lack of public benchmark environments that allow direct causal reasoning, this study designs a root cause localization task in a simulated fault alarm environment and demonstrates the effectiveness and robustness of the proposed algorithm through comparative experiments in environments of different dimensions. The experimental results show that, relative to the mainstream Soft Actor-Critic (SAC) algorithm, the proposed algorithm improves cumulative reward by 13% in a low-dimensional environment and by 79% in a high-dimensional environment, and its policy converges after only a few explorations. Sample efficiency increases by 27% and 52% in the low- and high-dimensional environments, respectively.
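The abstract does not spell out how a causal mask is built or applied, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a discrete action space in which each action selects one alarm node, a binary adjacency matrix for the learned causal graph, and a hypothetical masking rule: a node is a root-cause candidate only if it is alarmed and none of its causal parents is alarmed. The names `causal_mask` and `masked_softmax` are illustrative.

```python
import math


def causal_mask(adjacency, alarmed):
    """Build a boolean action mask from a causal graph (hypothetical rule).

    adjacency[i][j] == 1 means node i is a direct cause of node j.
    A node is kept as a candidate only if it is alarmed and has no
    alarmed parent; this rule is an assumption made for illustration.
    """
    n = len(adjacency)
    mask = []
    for j in range(n):
        has_alarmed_parent = any(adjacency[i][j] and alarmed[i] for i in range(n))
        mask.append(bool(alarmed[j]) and not has_alarmed_parent)
    return mask


def masked_softmax(logits, mask):
    """Turn policy logits into probabilities over the allowed actions only."""
    if not any(mask):
        # No candidate survives the mask: fall back to the unmasked policy.
        mask = [True] * len(logits)
    peak = max(l for l, m in zip(logits, mask) if m)  # for numerical stability
    exps = [math.exp(l - peak) if m else 0.0 for l, m in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


# Toy causal chain 0 -> 1 -> 2 in which every node is currently alarmed.
adjacency = [[0, 1, 0],
             [0, 0, 1],
             [0, 0, 0]]
alarmed = [1, 1, 1]

mask = causal_mask(adjacency, alarmed)
probs = masked_softmax([0.2, 1.5, -0.3], mask)
```

In this toy chain only node 0 has no alarmed parent, so the masked policy puts all of its probability on node 0 even though the raw logits favor node 1. This is the sense in which a mask narrows the decision space before any exploration takes place.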

Key words

Reinforcement Learning (RL)/causal discovery/causal RL/causal mask/policy learning

Classification

Computer Science and Automation

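Cite this article

Huang Siyang, Cai Ruichu, Qiao Jie, Hao Zhifeng. Causal Reinforcement Learning Algorithm Based on Causal Mask[J]. Computer Engineering, 2025, 51(4): 66-74, 9.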

Funding

National Natural Science Foundation of China (61876043, 61976052, 62206064)

Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2021ZD0111501)

National Science Fund for Excellent Young Scholars (62122022)

Computer Engineering (计算机工程)

Open access; Peking University Core Journal

ISSN 1000-3428
