
Toward Trustworthy Policy Optimization for Autonomous Driving: An Adversarial Robust Reinforcement Learning Approach

Xiangkun He¹, Yang Zhao², Jianwu Fang³, Hong Cheng², Chen Lv⁴

Acta Automatica Sinica, 2025, Vol. 51, Issue 11: 2473-2485 (13 pages). DOI: 10.16383/j.aas.c250193

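Author information

  • 1. Shenzhen Institute for Advanced Study, University of Electronic Science and Technology of China, Shenzhen 518110, China
  • 2. School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
  • 3. Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an 710049, China
  • 4. School of Mechanical and Aerospace Engineering, Nanyang Technological University, Singapore 639798, Singapore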

Abstract

While reinforcement learning has achieved remarkable success in recent years, policy robustness remains one of the critical bottlenecks for its deployment in safety-critical autonomous driving domains. A fundamental challenge lies in the unpredictable environmental changes and unavoidable perception noise that many real-world autonomous driving tasks face. These uncertainties can lead the system to make suboptimal decisions and control actions, potentially resulting in catastrophic consequences. To address these multi-source uncertainties, we propose an adversarial robust reinforcement learning approach to achieve trustworthy end-to-end control policy optimization. First, we construct an online-learnable adversary model that simultaneously approximates the worst-case perturbations in both environmental dynamics and state observations. Second, the interaction between the autonomous driving agent and environmental dynamic perturbations is formulated as a zero-sum game to model their adversarial nature. Third, to address the simulated multi-source uncertainties, we propose a robust constrained actor-critic algorithm that maximizes the policy's cumulative reward in the continuous action space while effectively constraining the impact of perturbations in environmental dynamics and state observations on the learned end-to-end control policy. Finally, the proposed approach is evaluated across diverse scenarios, traffic flows, and perturbation conditions, and is compared with three representative methods. The results validate its effectiveness and robustness under complex working conditions and adversarial environments.
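
For orientation only, the zero-sum and constrained structure described in the abstract can be sketched in a generic form. This is not the paper's formulation, which is not reproduced on this page. Every symbol below is an assumption introduced for illustration: $\pi$ is the driving policy, $\mu$ the learned adversary, $\delta_{\mu}(s)$ an observation perturbation bounded by $\eta$, $D$ some divergence between action distributions, and $\epsilon$ a tolerance.

```latex
% Generic robust-RL sketch (illustrative assumption, not the authors' equations).
% The agent maximizes the discounted return; the adversary minimizes it.
% The constraint limits how much a bounded observation perturbation
% may change the policy's action distribution.
\begin{aligned}
&\max_{\pi}\ \min_{\mu}\quad
  J(\pi,\mu) \;=\; \mathbb{E}_{\pi,\mu}\!\left[\sum_{t=0}^{\infty}\gamma^{t}\, r(s_t,a_t)\right] \\
&\text{s.t.}\quad
  \mathbb{E}_{s}\!\left[ D\big(\pi(\cdot\mid s)\,\big\|\,\pi(\cdot\mid s+\delta_{\mu}(s))\big)\right]\le \epsilon,
  \qquad \|\delta_{\mu}(s)\|\le \eta
\end{aligned}
```

In this sketch the constraint covers only observation perturbations. The abstract states that the proposed algorithm also constrains the effect of perturbations in environmental dynamics, which would need an analogous term.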

Key words

Autonomous driving/intelligent transportation/reinforcement learning/trustworthy artificial intelligence

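Cite this article

Xiangkun He, Yang Zhao, Jianwu Fang, Hong Cheng, Chen Lv. Toward Trustworthy Policy Optimization for Autonomous Driving: An Adversarial Robust Reinforcement Learning Approach [J]. Acta Automatica Sinica, 2025, 51(11): 2473-2485, 13.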

Funding

Supported by National Natural Science Foundation of China (W2411052, 62273057) and Talent Support Program of University of Electronic Science and Technology of China (A25007)

Acta Automatica Sinica

Open access (OA); Peking University Core Journal

ISSN 0254-4156
