Acta Automatica Sinica, 2025, Vol. 51, Issue 11: 2473-2485 (13 pages). DOI: 10.16383/j.aas.c250193
Toward Trustworthy Policy Optimization for Autonomous Driving: An Adversarial Robust Reinforcement Learning Approach
Abstract
While reinforcement learning has achieved remarkable success in recent years, policy robustness remains one of the critical bottlenecks for its deployment in safety-critical autonomous driving domains. A fundamental challenge lies in the unpredictable environmental changes and unavoidable perception noise that many real-world autonomous driving tasks face. These uncertainties can lead the system to make suboptimal decisions and control actions, potentially resulting in catastrophic consequences. To address these multi-source uncertainty issues, we propose an adversarial robust reinforcement learning approach to achieve trustworthy end-to-end control policy optimization. First, we construct an online-learnable adversary model that simultaneously approximates the worst-case perturbations in both environmental dynamics and state observations. Second, the interaction between the autonomous driving agent and environmental dynamic perturbations is formulated as a zero-sum game to model their adversarial nature. Third, to address the simulated multi-source uncertainties, we propose a robust constrained actor-critic algorithm that maximizes the policy's cumulative reward in the continuous action space while effectively constraining the impact of perturbations in environmental dynamics and state observations on the learned end-to-end control policy. Finally, the proposed approach is evaluated across diverse scenarios, traffic flows, and perturbation conditions, and is compared with three representative methods. The results validate its effectiveness and robustness under complex working conditions and adversarial environments.
Keywords
Autonomous driving / intelligent transportation / reinforcement learning / trustworthy artificial intelligence
Cite this article
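何祥坤, 赵洋, 房建武, 程洪, 吕辰. Toward Trustworthy Policy Optimization for Autonomous Driving: An Adversarial Robust Reinforcement Learning Approach [J]. Acta Automatica Sinica, 2025, 51(11): 2473-2485.
Funding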
Supported by the National Natural Science Foundation of China (W2411052, 62273057) and the Talent Support Program of the University of Electronic Science and Technology of China (A25007).