Journal of National University of Defense Technology, 2025, Vol. 47, Issue 4: 111-122 (12 pages). DOI: 10.11887/j.issn.1001-2486.24120032
面向长序列自主作业的非对称Actor-Critic强化学习方法
Asymmetric Actor-Critic reinforcement learning for long-sequence autonomous manipulation
Abstract
Long-sequence autonomous manipulation capability has become one of the bottlenecks hindering the practical application of intelligent robots. To address the diverse long-sequence manipulation skills that robots require in complex scenarios, an efficient and robust asymmetric Actor-Critic reinforcement learning method is proposed. The approach targets two challenges of long-sequence tasks: high learning difficulty and complex reward function design. Multiple Critic networks collaboratively train a single Actor network, and GAIL (generative adversarial imitation learning) is introduced to generate intrinsic rewards for the Critic network, which reduces the learning difficulty of long-sequence tasks. On this basis, a two-stage learning method is designed in which imitation learning provides a high-quality pre-trained behavior policy for reinforcement learning, improving both learning efficiency and the generalization performance of the policy. Simulation results for long-sequence autonomous task execution in a chemical laboratory demonstrate that the proposed method significantly improves the learning efficiency of robot long-sequence skills and the robustness of behavior policies.
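The idea of several Critics jointly training one Actor, with a GAIL-derived intrinsic reward feeding one of them, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the one-step TD advantage, the reward form `-log(1 - D(s, a))` (with `D` read as the discriminator's probability that a sample is expert data), and the weighted-sum combination are all assumptions chosen for illustration.

```python
import math


def gail_reward(d_prob, eps=1e-8):
    """Assumed GAIL-style intrinsic reward: -log(1 - D(s, a)).

    d_prob is the discriminator's probability that (s, a) came from the expert.
    """
    return -math.log(max(1.0 - d_prob, eps))


def td_advantages(rewards, values, gamma=0.99):
    """One-step TD advantages for one critic.

    `values` must hold len(rewards) + 1 entries; the last is the bootstrap value.
    """
    return [r + gamma * values[t + 1] - values[t] for t, r in enumerate(rewards)]


def combine_advantages(per_critic_advs, weights):
    """Weighted sum of per-critic advantages (hypothetical combination rule).

    The result is the single learning signal used to update the one actor.
    """
    steps = len(per_critic_advs[0])
    return [
        sum(w * adv[t] for w, adv in zip(weights, per_critic_advs))
        for t in range(steps)
    ]


# Toy rollout: one critic sees the sparse task reward, the other the GAIL reward.
task_rewards = [0.0, 0.0, 1.0]
intrinsic_rewards = [gail_reward(p) for p in (0.2, 0.5, 0.8)]
task_adv = td_advantages(task_rewards, [0.0, 0.0, 0.0, 0.0])
gail_adv = td_advantages(intrinsic_rewards, [0.0, 0.0, 0.0, 0.0])
actor_signal = combine_advantages([task_adv, gail_adv], weights=[1.0, 0.5])
```

In this toy rollout the task critic yields a nonzero advantage only at the final step, while the GAIL critic yields a dense signal at every step, which is one plausible reading of how intrinsic rewards could ease learning on long sequences.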
Keywords
autonomous manipulation robot / reinforcement learning / Actor-Critic / long-sequence operation
Classification
Information Technology and Security Science
Cite this article
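Ren Junkai, Qu Yuke, Luo Jiawei, Ni Ziqi, Lu Huimin, Ye Yicong. Asymmetric Actor-Critic reinforcement learning for long-sequence autonomous manipulation [J]. Journal of National University of Defense Technology, 2025, 47(4): 111-122 (12 pages).
Funding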
National Natural Science Foundation of China (62373201)
Independent Innovation Science Fund of National University of Defense Technology (ZK2023-30, 24-ZZCX-GZZ-11)