
Terrain-Adaptive Motion Imitation Based on Multi-task Reinforcement Learning

YU Hao (余昊), LIANG Yuchen (梁宇宸), ZHANG Chi (张驰), LIU Yuehu (刘跃虎)

Journal of Data Acquisition and Processing, 2024, Vol. 39, Issue 5: 1182-1191 (10 pages). DOI: 10.16337/j.1004-9037.2024.05.010

Terrain-Adaptive Motion Imitation Based on Multi-task Reinforcement Learning

YU Hao (1), LIANG Yuchen (2), ZHANG Chi (2), LIU Yuehu (2)

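Author information

  • 1. School of Software Engineering, Xi'an Jiaotong University, Xi'an 710049
  • 2. College of Artificial Intelligence, Xi'an Jiaotong University, Xi'an 710049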

Abstract

Terrain adaptability is the basis for stable locomotion of agents in complex terrain. Because the dynamical systems of agents such as humanoid robots are complex, traditional inverse-dynamics methods usually struggle to provide this ability. Recent work has exploited the strength of reinforcement learning in sequential decision-making to train agents to adapt to terrain. However, these single-task learning methods cannot effectively learn the correlations among different terrains. In fact, complex terrain adaptation can be treated as a multi-task problem in which the relationships between sub-tasks are measured by different terrain factors, and the incomplete acquisition of data-distribution information can then be addressed through mutual learning among the sub-task models. This paper therefore proposes a multi-task reinforcement learning method. It comprises an execution layer consisting of pre-trained sub-task models and a decision layer based on reinforcement learning; the decision layer uses soft constraints to fuse the models of the execution layer. Experiments on the LeggedGym terrain simulator show that an agent trained with the proposed method moves more stably and falls less often on complex terrain, demonstrating better generalization.
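
The abstract does not specify how the decision layer fuses the execution-layer models. As a rough illustration only, the sketch below assumes that "soft constraints" can be read as softmax mixing weights over the actions proposed by the sub-task models. The function names, the two-terrain setup, and the gate logits are hypothetical and are not taken from the paper.

```python
import math

def softmax(logits):
    """Turn arbitrary scores into positive weights that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_actions(sub_actions, gate_logits):
    """Blend actions from K sub-task models with soft weights.

    sub_actions: list of K action vectors (one per pre-trained sub-task model).
    gate_logits: list of K scores; in a full system these would be produced
                 by the decision layer (assumption, not the paper's design).
    """
    weights = softmax(gate_logits)
    dim = len(sub_actions[0])
    fused = [
        sum(w * action[i] for w, action in zip(weights, sub_actions))
        for i in range(dim)
    ]
    return fused, weights

# Hypothetical outputs of two sub-task models (e.g. flat ground vs. slope).
flat_action = [1.0, 0.0]
slope_action = [0.0, 1.0]

# Equal scores give an even blend; a large score approaches a hard choice.
fused, weights = fuse_actions([flat_action, slope_action], [0.0, 0.0])
```

With a soft weighting of this kind the fused action changes smoothly as the scores change, which is one plausible reason to prefer it over switching hard between sub-task models.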

Key words

multi-task learning / imitation learning / reinforcement learning / terrain influencing factor / LeggedGym terrain simulator

Classification

Information Technology and Security Science

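Cite this article

YU Hao, LIANG Yuchen, ZHANG Chi, LIU Yuehu. Terrain-Adaptive Motion Imitation Based on Multi-task Reinforcement Learning[J]. Journal of Data Acquisition and Processing, 2024, 39(5): 1182-1191.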

Funding

Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2018AAA0102504).

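Journal of Data Acquisition and Processing

Open access; indexed in the Peking University Core Journals list and CSTPCD

ISSN 1004-9037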
