
Reinforcement Learning Mission Supervisor Design for Behavior-based Differential Drive Robots
(基于行为的多差速机器人强化学习任务监管器设计)

Open access (OA); indexed in the Peking University Core Journals list (北大核心) and CSTPCD


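A multi-agent reinforcement learning mission supervisor based on trial-and-error learning is proposed for multiple differential drive robot systems. The method addresses the problem that behavior-based multi-agent systems always rely on human intelligence to design the switching rules that decide behavior priorities. First, within the null-space-based behavioral control framework, a differential drive model is introduced in place of the particle model, and a null-space-based behavioral control paradigm with nonholonomic constraints is derived for the first time, which improves the system's robustness to minimum extremum states. Then, the behavior priority switching problem is modeled, for the first time, as a cooperative Markov game, and an optimal joint policy is learned to decide behavior priorities dynamically and intelligently; this not only avoids manually designed switching rules but also reduces the online computation and storage burden. Simulation results show that the proposed multi-agent reinforcement learning mission supervisor achieves superior behavior priority switching performance. Its successful application to a multi-robot system of AgileX Limo differential drive robots verifies the practicality of the mission supervisor.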

A multi-agent reinforcement learning mission supervisor (MARLMS) is designed for differential drive robots using trial-and-error learning. The proposed MARLMS addresses the challenge inherent in behavior-based multi-agent systems, wherein the design of switching rules to determine behavior priorities relies heavily on human intelligence. Building upon the null-space-based behavioral control (NSBC) framework, a differential model is introduced to replace the particle model. Consequently, a paradigm of NSBC with nonholonomic constraints is presented for the first time, enhancing the system robustness to the minimum extremum state. Subsequently, a joint policy is developed to dynamically and intelligently determine behavior priorities by modeling the behavior priority switching problem as a cooperative Markov game. The proposed MARLMS not only eliminates the need for manual design of switching rules but also reduces the computational and storage burdens during online operations. Simulation results demonstrate the superior behavior priority switching performance of the proposed MARLMS. Furthermore, successful implementation on AgileX Limo robots validates the practicality of the proposed MARLMS.
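For readers unfamiliar with null-space-based behavioral control, the following is a minimal sketch of the general idea the abstract refers to, not the paper's derivation. It assumes a single robot treated as a 2-D point, a fixed priority order (obstacle avoidance above go-to-goal), and a generic heading-tracking map from the planar velocity to unicycle commands. The function names `nsbc_velocity` and `to_unicycle` and the gains are hypothetical, chosen only for illustration.

```python
import math


def nsbc_velocity(p, goal, obstacle, safe_dist=1.0, gain=1.0):
    """Combine two behaviors for a 2-D point: avoidance (high priority)
    and go-to-goal (low priority, projected into the avoidance null space)."""
    # Low-priority task: proportional velocity toward the goal.
    v_goal = (gain * (goal[0] - p[0]), gain * (goal[1] - p[1]))

    # High-priority task: hold the distance to the obstacle at safe_dist.
    dx, dy = p[0] - obstacle[0], p[1] - obstacle[1]
    dist = math.hypot(dx, dy)
    if dist >= safe_dist or dist == 0.0:
        # Avoidance inactive (or direction undefined): only the goal task acts.
        return v_goal

    # The task Jacobian is the unit row vector r; its pseudo-inverse is r^T.
    r = (dx / dist, dy / dist)
    scale = gain * (safe_dist - dist)
    v_avoid = (scale * r[0], scale * r[1])

    # Null-space projector N = I - r r^T removes the part of v_goal along r,
    # so the low-priority task cannot disturb the high-priority one.
    along = v_goal[0] * r[0] + v_goal[1] * r[1]
    v_proj = (v_goal[0] - along * r[0], v_goal[1] - along * r[1])
    return (v_avoid[0] + v_proj[0], v_avoid[1] + v_proj[1])


def to_unicycle(v_xy, heading, k_turn=2.0):
    """Assumed mapping from a planar velocity to (linear, angular) commands:
    drive along the heading and turn toward the desired direction."""
    speed = math.hypot(v_xy[0], v_xy[1])
    if speed == 0.0:
        return 0.0, 0.0
    err = math.atan2(v_xy[1], v_xy[0]) - heading
    err = math.atan2(math.sin(err), math.cos(err))  # wrap to (-pi, pi]
    return speed * math.cos(err), k_turn * err


if __name__ == "__main__":
    v = nsbc_velocity(p=(0.0, 0.0), goal=(2.0, 2.0), obstacle=(0.5, 0.0))
    print(v, to_unicycle(v, heading=0.0))
```

In this sketch the priority order is hard-coded. The supervisor described in the abstract would instead select the order at run time from a learned joint policy.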

ZHANG Zhenyi (张祯毅); HUANG Jie (黄捷)

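College of Electrical Engineering and Automation, Fuzhou University, Fuzhou 350108, Fujian; 5G+ Industrial Internet Institute, Fuzhou University, Fuzhou 350108, Fujian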

Keywords: differential drive robot; behavioral control; reinforcement learning; mission supervisor; intelligent decision

Robot (《机器人》), 2024, No. 4

Pages 397-413, 424 (18 pages)

Supported by the National Natural Science Foundation of China (92367109).

DOI: 10.13973/j.cnki.robot.230148
