深圳大学学报(理工版)2024,Vol.41Issue(3):274-282,9.DOI:10.3724/SP.J.1249.2024.03274
基于博弈论与强化学习的多智能体路径规划算法
A multi-agent path planning algorithm based on game theory and reinforcement learning
摘要
Abstract
This study addresses the issue of slow speed and low efficiency in path planning algorithms for multi-agents on a plane.We formulate the multi-agent path planning problem as a non-zero-sum stochastic game and employ the multi-agent reinforcement learning algorithm,win or learn fast-policy hill-climbing(WoLF-PHC),to obtain a Nash equilibrium strategy to make conflict free optimal path decisions for each agent.Additionally,we propose a fast adaptive WoLF-PHC(FA-WoLF-PHC)algorithm to improve the WoLF-PHC algorithm by constructing an objective function and using gradient descent method to adaptively update the learning rate.FA-WoLF-PHC is applied in two game scenarios:coin flipping and custom payoff matrix,and compared with policy hill-climbing(PHC)and WoLF-PHC algorithms.The experimental data show that the FA-WoLF-PHC algorithm improves the learning speed compared to the WoLF-PHC and effectively reduces oscillations in the learning process of WoLF-PHC and PHC algorithms.In the multi-agent path planning problems,the FA-WoLF-PHC algorithm improves the learning speed by 16.01%compared to WoLF-PHC algorithm.When the environmental grid map is expanded to 6×6 and the number of agents is increased to 3,the average times required for the FA-WoLF-PHC,WoLF-PSP and PP-AP Sarsa(Physarum Polycephalum-Artificial Potential Sarsa)algorithms are 16.30,20.59,and 17.72 seconds,respectively,to learn the final strategies over 10 experiments.The results show that the FA-WoLF-PHC algorithm can obtain Nash equilibrium strategies for each agent,and it effectively improves learning speed compared to the WoLF-PSP and PP-AP Sarsa algorithms.The FA-WoLF-PHC algorithm can quickly learn Nash strategies in common game scenarios and generate conflict-free optimal paths for each agent in multi-agent path planning problems,showing significant improvement in learning speed compared with other algorithms.关键词
人工智能/博弈论/动态规划/纳什均衡策略/强化学习/多智能体路径规划Key words
artificial intelligence/game theory/dynamic programming/Nash equilibrium strategy/reinforcement learning/multi-agent path planning分类
信息技术与安全科学引用本文复制引用
熊文博,郭磊,焦彤宇..基于博弈论与强化学习的多智能体路径规划算法[J].深圳大学学报(理工版),2024,41(3):274-282,9.基金项目
National Natural Science Foundation of China(61105103) 国家自然科学基金资助项目(61105103) (61105103)