Mobile Robot Path Planning Algorithm with Improved Deep Double Q Networks
Indexing: Open Access; Peking University Core Journals (北大核心); CSTPCD
Conventional mobile robot path planning methods based on the deep double Q-network (DDQN) suffer from incomplete search and slow convergence in complex, unknown environments. To address these problems, we propose an improved DDQN (I-DDQN) learning algorithm. First, I-DDQN uses a dueling (competitive) network architecture to estimate the value function of the DDQN algorithm. Second, we propose a robot path exploration strategy based on a two-layer controller structure, in which the value function of the upper controller is used to explore locally optimal actions of the mobile robot and the value function of the lower controller is used to learn the global task policy. During learning, a prioritized experience replay mechanism is used for data collection and sampling, and the networks are trained on mini-batches. Finally, we compare I-DDQN with the conventional DDQN algorithm and its improved variants in two simulation environments, OpenAI Gym and Gazebo. The experimental results show that I-DDQN outperforms the conventional DDQN algorithm and its improved variants on multiple evaluation metrics in both simulation environments, and that it effectively overcomes incomplete path search and slow convergence in environments of the same complexity.
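The three building blocks named in the abstract (dueling value estimation, the double Q-learning target, and prioritized sampling) can be illustrated with a minimal sketch. It uses the standard textbook formulations, not the paper's own implementation: the function names, the values `alpha=0.6` and `eps=1e-6`, and the use of plain Python lists in place of neural networks are all assumptions made for illustration. The two-layer controller structure is not sketched because the abstract does not specify it.

```python
def dueling_q(value, advantages):
    """Combine a state value V(s) and per-action advantages A(s, a) into
    Q-values using the common mean-subtracted dueling aggregation."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def ddqn_target(reward, done, gamma, q_online_next, q_target_next):
    """Double DQN target: the online network selects the next action,
    the target network evaluates it."""
    if done:
        return reward
    best = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best]


def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Proportional prioritized replay: sampling probability grows with
    the magnitude of each transition's TD error (alpha, eps assumed)."""
    priorities = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]
```

In a full agent, the lists passed to `ddqn_target` would be the outputs of the online and target networks for the next state, and the probabilities from `per_probabilities` would drive mini-batch sampling from the replay buffer.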
Zhang Lei (张磊); Mu Yashuang (母亚双); Pan Quan (潘泉)
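College of Information Science and Engineering, Henan University of Technology, Zhengzhou, Henan 450001; School of Artificial Intelligence and Big Data, Henan University of Technology, Zhengzhou, Henan 450001; School of Automation, Northwestern Polytechnical University, Xi'an, Shaanxi 710068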
Subject: Computer Science and Automation
Keywords: deep learning; reinforcement learning; hierarchical deep reinforcement learning; dueling (competitive) network architecture; robot path planning; prioritized experience replay
Information and Control (《信息与控制》), 2024, No. 3
pp. 365-376 (12 pages)
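Funding: National Natural Science Foundation of China, Youth Fund (62006071); Henan Province Key R&D and Promotion Special Project (Science and Technology Research) (242102210016)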