Target Navigation Method Based on Multimodal Scene Memory and Instruction Prompting

基于多模态场景记忆与指令提示的目标导航方法

DONG Min (董敏)¹, LAI Youcheng (赖酉城)¹, BI Sheng (毕盛)¹

Journal of South China University of Technology (Natural Science Edition), 2026, Vol. 54, Issue 2: 1-15, 15. DOI: 10.12141/j.issn.1000-565X.250152

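Author information

1. School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, Guangdong, China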

Abstract

Target navigation requires a robot to autonomously plan a path and accurately reach a specified target location in its working environment, based on natural language instructions or object categories. Existing approaches to this task fall primarily into two categories: end-to-end learning methods and planning-based methods. While end-to-end methods can directly learn a mapping from perception to action, they often exhibit limited generalization capability and poor interpretability. Conversely, planning-based methods offer better generalization and interpretability to some extent; however, they are often not optimized for known environments, fail to exploit prompt information embedded in natural language instructions, struggle to achieve precise docking at a specified distance from the target, and generally suffer from low execution efficiency. To overcome these limitations, this paper proposes a target navigation method named MEMO-Nav, which leverages multimodal scene memory and instruction prompting to improve navigation performance in known environments. The proposed framework adopts a hierarchical architecture. A high-level planning layer maintains a multimodal scene memory to record environmental information and uses a Large Language Model (LLM) to parse target and prompt information from natural language instructions. This information is then combined to enable efficient waypoint selection and navigation planning. A low-level execution layer handles fundamental navigation functions, including robot localization and movement, and integrates an object detection model with a depth camera to achieve accurate target positioning. Together, these two layers form a complete target navigation system that enables the robot to locate the target and dock at a specified distance based on natural language instructions. Extensive experiments conducted on the Gazebo simulation platform and in real-world settings demonstrate that the proposed method significantly outperforms existing approaches in known environments across key metrics, including navigation efficiency, success rate, and docking distance accuracy. In summary, the proposed method offers a feasible, efficient, interpretable, and precise solution for mobile robot target navigation in practical scenarios.
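
The abstract gives no implementation details, so the following is only a minimal sketch of two ideas it mentions: choosing a waypoint from a scene memory using a parsed target and prompt, and computing a docking point at a specified distance from the target. All names (`MemoryEntry`, `select_waypoint`, `docking_point`) and the flat 2D memory layout are hypothetical and are not taken from MEMO-Nav. The LLM parsing step is assumed to have already produced a target label and an optional room hint.

```python
import math
from dataclasses import dataclass


@dataclass
class MemoryEntry:
    """One remembered observation (hypothetical layout, not the paper's)."""
    label: str   # object category seen earlier
    room: str    # textual scene tag for where it was seen
    x: float     # map coordinates in metres
    y: float


def select_waypoint(memory, target, hint_room=None):
    """Return a remembered entry for `target`, preferring the hinted room."""
    candidates = [m for m in memory if m.label == target]
    if not candidates:
        return None  # target was never observed; exploration would be needed
    hinted = [m for m in candidates if m.room == hint_room]
    return (hinted or candidates)[0]


def docking_point(robot_xy, target_xy, distance):
    """Point on the robot-to-target segment that stops `distance` short."""
    rx, ry = robot_xy
    tx, ty = target_xy
    gap = math.hypot(tx - rx, ty - ry)
    if gap <= distance:
        return robot_xy  # already within the requested docking distance
    scale = (gap - distance) / gap
    return (rx + (tx - rx) * scale, ry + (ty - ry) * scale)


memory = [
    MemoryEntry("chair", "kitchen", 1.0, 1.0),
    MemoryEntry("chair", "office", 4.0, 0.0),
]
goal = select_waypoint(memory, "chair", hint_room="office")
stop = docking_point((0.0, 0.0), (goal.x, goal.y), distance=1.0)
```

In this sketch the room hint only breaks ties among remembered instances of the target. A real system would also need obstacle-aware path planning and a perception-based refinement of the target position, which the abstract attributes to the low-level execution layer.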

Key words

mobile robot/target navigation/path planning/large language model/multimodal

Classification

Information Technology and Security Science

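Cite this article

DONG Min, LAI Youcheng, BI Sheng. Target Navigation Method Based on Multimodal Scene Memory and Instruction Prompting[J]. Journal of South China University of Technology (Natural Science Edition), 2026, 54(2): 1-15, 15.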

Funding

Supported by the Natural Science Foundation of Guangdong Province (2022B1515020015)

Journal of South China University of Technology (Natural Science Edition)

ISSN: 1000-565X
