
Multimodal Drone Cross-view Geo-localization Based on Vision-language Model

Chen Peng, Chen Xu, Luo Wen, Lin Bin

Robot, 2025, Vol. 47, Issue (3): 416-426, 11. DOI: 10.13973/j.cnki.robot.240283

Multimodal Drone Cross-view Geo-localization Based on Vision-language Model

Chen Peng 1, Chen Xu 1, Luo Wen 1, Lin Bin 1

Author information

  • 1. Hebei University of Technology, Tianjin 300401

Abstract

Cross-view geo-localization for drones achieves autonomous positioning in satellite-denied conditions by matching onboard images with geo-referenced images; the primary challenge lies in the significant appearance differences between images from different views. Existing methods predominantly focus on local feature extraction and lack in-depth exploration of contextual correlations and global semantics. To address this problem, a multimodal drone cross-view geo-localization framework based on a vision-language model is proposed in this paper. Leveraging the CLIP (contrastive language-image pre-training) model, a view text description generation module is constructed, which utilizes image-level visual concepts learned from large-scale datasets as external knowledge to guide the feature extraction process. A hybrid vision transformer (ViT) architecture is adopted as the backbone network, enabling the model to capture local features and global contextual characteristics simultaneously during image feature extraction. Furthermore, a mutual learning loss supervised by logic score-normalized Kullback-Leibler (KL) divergence is introduced to optimize the training process and enhance the model's ability to learn inter-view correlations. Experimental results demonstrate that, under the guidance of text descriptions generated by the CLIP model, the proposed model learns deep semantic information more effectively, thereby better addressing challenges such as viewpoint variations and temporal discrepancies encountered in cross-view geo-localization.
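
The abstract names a mutual learning loss supervised by a score-normalized KL divergence but does not give its formula. The following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes that the normalized scores are per-sample classification logits, that the normalization is a z-score (zero mean, unit variance), and that the loss is the symmetric sum of the two KL terms between the drone-view and satellite-view predictions. The names `zscore`, `mutual_learning_loss`, `drone_logits`, and `satellite_logits` are hypothetical.

```python
import math


def zscore(logits, eps=1e-7):
    """Standardize one logit vector to zero mean and unit variance.

    Assumption: this z-score is a stand-in for the score normalization
    mentioned in the abstract, whose exact form is not given there.
    """
    mean = sum(logits) / len(logits)
    var = sum((x - mean) ** 2 for x in logits) / len(logits)
    std = math.sqrt(var)
    return [(x - mean) / (std + eps) for x in logits]


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def mutual_learning_loss(drone_logits, satellite_logits, temperature=1.0):
    """Symmetric KL between the two views' normalized predictions.

    Assumption: each view acts as the other's soft target, so both
    KL directions are summed.
    """
    p = softmax(zscore(drone_logits), temperature)
    q = softmax(zscore(satellite_logits), temperature)
    return kl_divergence(p, q) + kl_divergence(q, p)


if __name__ == "__main__":
    # Hypothetical logits for one drone image and one satellite image.
    print(mutual_learning_loss([3.0, 1.0, 0.5], [0.2, 2.5, 1.0]))
```

Under this assumed normalization, two views whose logits differ only by an offset or a positive scale factor incur almost no penalty, so the loss compares the relative ranking of classes rather than raw logit magnitudes.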

Key words

cross-view geo-localization / vision-language model / multimodal / image matching / drone

Cite this article

Chen Peng, Chen Xu, Luo Wen, Lin Bin. Multimodal Drone Cross-view Geo-localization Based on Vision-language Model[J]. Robot, 2025, 47(3): 416-426, 11.

Funding

National Natural Science Foundation of China (U20A20201).

Robot

OA, Peking University Core Journal

1002-0446
