Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2026, Vol. 20, Issue 3: 760-772 (13 pages). DOI: 10.3778/j.issn.1673-9418.2505064

Cross-Modal Knowledge Introduction and Prompt Inference Framework for Remote Sensing Visual Question Answering

Dong Xin (董欣)¹, Yu Pengfei (俞鹏飞)¹, Gu Jingjing (顾晶晶)¹

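Author information

  • 1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China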

Abstract

With the rapid development of remote sensing technology, remote sensing visual question answering (RSVQA), an emerging technology that combines language and visual interaction, has improved both the efficiency of interpreting remote sensing imagery and the interactivity of applications in fields such as earth observation and environmental monitoring. However, RSVQA still faces three challenges: the high complexity of remote sensing image information, the scarcity of remote sensing image-text alignment data, and the diverse ways in which questions are phrased. To address these challenges, this paper proposes a cross-modal knowledge introduction and prompt inference framework (CMKIP) for RSVQA. First, to handle the complexity of remote sensing images, CMKIP builds a learnable image feature adapter for the large language model LLaMA so that it can represent complex remote sensing images. Second, to address the scarcity of image-text alignment data, an automated data generation pipeline produces high-quality image-text pairs from publicly available remote sensing datasets, enabling efficient injection of remote sensing domain knowledge. Finally, to cope with diverse question phrasing, a collaborative inference mechanism between large and small models is proposed, in which the small model performs knowledge base retrieval and corrects intermediate inference results, improving the large language model's understanding of diverse questions and its reasoning accuracy. In addition, CMKIP allows the small model to be replaced flexibly according to task requirements, so it can be applied to multiple downstream tasks in the remote sensing field. Experimental results show that CMKIP significantly outperforms existing methods on the RSVQA benchmark dataset, especially in low-sample scenarios, demonstrating its effectiveness and generalization on RSVQA tasks.
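
To make the large and small model collaboration described in the abstract more concrete, the following is a minimal Python sketch of the general pattern: retrieve knowledge, build a prompt, call the large model, then correct its output. It is not the authors' implementation. The knowledge base entries, the word-overlap retriever, the prompt template, the answer-normalization rule, and all function names (`retrieve`, `build_prompt`, `correct_answer`, `answer_question`) are hypothetical stand-ins chosen for illustration, and the image feature adapter is omitted entirely. The large model is represented by any callable that maps a prompt string to a text answer.

```python
import re
from typing import Callable, List

# Hypothetical toy knowledge base; the abstract does not describe the real one.
KNOWLEDGE_BASE = [
    "Residential areas usually contain dense clusters of small buildings.",
    "Water bodies appear dark and smooth in optical remote sensing images.",
    "Roads appear as long, thin, connected linear structures.",
]


def retrieve(question: str, kb: List[str], top_k: int = 1) -> List[str]:
    """Stand-in for the small model's retrieval step: rank by word overlap."""
    q_words = set(re.findall(r"[a-z]+", question.lower()))
    ranked = sorted(
        kb,
        key=lambda entry: len(q_words & set(re.findall(r"[a-z]+", entry.lower()))),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question: str, facts: List[str]) -> str:
    """Assemble a prompt that places retrieved knowledge before the question."""
    context = "\n".join(f"- {fact}" for fact in facts)
    return f"Known facts:\n{context}\nQuestion: {question}\nAnswer:"


def correct_answer(raw: str, allowed: List[str]) -> str:
    """Stand-in for the correction step: map free text onto an allowed answer."""
    tokens = re.findall(r"[a-z]+", raw.lower())
    for candidate in allowed:
        if candidate in tokens:
            return candidate
    return "unknown"


def answer_question(
    question: str,
    large_model: Callable[[str], str],
    kb: List[str] = KNOWLEDGE_BASE,
    allowed: List[str] = ["yes", "no"],
) -> str:
    """Retrieve knowledge, prompt the large model, then normalize its output."""
    facts = retrieve(question, kb)
    prompt = build_prompt(question, facts)
    return correct_answer(large_model(prompt), allowed)


if __name__ == "__main__":
    # A fake large model so the sketch runs without any external dependency.
    fake_llm = lambda prompt: "Yes, a lake is visible."
    print(answer_question("Are there water bodies in the image?", fake_llm))
```

Because the large model is passed in as a plain callable, the retrieval and correction components can be swapped independently of it, which loosely mirrors the abstract's point that the small model can be replaced according to task requirements.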

Key words

remote sensing visual question answering / large language model / cross-modal extension / remote sensing fine-tuning instruction set / lightweight model / prompt inference

Classification

Information Technology and Security Science

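Cite this article

Dong Xin, Yu Pengfei, Gu Jingjing. Cross-Modal Knowledge Introduction and Prompt Inference Framework for Remote Sensing Visual Question Answering [J]. Journal of Frontiers of Computer Science and Technology, 2026, 20(3): 760-772.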

Funding

This work was supported by the National Natural Science Foundation of China (62072235).

Journal of Frontiers of Computer Science and Technology

ISSN 1673-9418
