上海航天(中英文) (Aerospace Shanghai (Chinese & English)), 2026, Vol. 43, Issue 1: 74-81 (8 pages). DOI: 10.19328/j.cnki.2096-8655.2026.01.007
基于双分支注意力增强Mamba模型的遥感图像字幕生成方法
A Remote Sensing Image Captioning Method Based on the Dual-branch Attention Enhanced Mamba Model
Abstract
Remote sensing image captioning (RSIC) is a task that combines computer vision and natural language processing, aiming to convert remote sensing images into natural language descriptions. This paper proposes an image captioning method based on dual-branch attention and Mamba. In the dual-branch attention Mamba network, a bidirectional scanning Mamba module is designed: the Mamba architecture encodes global image features, and a bidirectional scanning mechanism enhances the model's perception and understanding of the image space. In the dual-branch attention module, a lightweight attention mechanism focuses on and refines local image features, thereby improving overall model performance. Image captioning experiments on the UCM-Captions and Sydney-Captions datasets show that the proposed method performs better than existing methods.
Key words
remote sensing image captioning (RSIC) / Mamba model / channel attention / spatial attention
Classification
Information Technology and Security Science
Cite this article
王鹏, 周凯立, 祝好, 王幸运, 杜君. A Remote Sensing Image Captioning Method Based on the Dual-branch Attention Enhanced Mamba Model [J]. 上海航天(中英文), 2026, 43(1): 74-81.
Funding
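National Natural Science Foundation of China (61801211)
Open Project of the Key Laboratory of Satellite Remote Sensing Digital Application Innovation (LRSAI-2025008)
Shanghai Aerospace Science and Technology Innovation Fund (SAST2024-052)
Guangdong Basic and Applied Basic Research Foundation (2025A1515010258)
Shenzhen Science and Technology Program (JCYJ20240813180005007)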