空间控制技术与应用(中英文), 2026, Vol. 52, Issue (1): 111-120, 10. DOI: 10.3969/j.issn.1674-1579.2026.01.011
基于重叠ViT的特征增强遥感图像目标检测
Overlapped ViT-Based Feature-Enhanced Object Detection for Remote Sensing Images
Abstract
Remote sensing images are characterized by wide scene coverage, large variations in object scale, complex and diverse backgrounds, and many low-contrast small objects, all of which make accurate detection difficult. To address these issues, this paper proposes an object detection method for remote sensing images based on an overlapped ViT backbone with feature enhancement, named Overlapped Patches Vision Transformer Detection (OLP-ViTDet). Building on the ViT backbone, the method introduces an overlapping patch strategy to construct an overlapped ViT backbone that captures fine-grained features across patches. By incorporating additional overlapping image patches, it strengthens cross-patch feature correlations and thereby mitigates the information fragmentation caused by non-overlapping patches in conventional ViT backbones. Combined with a simplified feature pyramid structure, it improves the efficiency of multi-scale feature extraction and fusion. A sliding-window attention mechanism is employed to reduce computational complexity while retaining the capability for global information interaction, which improves the detection accuracy of low-contrast small objects. Comparative experiments on the DIOR and NWPU VHR-10 datasets show that OLP-ViTDet achieves mean average precision (mAP) values of 78.8% and 96.4%, respectively, with clear advantages in detecting small objects and objects with complex structures. The method improves the accuracy of object detection in remote sensing images and offers new perspectives for object recognition in spatial tasks.
Keywords
remote sensing images / object detection / ViT / overlapped patches / simple FPN / sliding-window attention
Classification
Information Technology and Security Science
Cite this article
张晴, 王养柱, 邱华鑫, 张小蔓, 吴坤, 李可. Overlapped ViT-Based Feature-Enhanced Object Detection for Remote Sensing Images [J]. 空间控制技术与应用(中英文), 2026, 52(1): 111-120, 10.
Funding
National Natural Science Foundation of China (62173330) and Aeronautical Science Foundation of China (2024Z031051001)