Abstract
A lightweight multi-scale feature fusion network, MSA ResNet, is proposed to address the growing complexity and redundant feature extraction of existing remote sensing image scene classification models. First, the classic ResNet model is combined with Swin Transformer techniques: a sliding-window multi-head self-attention mechanism, multi-level feature stacking, and a cross-layer connection strategy are used to strengthen the model's understanding of spatial relationships and its ability to fuse multi-scale features. Second, the convolution module within the network's residual blocks is simplified and a label smoothing strategy is introduced, which significantly reduces the number of model parameters while maintaining feature extraction efficiency and improving overall classification performance. Experiments on the AID dataset show that the proposed model achieves an overall classification accuracy of 92.97%, which is 7.39%, 1.14%, 0.82%, and 0.65% higher than that of MSCP, SAFF, APDC, and PSGAN, respectively. These results confirm that the multi-scale fusion method can improve the model's feature extraction ability, feature representation, and classification accuracy when the backbone feature extraction network has limited capacity.
Key words
remote sensing imagery/scene classification/multi-head self-attention/multi-scale feature fusion
Classification
Computers and automation
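The label smoothing strategy mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the smoothing factor or the loss implementation, so the value epsilon = 0.1, the uniform smoothing form, and the function names smooth_labels and cross_entropy are assumptions made for illustration only. The class count of 30 matches the number of scene categories in the AID dataset.

```python
import math
from typing import List


def smooth_labels(true_class: int, num_classes: int, epsilon: float = 0.1) -> List[float]:
    """Build a smoothed target: (1 - epsilon) on the true class plus
    epsilon spread uniformly over all classes (epsilon is an assumed value)."""
    uniform_share = epsilon / num_classes
    target = [uniform_share] * num_classes
    target[true_class] += 1.0 - epsilon
    return target


def cross_entropy(logits: List[float], target: List[float]) -> float:
    """Cross-entropy between a target distribution and softmax(logits),
    computed with the log-sum-exp trick for numerical stability."""
    max_logit = max(logits)
    log_norm = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
    return -sum(t * (x - log_norm) for t, x in zip(target, logits))


if __name__ == "__main__":
    num_classes = 30  # AID scene categories
    target = smooth_labels(true_class=2, num_classes=num_classes)
    logits = [0.0] * num_classes
    logits[2] = 5.0  # hypothetical confident prediction for class 2
    print(f"smoothed loss: {cross_entropy(logits, target):.4f}")
```

With a smoothed target the loss never reaches zero, even for a correct and confident prediction, which discourages over-confident outputs; how much this contributes to the reported accuracy is not stated in the abstract.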