

Remote sensing image change detection based on CNN-Transformer structure

Abstract

Change detection in modern high-resolution remote sensing imagery has achieved remarkable results with the aid of convolutional neural networks (CNNs). However, the limited receptive field of convolution operations leads to insufficient learning of global context and long-range spatial relationships. While vision Transformers effectively capture long-range feature dependencies, they handle fine change details poorly, resulting in limited spatial localization capability and low computational efficiency. To address these issues, this paper proposes an end-to-end encoder-decoder hybrid CNN-Transformer change detection model based on atrous spatial pyramid pooling with cross-layer cascaded linear fusion, combining the advantages of vision Transformers and CNNs. First, image features are extracted with a Siamese CNN and refined through an atrous spatial pyramid pooling module to capture detailed feature information more precisely. Second, the extracted features are converted into visual words, which a Transformer encoder models to obtain rich contextual information; this information is then fed back into the visual space through a Transformer decoder to reinforce the original features and improve their expressiveness. Third, the CNN features are fused with the Transformer encoder-decoder features in a cross-layer cascaded manner, with upsampling connecting feature maps of different resolutions so that positional and semantic information are merged. Finally, a difference enhancement module generates difference feature maps containing rich change information. Extensive experiments on four public remote sensing datasets (LEVIR, CDD, DSIFN and WHUCD) confirm the effectiveness of the proposed method. Compared with other state-of-the-art methods, the proposed model achieves superior classification performance and effectively alleviates under-segmentation, over-segmentation and rough edges in change detection results.
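The pipeline described in the abstract (Siamese CNN feature extraction, atrous spatial pyramid pooling, visual-word tokenization with a Transformer encoder-decoder, cross-layer fusion via upsampling, and a difference enhancement module) can be illustrated with a minimal PyTorch sketch. All module names, channel widths, dilation rates, token counts and the demo input size below are assumptions made for illustration; this is not the authors' implementation, and the paper's exact layer configuration and fusion levels are not specified here.

```python
# Minimal sketch of the abstract's CNN-Transformer change-detection pipeline.
# All hyperparameters (channels, dilation rates, token count, depth) are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ASPP(nn.Module):
    """Atrous spatial pyramid pooling over one feature map (assumed rates)."""
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates
        ])
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, 1)

    def forward(self, x):
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))


class TokenTransformer(nn.Module):
    """Turn a feature map into visual words, model their context with a
    Transformer encoder, then feed the context back to pixels via a decoder."""
    def __init__(self, dim, num_tokens=16, depth=2, heads=4):
        super().__init__()
        self.token_attn = nn.Conv2d(dim, num_tokens, 1)      # soft token assignment
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, heads, batch_first=True), depth)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(dim, heads, batch_first=True), depth)

    def forward(self, x):
        b, c, h, w = x.shape
        pixels = x.flatten(2).transpose(1, 2)                 # (B, HW, C)
        weights = self.token_attn(x).flatten(2).softmax(-1)   # (B, L, HW)
        tokens = torch.bmm(weights, pixels)                   # (B, L, C) visual words
        tokens = self.encoder(tokens)                          # context-rich tokens
        refined = self.decoder(pixels, tokens)                 # pixels query the tokens
        return refined.transpose(1, 2).reshape(b, c, h, w)


class HybridChangeDetector(nn.Module):
    """Siamese CNN + ASPP + token Transformer + cross-layer fusion + difference
    enhancement, following the stages named in the abstract."""
    def __init__(self, dim=64):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(3, dim, 3, stride=2, padding=1), nn.ReLU())
        self.stage2 = nn.Sequential(nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.ReLU())
        self.aspp = ASPP(dim, dim)
        self.transformer = TokenTransformer(dim)
        self.fuse = nn.Conv2d(dim * 2, dim, 3, padding=1)      # cross-layer fusion
        self.diff_enhance = nn.Sequential(nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU())
        self.classifier = nn.Conv2d(dim, 2, 1)                 # change / no change

    def encode(self, img):
        low = self.stem(img)                     # 1/2 resolution: positional detail
        high = self.stage2(low)                  # 1/4 resolution: semantic detail
        high = self.transformer(self.aspp(high))
        high = F.interpolate(high, size=low.shape[2:], mode="bilinear",
                             align_corners=False)
        return self.fuse(torch.cat([low, high], dim=1))

    def forward(self, img_t1, img_t2):
        f1, f2 = self.encode(img_t1), self.encode(img_t2)      # shared (Siamese) weights
        diff = self.diff_enhance(torch.abs(f1 - f2))           # difference feature map
        logits = self.classifier(diff)
        return F.interpolate(logits, size=img_t1.shape[2:], mode="bilinear",
                             align_corners=False)


if __name__ == "__main__":
    model = HybridChangeDetector()
    t1, t2 = torch.randn(1, 3, 128, 128), torch.randn(1, 3, 128, 128)
    print(model(t1, t2).shape)   # torch.Size([1, 2, 128, 128])
```

In this sketch the bi-temporal images are encoded by the same weights and compared only at the end through an absolute-difference map, a common late-fusion choice for Siamese change detectors; the paper's difference enhancement module may differ in where and how it compares the two feature streams.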

PAN Mengyang; YANG Hang; FAN Xianghui

Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China; University of Chinese Academy of Sciences, Beijing 100049, China

Computer and Automation

remote sensing images; change detection; convolutional neural network; Transformer; atrous spatial pyramid pooling

《液晶与显示》 (Chinese Journal of Liquid Crystals and Displays), 2024(10)

Pages 1361-1379 (19 pages)

Supported by the Youth Innovation Promotion Association, Chinese Academy of Sciences (No. 2020220)

10.37188/CJLCD.2024-0086
