Electronic Science and Technology, 2025, Vol. 38, Issue (9): 20-25, 6. DOI: 10.16180/j.cnki.issn1007-7820.2025.09.003
Cross-Fusion Encoder-Based Transformer Feature Extraction Network
Abstract
Window-based vision Transformers tend to destroy fine-grained features and carry a large number of model parameters. To address these problems, this study proposes a cross-fusion encoder-based Transformer image feature extraction network. The feature maps are split into two feature subsets according to the consistency of channel-wise feature correlation in the image. Two attention modules connected in parallel perform attention calculations on the two subsets, respectively, to obtain local and global information, and a crossover mechanism is adopted to fuse the two kinds of information. Combined with the inter-window attention module of CAT Transformer, an intra-window attention mode that operates along the channel dimension of the feature map is designed to avoid destroying texture information and to enhance the representation ability of local features. Experimental results show that the proposed model achieves 79.86% top-1 accuracy with a parameter size of 7.8 MB on the CIFAR-100 dataset and 80.7% accuracy on the ImageNet-1K dataset. Grad-CAM (Gradient-weighted Class Activation Mapping) is also used to visualize the model's decision-making process.
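The dual-branch idea summarized above (split the channels into two subsets, run local window attention and global attention in parallel, then fuse the branches crosswise) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The correlation-based split rule, the identity Q/K/V projections, the 1-D token layout, the even channel count and the use of mutual cross-attention as the "crossover" are all assumptions made for illustration, the CAT Transformer inter-window module is not modeled, and helper names such as `split_channels` and `cross_fusion_block` are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]  # rows = tokens (flattened feature-map positions), columns = channels


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(q k^T / sqrt(d)) v (no learned projections)."""
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        w = softmax(scores)
        out.append([sum(w[j] * v[j][c] for j in range(len(v))) for c in range(len(v[0]))])
    return out


def pearson(x: List[float], y: List[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def split_channels(x: Matrix) -> Tuple[List[int], List[int]]:
    """Hypothetical split: rank channels by mean |correlation| with the others, cut in half."""
    cols = [[row[c] for row in x] for c in range(len(x[0]))]
    score = []
    for i, ci in enumerate(cols):
        others = [abs(pearson(ci, cj)) for j, cj in enumerate(cols) if j != i]
        score.append(sum(others) / len(others))
    order = sorted(range(len(cols)), key=lambda c: score[c])
    half = len(order) // 2
    return sorted(order[:half]), sorted(order[half:])


def select(x: Matrix, idx: List[int]) -> Matrix:
    return [[row[c] for c in idx] for row in x]


def window_attention(x: Matrix, window: int) -> Matrix:
    """Local branch: self-attention restricted to non-overlapping token windows."""
    out: Matrix = []
    for start in range(0, len(x), window):
        chunk = x[start:start + window]
        out.extend(attention(chunk, chunk, chunk))
    return out


def cross_fusion_block(x: Matrix, window: int) -> Matrix:
    assert len(x[0]) % 2 == 0, "this sketch assumes an even channel count"
    idx_a, idx_b = split_channels(x)
    xa, xb = select(x, idx_a), select(x, idx_b)
    local = window_attention(xa, window)   # local information
    glob = attention(xb, xb, xb)           # global information
    # Hypothetical crossover: each branch queries the other branch's output.
    a2 = attention(local, glob, glob)
    b2 = attention(glob, local, local)
    return [ra + rb for ra, rb in zip(a2, b2)]  # concatenate channels per token


if __name__ == "__main__":
    tokens = [[math.sin(0.3 * t + c) for c in range(4)] for t in range(8)]
    out = cross_fusion_block(tokens, window=4)
    print(len(out), len(out[0]))  # → 8 4
```

Mutual cross-attention is only one plausible reading of "crossover"; swapping channel groups between branches or adding a residual path would be equally valid sketches under different assumptions.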
Key words
computer vision/image classification/self-attention/feature extraction/contextual information/encoder/channel feature/convolutional neural network
Classification
Information Technology and Security Science
Cite this article
GONG Yu, WU Peng. Cross-Fusion Encoder-Based Transformer Feature Extraction Network[J]. Electronic Science and Technology, 2025, 38(9): 20-25, 6.
Funding
Natural Science Foundation of Zhejiang Province (LY21F010016)