Survey of Vision Transformers for Fine-Grained Image Classification

Computer Engineering and Applications, 2025, Vol. 61, Issue 23: 24-37, 14. DOI: 10.3778/j.issn.1002-8331.2503-0014

Wen Shixiong (温世雄)¹, Zhi Min (智敏)¹

Author information

1. College of Computer Science and Technology, Inner Mongolia Normal University, Hohhot 010022
Abstract

Fine-grained image classification (FGIC) aims to distinguish subcategories that are visually very similar and differ only in subtle details. With the rapid advancement of deep learning, FGIC algorithms have gradually shifted from traditional fully supervised learning to weakly supervised approaches. Vision Transformers (ViTs), leveraging multi-head self-attention, remove the reliance on manual annotations and overcome the limitations of convolutional neural networks (CNNs) in receptive field size and global modeling capacity, and they have become one of the mainstream methods for this task. This paper first outlines the key characteristics and challenges of FGIC and briefly introduces the architecture and advantages of ViT. Existing ViT-based improvements are then grouped by feature fusion strategy into three categories: hierarchical fusion, multi-local fusion, and multi-granularity fusion. The modifications in each category are described in detail, and their underlying mechanisms are systematically analyzed and summarized. In addition, commonly used public datasets are reviewed, and future research directions are proposed in light of current limitations, with the aim of further exploring the potential of ViT in FGIC tasks.

Key words

fine-grained image classification (FGIC) / vision Transformer (ViT) / feature fusion

Classification

Information Technology and Security Science

Cite this article

Wen Shixiong, Zhi Min. Survey of Vision Transformers for Fine-Grained Image Classification [J]. Computer Engineering and Applications, 2025, 61(23): 24-37, 14.

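Funding

Natural Science Foundation of Inner Mongolia (2023MS06009)

Key Scientific Research Project of Higher Education Institutions of Inner Mongolia (NJZZ21004)

Hohhot Basic Research and Applied Basic Research Project (2024-规-基-33)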

Computer Engineering and Applications

Open access; Peking University Core Journal

ISSN: 1002-8331
