Computer Engineering and Applications, 2024, Vol. 60, Issue (10): 30-46, 17. DOI: 10.3778/j.issn.1002-8331.2310-0395
Survey of Vision Transformer in Fine-Grained Image Classification
Abstract
Fine-grained image classification (FGIC) has long been an important problem in computer vision. Compared with traditional image classification, FGIC must distinguish objects from different classes that look extremely similar, which makes the task considerably harder. With the development of deep learning, Vision Transformer (ViT) models have become popular in computer vision and have been introduced into FGIC. This paper describes the challenges of FGIC, gives an overview of the ViT model, and analyzes its characteristics. The review organizes ViT-based FGIC algorithms by model structure into four aspects: feature extraction, feature relation modeling, feature attention, and feature enhancement. Each algorithm is summarized, and its advantages and disadvantages are analyzed. The performance of different ViT models on the same public dataset is then compared to validate their effectiveness in FGIC. Finally, the limitations of current research are pointed out, and future research directions are proposed to further explore the potential of ViT in FGIC.
Keywords
fine-grained image classification / Vision Transformer / feature extraction / feature relation modeling / feature attention / feature enhancement
Classification
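Information Technology and Security Science
Cite this article
SUN Lulu, LIU Jianping, WANG Jian, XING Jialu, ZHANG Yue, WANG Chenyang. Survey of Vision Transformer in Fine-Grained Image Classification[J]. Computer Engineering and Applications, 2024, 60(10): 30-46, 17.
Funding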
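Key Research and Development Program of Ningxia (Talent Introduction Special Project) (2022BSB03044)
Natural Science Foundation of Ningxia (2021AAC03205)
Scientific Research Start-up Fund of North Minzu University (2020KYQD37)
Graduate Innovation Project of North Minzu University (YCX23168)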