Seed (种子), 2026, Vol. 45, Issue 2: 185-192 (8 pages). DOI: 10.16590/j.cnki.1001-4705.2026.02.185
Identification and Classification of 8 Types of Wild-vegetables in the Jiangnan Region by Integrating CNN and ViT Models
(Original title: 融合CNN与ViT模型对江南8种野菜识别分类)
Abstract
The traditional identification of wild vegetables relies mainly on manual experience, which is time-consuming, labor-intensive, and prone to misjudgment, so efficient and accurate identification algorithms are needed. To address image recognition of edible wild vegetables, BiFormer, a variant of the Vision Transformer (ViT), was improved. Residual blocks with dual convolutional layers, taken from ResNet50, a representative traditional Convolutional Neural Network (CNN), were introduced to enhance local feature extraction. Dropout was added to the MLP layer to prevent overfitting, and the qk_dims parameter was optimized to improve the efficiency of attention modeling. The resulting improved model was named Res-BiFormer. On a dataset of 1,509 images covering 8 types of wild vegetables from the Jiangnan region, Res-BiFormer reached a recognition accuracy of 95.77%, which was 4.34% and 0.76% higher than that of the original BiFormer and ResNet50, respectively. On a larger dataset of 6,036 images obtained by data augmentation, its accuracy exceeded that of the two benchmark models by 6.96% and 3.32%, verifying that the proposed model adapts well to datasets of different sizes. Heat maps generated with the Grad-CAM++ technique were used to visualize the model's decision-making process; the results showed that Res-BiFormer focused precisely on key features such as leaf vein textures and edge contours when identifying wild vegetables. This research provides an efficient and feasible technical solution for identifying edible wild vegetables and offers a reference for interpreting the decision-making mechanism of deep learning models through visualization analysis.
Keywords
edible wild-vegetable recognition/Res-BiFormer/double-convolution residual block/Grad-CAM++
Classification
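Agricultural science and technology
Cite this article
Wu Yuqiang (吴玉强), Lei Zhiruo (雷芷若), Hu Naijuan (胡乃娟), Wu Yubao (吴育宝). Identification and Classification of 8 Types of Wild-vegetables in the Jiangnan Region by Integrating CNN and ViT Models [J]. Seed (种子), 2026, 45(2): 185-192.
Funding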
Fundamental Research Funds for the Central Universities (LGZD202504)
Sub-project of the National Key Research and Development Program of China (2023YFC330400502)
Jiangsu Universities "Qing Lan Project" funding (苏教师函[2025]4号)
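The figures quoted in the abstract can be cross-checked with a little arithmetic. The sketch below is not part of the paper. It assumes that the reported gains of 4.34% and 0.76% are absolute percentage-point differences (the abstract does not say whether they are absolute or relative), and the names `implied_baseline`, `biformer_acc`, `resnet50_acc`, and `factor` are illustrative only. It also checks how the augmented dataset size relates to the original one.

```python
from decimal import Decimal

# Figures quoted in the abstract.
RES_BIFORMER_ACC = Decimal("95.77")    # accuracy (%) on the 1,509-image dataset
GAIN_OVER_BIFORMER = Decimal("4.34")   # reported gain over the original BiFormer
GAIN_OVER_RESNET50 = Decimal("0.76")   # reported gain over ResNet50
ORIGINAL_IMAGES = 1509
AUGMENTED_IMAGES = 6036


def implied_baseline(accuracy: Decimal, gain: Decimal) -> Decimal:
    """Baseline accuracy, ASSUMING the gain is an absolute
    percentage-point difference (not stated in the abstract)."""
    return accuracy - gain


biformer_acc = implied_baseline(RES_BIFORMER_ACC, GAIN_OVER_BIFORMER)
resnet50_acc = implied_baseline(RES_BIFORMER_ACC, GAIN_OVER_RESNET50)

# Ratio between the augmented and the original dataset sizes.
factor, remainder = divmod(AUGMENTED_IMAGES, ORIGINAL_IMAGES)

print(f"Implied BiFormer accuracy: {biformer_acc}%")
print(f"Implied ResNet50 accuracy: {resnet50_acc}%")
print(f"Augmented / original images: {factor} (remainder {remainder})")
```

Under that assumption, the implied baselines on the smaller dataset are 91.43% for BiFormer and 95.01% for ResNet50, and the augmented dataset is exactly four times the size of the original. The abstract gives no absolute accuracy for the augmented dataset, so the corresponding baselines cannot be derived the same way.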