多模态引导视觉Transformer的小样本农作物病害识别

杨森, 冯全, 阎文博, 周文伟, 杨婉霞

农业工程学报 (Transactions of the Chinese Society of Agricultural Engineering), 2025, Vol. 41, Issue 6: 195-203 (9 pages). DOI: 10.11975/j.issn.1002-6819.202409189

Recognizing few-shot crop diseases using multimodal-guided visual Transformer

杨森¹, 冯全¹, 阎文博¹, 周文伟¹, 杨婉霞¹

Author information

1. College of Mechanical and Electrical Engineering, Gansu Agricultural University, Lanzhou 730070, China

Abstract

Accurate identification of plant diseases plays a crucial role in plant protection and intelligent agriculture. Few-shot learning (FSL) offers a potential solution for identifying crop diseases when data are scarce. However, existing FSL methods rely only on low-level image features for disease recognition and do not consider the correlations between multimodal data under small-sample conditions. In this study, a multimodal few-shot learning (MMFSL) model was proposed and applied to crop disease identification in low-data scenarios. The model consists of three components: an FSL image branch, a text branch, and an image-text contrastive learning module. Firstly, a vision Transformer (ViT) replaced the conventional convolutional neural network (CNN) encoder in the FSL image branch, which strengthened feature extraction from few-shot images. Input samples were split into small patches in order to establish semantic correspondences between local regions of the image. Secondly, the text branch was built on a pre-trained language model. Text information derived from the class labels was extracted to guide the FSL image branch in selecting features related to the image categories. A hand-crafted prompt template incorporated the class labels as the input text for the text branch, bridging the gap between the pre-trained model and the actual task. Finally, the image-text contrastive learning module used a bilinear metric function to align image and text semantics. Network parameters were updated with model-agnostic meta-learning (MAML) to facilitate cross-modal information learning and fusion. Comparative experiments were conducted on the PlantVillage dataset and on a self-constructed dataset collected in field scenarios. The MMFSL model achieved average accuracies of 86.97% and 96.33% on PlantVillage under the 5-way 1-shot and 5-way 5-shot settings, respectively. When the model trained on PlantVillage was transferred to complex field scenarios, average accuracies of 56.78% and 74.49% were still maintained for the 5-way 1-shot and 5-way 5-shot tasks, respectively. Compared with mainstream FSL models, including MAML, Matching Net, Prototypical Network, DeepEMD, DeepBDC, and FewTURE, the MMFSL model achieved the highest classification accuracy, with particularly superior performance in 5-way 1-shot tasks. Four encoders were compared: ViT-Tiny, Swin-Tiny, DeiT-Small, and DeiT-Tiny. Swin-Tiny was the most effective at extracting feature information from images, with average accuracies of 84.20% and 95.53% under the 5-way 1-shot and 5-way 5-shot settings, respectively. The image-text metric function was also selected experimentally: the bilinear metric outperformed two conventional metrics, cosine similarity and dot product. The ablation test further demonstrated that the average accuracy of the MMFSL model was 2.77 and 0.80 percentage points higher than that of the unimodal FSL model under the 5-way 1-shot and 5-way 5-shot settings, respectively. In summary, the MMFSL model showed high disease recognition accuracy and good robustness in both laboratory and field scenarios. Incorporating textual information effectively alleviated the weak feature representation caused by the scarcity of image samples. The MMFSL model is expected to serve as a viable and promising approach for plant disease recognition in low-data scenarios.
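
The image-text alignment step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the prompt wording in `TEMPLATE`, the toy feature vectors, and the helper names (`build_prompts`, `bilinear`, `classify`) are all assumptions made for illustration, and the matrix `W` stands in for a learnable parameter that would be trained (for example with MAML) in the real model. The sketch shows how class labels could be turned into text prompts and how a bilinear score u^T W v compares with the dot-product and cosine metrics mentioned above.

```python
import math

# Hypothetical prompt template; the abstract does not give the actual wording.
TEMPLATE = "a photo of a leaf with {}, a type of crop disease"


def build_prompts(class_names, template=TEMPLATE):
    """Insert each class label into the hand-crafted template."""
    return [template.format(name) for name in class_names]


def dot(u, v):
    """Dot-product similarity between two feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity: dot product of the length-normalised vectors."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def bilinear(u, W, v):
    """Bilinear score u^T W v, where W has len(u) rows and len(v) columns."""
    Wv = [dot(row, v) for row in W]
    return dot(u, Wv)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(image_feat, text_feats, W):
    """Score one image feature against every class text feature."""
    scores = [bilinear(image_feat, W, t) for t in text_feats]
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


if __name__ == "__main__":
    classes = ["apple scab", "tomato early blight"]
    print(build_prompts(classes))
    # Toy 2-D features standing in for image and text encoder outputs.
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(classify([1.0, 0.0], [[0.9, 0.1], [0.0, 1.0]], identity))
```

With `W` fixed to the identity matrix the bilinear score reduces to the plain dot product, so the extra flexibility of the bilinear metric comes entirely from learning `W`.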

Key words

disease/recognition/few-shot/multimodal/visual Transformer/textual information

Classification

Agricultural engineering

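Cite this article

杨森, 冯全, 阎文博, 周文伟, 杨婉霞. Recognizing few-shot crop diseases using multimodal-guided visual Transformer [J]. 农业工程学报 (Transactions of the Chinese Society of Agricultural Engineering), 2025, 41(6): 195-203.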

Funding

National Natural Science Foundation of China (32201663, 32160421)

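农业工程学报 (Transactions of the Chinese Society of Agricultural Engineering)

Open access; Peking University Core Journal (北大核心)

ISSN: 1002-6819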
