
Zero-Shot Pest Identification Based on Generative Adversarial Networks and Visual-Semantic Alignment


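[Objective/Significance] Accurate pest identification is of great significance for the effective control of crop pests. However, current pest image recognition models are built on closed-set data and cannot reason about pest species that never appeared in the training set (unseen pests), so misclassification is especially severe when insects of unseen classes are encountered in practice. [Methods] To address this problem, a pest image recognition method named VSA-WGAN is proposed for both the zero-shot learning (ZSL) and generalized zero-shot learning (GZSL) paradigms. It can identify both seen pest species (classes included in the training set) and unseen ones. The method uses a generative adversarial network (GAN) to generate pseudo-visual features, which turns zero-shot pest recognition into a conventional supervised learning task. Contrastive learning is introduced to improve the quality of the generated features, and a visual-semantic alignment module further constrains the generator so that it produces more discriminative features. [Results and Discussion] The method was evaluated on a self-built pest dataset suited to zero-shot learning research and on several public datasets. It achieved state-of-the-art results on multiple public datasets, with a maximum improvement of 2.8% over previous methods. On the self-built 20-class pest dataset it reached a recognition accuracy of 77.4% in the zero-shot setting and a harmonic accuracy of 78.3% in the generalized zero-shot setting, improvements of 2.1% and 1.2%, respectively, over previous methods. [Conclusions] The proposed method effectively generalizes the visual features of pests to unseen classes and achieves zero-shot pest recognition, which helps improve the generalization ability of pest recognition models and supports the discovery and control of new crop pests.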
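
The harmonic accuracy reported for the generalized zero-shot setting is conventionally the harmonic mean of the accuracy on seen classes and the accuracy on unseen classes. The abstract gives neither the formula nor the two component accuracies, so the sketch below assumes the standard GZSL definition H = 2SU / (S + U). The function name and the input values are hypothetical and are not results from the paper.

```python
def harmonic_mean_accuracy(seen_acc: float, unseen_acc: float) -> float:
    """Harmonic mean H = 2*S*U / (S + U) of seen and unseen accuracy."""
    if seen_acc < 0 or unseen_acc < 0:
        raise ValueError("accuracies must be non-negative")
    total = seen_acc + unseen_acc
    if total == 0:
        return 0.0  # both accuracies are zero, so H is defined as zero
    return 2.0 * seen_acc * unseen_acc / total


# Hypothetical accuracies, chosen only to show how H behaves.
balanced = harmonic_mean_accuracy(0.80, 0.80)  # equal S and U: H equals both
lopsided = harmonic_mean_accuracy(0.95, 0.40)  # strong on seen, weak on unseen
arithmetic = (0.95 + 0.40) / 2                 # plain average, for comparison
```

With these illustrative inputs, `lopsided` falls well below `arithmetic`, which is why the harmonic mean is preferred in this setting: a model cannot score well by favoring seen classes at the expense of unseen ones.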

[Objective] Accurate identification of insect pests is crucial for the effective prevention and control of crop infestations. However, existing pest identification methods primarily rely on traditional machine learning or deep learning techniques that are trained on seen classes. These methods falter when they encounter unseen pest species that are not included in the training set, because no image samples of those species are available. An innovative method was proposed to address the zero-shot recognition challenge for pests.

[Methods] The novel zero-shot learning (ZSL) method proposed in this study was capable of identifying unseen pest species. First, a comprehensive pest image dataset was assembled, sourced from field photography conducted around Beijing over several years and from web crawling. The final dataset consisted of 2,000 images across 20 classes of adult Lepidoptera insects, with 100 images per class. During data preprocessing, a semantic dataset was manually curated by defining attributes related to color, pattern, size, and shape for six parts: antennae, back, tail, legs, wings, and overall appearance. Each image was annotated to form a 65-dimensional attribute vector for each class, resulting in a 20×65 semantic attribute matrix with rows representing classes and columns representing attribute values. Subsequently, 16 classes were designated as seen classes and 4 as unseen classes. Next, a novel zero-shot pest recognition method was proposed, focusing on synthesizing high-quality pseudo-visual features aligned with semantic information using a generator. The Wasserstein generative adversarial network (WGAN) architecture was employed as the network backbone. Conventional generative adversarial networks (GANs) are known to suffer from training instability, mode collapse, and convergence issues, which can severely hinder their performance and applicability. The WGAN architecture addresses these limitations through a principled reformulation of the objective function. In the proposed method, the contrastive module was designed to capture highly discriminative visual features that could effectively distinguish between different insect classes. It operated by creating positive and negative pairs of instances within a batch. Positive pairs consisted of different views of the same class, while negative pairs were formed from instances belonging to different classes. The contrastive loss function encouraged the learned representations of positive pairs to be similar while pushing the representations of negative pairs apart. Tightly integrated with the WGAN structure, this module substantially improved the generation quality of the generator. Furthermore, the visual-semantic alignment module enforced consistency constraints from both visual and semantic perspectives. This module constructed a cross-modal embedding space, mapping visual and semantic features via two projection layers: one for mapping visual features into the cross-modal space, and another for mapping semantic features. The visual projection layer took the synthesized pseudo-visual features from the generator as input, while the semantic projection layer took the class-level semantic vectors. Within this cross-modal embedding space, the module enforced two key constraints: maximizing the similarity between same-class visual-semantic pairs and minimizing the similarity between different-class pairs. This was achieved through a loss function that encouraged the projected visual and semantic representations to be closely aligned for instances belonging to the same class, while pushing apart the representations of different classes. The visual-semantic alignment module acted as a regularizer, preventing the generator from producing features that deviated from the desired semantic information. This regularization effect complemented the discriminative power gained from the contrastive module, resulting in a generator that produced high-quality, diverse, and semantically aligned pseudo-visual features.

[Results and Discussions] The proposed method was evaluated on several popular ZSL benchmarks, including CUB, AWA, FLO, and SUN. The results demonstrated that the proposed method achieved state-of-the-art performance across these datasets, with a maximum improvement of 2.8% over the previous best method, CE-GZSL. This outcome demonstrated the method's effectiveness across different benchmarks and its generalization ability. On the self-constructed 20-class insect dataset, the method also exhibited high recognition accuracy. Under the standard ZSL setting, it achieved a recognition accuracy of 77.4%, outperforming CE-GZSL by 2.1%. Under the generalized ZSL setting, it achieved a harmonic mean accuracy of 78.3%, an improvement of 1.2%. This metric provided a balanced assessment of the model's performance across seen and unseen classes, ensuring that high accuracy on unseen classes did not come at the cost of accuracy on seen classes. These results on the pest dataset, together with the performance on public benchmarks, validated the effectiveness of the proposed method.

[Conclusions] The proposed zero-shot pest recognition method represents a step forward in addressing the challenges of pest management. It effectively generalized pest visual features to unseen classes, enabling zero-shot pest recognition. It can facilitate pest identification tasks that lack training samples, thereby assisting in the discovery and prevention of novel crop pests. Future research will focus on expanding the range of pest species to further enhance the model's practical applicability.
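
The visual-semantic alignment constraint described in the Methods part of the abstract can be illustrated with a small sketch. The abstract does not specify the loss, so this example assumes a temperature-scaled softmax cross-entropy over cosine similarities, one common way to pull same-class visual-semantic pairs together and push different-class pairs apart. The function names, the linear projections, the temperature value, and the toy vectors are all illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = Sequence[Sequence[float]]


def project(weights: Matrix, x: Sequence[float]) -> Vector:
    """Linear projection W @ x, with one row of `weights` per output dimension."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm > 0 else 0.0


def alignment_loss(
    visual: Sequence[Sequence[float]],           # one feature vector per sample
    labels: Sequence[int],                       # class index of each sample
    class_semantics: Sequence[Sequence[float]],  # one attribute vector per class
    w_visual: Matrix,                            # assumed visual projection layer
    w_semantic: Matrix,                          # assumed semantic projection layer
    temperature: float = 0.1,                    # assumed value, not from the paper
) -> float:
    """Mean cross-entropy over classes of temperature-scaled cosine similarities."""
    semantic_emb = [project(w_semantic, a) for a in class_semantics]
    total = 0.0
    for feature, label in zip(visual, labels):
        visual_emb = project(w_visual, feature)
        logits = [cosine(visual_emb, s) / temperature for s in semantic_emb]
        # Numerically stable log-sum-exp over all classes.
        peak = max(logits)
        log_norm = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Negative log-probability of the sample's own class.
        total += log_norm - logits[label]
    return total / len(visual)


# Toy 2-D example with identity projections (hypothetical values).
identity = [[1.0, 0.0], [0.0, 1.0]]
semantics = [[1.0, 0.0], [0.0, 1.0]]  # two classes
features = [[0.9, 0.1], [0.2, 0.8]]   # each lies close to one class vector

aligned = alignment_loss(features, [0, 1], semantics, identity, identity)
misaligned = alignment_loss(features, [1, 0], semantics, identity, identity)
```

In this toy case `aligned` is smaller than `misaligned`, so minimizing the loss favors features that sit near their own class's semantic embedding. In the method described above, `visual` would hold the generator's pseudo-visual features and a term of this kind would act as a regularizer on the generator. How it is weighted against the WGAN and contrastive terms is not stated in the abstract.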

李天俊;杨信廷;陈晓;胡焕;周子洁;李文勇

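College of Information Technology, Shanghai Ocean University, Shanghai 201306, China; Research Center of Information Technology, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China; College of Information Technology, Jilin Agricultural University, Changchun 130118, Jilin, China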

Computer Science and Automation

pest recognition; semantic knowledge; visual features; generative adversarial networks; contrastive learning; generalized zero-shot learning

Smart Agriculture (智慧农业(中英文)), 2024, Issue 2

Pages 72-84 (13 pages)

National Key Technology Research and Development Program of China (2022YFD2001801-2); Promotion and Innovation of Beijing Academy of Agriculture and Forestry Sciences

10.12133/j.smartag.SA202312014
