Computer Engineering, 2024, Vol. 50, Issue (10): 1-15, 15. DOI: 10.19678/j.issn.1000-3428.0070036
Survey of Zero-Shot Transfer Learning Methods Based on Vision-Language Pre-Trained Models
Abstract
In recent years, remarkable advances in Artificial Intelligence (AI) across unimodal domains, such as computer vision and Natural Language Processing (NLP), have highlighted the growing importance and necessity of multimodal learning. Among the emerging techniques, the Zero-Shot Transfer (ZST) method, based on vision-language pre-trained models, has attracted widespread attention from researchers worldwide. Owing to the robust generalization capabilities of pre-trained models, leveraging vision-language pre-trained models not only improves the accuracy of zero-shot recognition tasks but also addresses certain zero-shot downstream tasks that are beyond the scope of conventional approaches. This review provides an overview of ZST methods based on vision-language pre-trained models. First, it introduces conventional approaches to Few-Shot Learning (FSL) and summarizes its main forms. It then discusses the distinctions between ZST and FSL based on vision-language pre-trained models, highlighting the new tasks that ZST can address. Subsequently, it explores the application of ZST methods in various downstream tasks, including sample recognition, object detection, semantic segmentation, and cross-modal generation. Finally, it analyzes the challenges of current ZST methods based on vision-language pre-trained models and outlines potential future research directions.
Keywords
Zero-Shot Learning (ZSL) / vision-language pre-trained model / Zero-Shot Transfer (ZST) / multimodal / computer vision
Classification
Information Technology and Security Science
Cite this article
孙仁科, 许靖昊, 皇甫志宇, 李仲年, 许新征. Survey of Zero-Shot Transfer Learning Methods Based on Vision-Language Pre-Trained Models [J]. Computer Engineering, 2024, 50(10): 1-15, 15.
Funding
National Natural Science Foundation of China (61976217, 62306320)
Natural Science Foundation of Jiangsu Province (BK20231063)