Journal of Chongqing University of Technology, 2024, Vol. 38, Issue (23): 182-188, 7. DOI: 10.3969/j.issn.1674-8425(z).2024.12.022
结合图像-文本提示与跨模态适配器的零样本多标签图像分类
Zero-shot multi-label image classification with image-text prompts and cross-modal adapter
Abstract
Recent approaches to zero-shot multi-label image classification rely primarily on the vision and language pre-training model CLIP. However, they only improve the text prompts and ignore the interaction between the image and text modalities. To address these problems, we propose a zero-shot multi-label image classification method combining image-text prompts and a cross-modal adapter (ITPCA) to fully exploit the image matching ability of the vision and language pre-training model. Prompt learning is used to design prompts for both the image and text branches, which improves the model's ability to generalize to different labels. Additionally, a cross-modal adapter is designed to build connections between the image and text modalities. Experimental results on the NUS-WIDE and MS-COCO multi-label datasets show that our method outperforms other zero-shot multi-label image classification methods.
Keywords
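vision and language pre-training model / prompt learning / zero-shot learning / multi-label image classification
Classification
Information technology and security science
Cite this article
Song Tiecheng (宋铁成), Huang Yu (黄宇). Zero-shot multi-label image classification with image-text prompts and cross-modal adapter [J]. Journal of Chongqing University of Technology, 2024, 38(23): 182-188, 7.
Funding
General Program of the National Natural Science Foundation of China (62371084)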
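
The abstract builds on CLIP-style matching between images and label text. As a rough illustration of how zero-shot multi-label scoring can work in general, the sketch below compares one image embedding against one text embedding per label and keeps every label whose cosine similarity reaches a threshold. It is not the ITPCA method: the learned prompts and the cross-modal adapter are not detailed in this record and are not modeled here. The function names, the 3-dimensional toy embeddings, and the 0.5 threshold are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def predict_labels(image_emb, label_embs, threshold=0.5):
    """Return the labels whose similarity to the image reaches the threshold.

    Each label is decided independently, so an image may receive zero,
    one, or several labels (multi-label). Any label with a text embedding
    can be scored, including labels unseen during training (zero-shot).
    """
    scores = {name: cosine(image_emb, emb) for name, emb in label_embs.items()}
    predicted = sorted(name for name, s in scores.items() if s >= threshold)
    return predicted, scores


# Toy embeddings standing in for encoder outputs (hypothetical values).
label_embs = {
    "dog": [1.0, 0.0, 0.0],
    "beach": [0.0, 1.0, 0.0],
    "car": [0.0, 0.0, 1.0],
}
image_emb = [0.7, 0.7, 0.1]

predicted, scores = predict_labels(image_emb, label_embs)
print(predicted)
```

In a real pipeline the embeddings would come from the image and text encoders of a pre-trained model, and the threshold would be tuned on validation data rather than fixed by hand.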