Journal of Chongqing University of Technology, 2024, Vol. 38, Issue 23: 182-188, 7. DOI: 10.3969/j.issn.1674-8425(z).2024.12.022

Zero-shot multi-label image classification with image-text prompts and cross-modal adapter

Song Tiecheng¹, Huang Yu¹

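Author information

1. School of Communications and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065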

Abstract

Recent approaches to zero-shot multi-label image classification rely primarily on the vision and language pre-training model CLIP. However, they improve only the text prompts and ignore the interaction between the image and text modalities. To address these problems, we propose a zero-shot multi-label image classification method that combines image-text prompts with a cross-modal adapter (ITPCA) to fully exploit the image-text matching ability of the vision and language pre-training model. Prompt learning is used to design prompts for both the image and text branches, which improves the model's generalization to different labels. In addition, a cross-modal adapter is designed to build connections between the image and text modalities. Experimental results on the NUS-WIDE and MS-COCO multi-label datasets show that our method outperforms existing zero-shot multi-label image classification methods.
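The abstract gives no implementation details, so the following is only a minimal sketch of the generic CLIP-style zero-shot multi-label scoring that methods of this kind build on. It is not the ITPCA method. It assumes image and label-prompt embeddings have already been produced by some encoder. The function names, the toy embedding values, and the 0.5 threshold are all hypothetical and chosen purely for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine_scores(image_emb, label_embs):
    """Return the cosine similarity between one image embedding and each label embedding."""
    img = l2_normalize(image_emb)
    return {
        label: sum(a * b for a, b in zip(img, l2_normalize(emb)))
        for label, emb in label_embs.items()
    }


def predict_labels(image_emb, label_embs, threshold=0.5):
    """Multi-label decision: keep every label whose similarity reaches the threshold.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    scores = cosine_scores(image_emb, label_embs)
    return sorted(label for label, score in scores.items() if score >= threshold)


# Toy embeddings standing in for encoder outputs (hypothetical values).
image_emb = [0.9, 0.1, 0.4]
label_embs = {
    "dog": [1.0, 0.0, 0.3],
    "beach": [0.0, 1.0, 0.0],
    "grass": [0.6, 0.0, 0.8],
}

scores = cosine_scores(image_emb, label_embs)
predicted = predict_labels(image_emb, label_embs, threshold=0.5)
# predicted -> ['dog', 'grass']
```

Because the labels enter only through their embeddings, unseen labels can be scored without retraining, which is the sense in which this setup is zero-shot. According to the abstract, ITPCA adds learned prompts on both branches and a cross-modal adapter on top of this kind of matching; neither is modeled here.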

Key words

vision and language pre-training model / prompt learning / zero-shot learning / multi-label image classification

Classification

Information technology and security science

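Cite this article

Song Tiecheng, Huang Yu. Zero-shot multi-label image classification with image-text prompts and cross-modal adapter[J]. Journal of Chongqing University of Technology, 2024, 38(23): 182-188, 7.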

Funding

General Program of the National Natural Science Foundation of China (62371084)

Journal of Chongqing University of Technology

Open access; Peking University Core Journal

ISSN: 1674-8425
