Application Research of Computers (计算机应用研究), 2025, Vol. 42, Issue 6: 1648-1655, 8. DOI: 10.19734/j.issn.1001-3695.2024.10.0446
ProgCoPL: Progressive Co-Prompting Learning for Vision-Language Models
Abstract
The large-scale pre-trained vision-language model CLIP aligns images and texts in a shared semantic space, demonstrating robust generalization capabilities across diverse downstream tasks. However, existing prompt learning methods often independently insert learnable prompt vectors into each layer of CLIP's visual and text encoders. This approach results in limited cross-modal interaction, with independent prompts across layers failing to effectively guide the encoders in capturing task-relevant information. To address these issues, this paper proposed ProgCoPL. This method introduced text-guided prompt vectors into the visual encoder layers and vision-guided prompt vectors into the text encoder layers, thereby enhancing cross-modal interaction and alignment. Furthermore, ProgCoPL incorporated information transmission channels between prompt vectors across layers, enabling hierarchical and progressive integration of task-specific information. Experiments on 11 datasets show that ProgCoPL efficiently adapts CLIP to downstream tasks, significantly improving its cross-dataset generalization ability. ProgCoPL outperforms existing methods in multiple generalization tests, particularly achieving notable advancements in cross-dataset scenarios.
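The abstract describes two mechanisms: cross-modally guided prompts (prompts for one encoder generated from the other modality's prompts) and transmission channels that pass prompt information between adjacent layers. The following PyTorch sketch is a minimal illustration under our own assumptions, not the authors' implementation: the module names, dimensions, and the linear-projection form of the guidance and carry channels are hypothetical, and only the text-to-vision direction is shown (the paper's vision-to-text path would be the mirror image).

```python
# Minimal sketch (assumptions, not the paper's code) of:
# (a) cross-modal guidance: vision-layer prompts are projected from the
#     same layer's text prompts, and
# (b) a cross-layer transmission channel: each layer's prompts receive a
#     residual, transformed copy of the previous layer's prompts.
import torch
import torch.nn as nn

class CrossModalPrompts(nn.Module):
    def __init__(self, depth=12, n_prompts=4, d_text=512, d_vision=768):
        super().__init__()
        # One set of learnable text prompts per encoder layer.
        self.text_prompts = nn.Parameter(
            torch.randn(depth, n_prompts, d_text) * 0.02)
        # Text -> vision projections (cross-modal guidance per layer).
        self.t2v = nn.ModuleList(
            nn.Linear(d_text, d_vision) for _ in range(depth))
        # Layer-to-layer transmission channel (progressive integration).
        self.carry = nn.ModuleList(
            nn.Linear(d_text, d_text) for _ in range(depth - 1))

    def forward(self):
        text_per_layer, vision_per_layer = [], []
        prev = None
        for l in range(self.text_prompts.shape[0]):
            p = self.text_prompts[l]
            if prev is not None:
                # Residual information flow from the previous layer's prompts.
                p = p + self.carry[l - 1](prev)
            text_per_layer.append(p)
            # Vision prompts for layer l are guided by its text prompts.
            vision_per_layer.append(self.t2v[l](p))
            prev = p
        return text_per_layer, vision_per_layer

prompts = CrossModalPrompts()
text_p, vision_p = prompts()
print(text_p[0].shape, vision_p[0].shape)  # (4, 512) and (4, 768)
```

In an actual adaptation of CLIP, the per-layer prompt tensors returned here would be concatenated to the token sequences of the frozen text and vision Transformer layers; only the prompt parameters and projections would be trained.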
Keywords
multimodal / prompt learning / vision-language model / Transformer encoder
Classification
Computer and Automation
Citation
陶俊杰, 张卫锋, 王玉霞, 缪翌, 徐领. ProgCoPL: progressive co-prompting learning for vision-language models [J]. Application Research of Computers (计算机应用研究), 2025, 42(6): 1648-1655, 8.
Funding
China Postdoctoral Science Foundation (2022M720569)
Zhejiang Provincial Natural Science Foundation (LQ21F020022)