
Progressive Co-Prompting Learning for Vision-Language Models

Tao Junjie, Zhang Weifeng, Wang Yuxia, Miao Yi, Xu Ling

Application Research of Computers, 2025, Vol. 42, Issue 6: 1648-1655, 8. DOI: 10.19734/j.issn.1001-3695.2024.10.0446

ProgCoPL: progressive co-prompting learning for vision-language models

Tao Junjie¹, Zhang Weifeng², Wang Yuxia³, Miao Yi¹, Xu Ling¹

Author information

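  • 1. School of Computer Science and Technology (School of Artificial Intelligence), Zhejiang Sci-Tech University, Hangzhou 310000
  • 2. School of Computer Science and Technology (School of Artificial Intelligence), Zhejiang Sci-Tech University, Hangzhou 310000; School of Information Science and Engineering, Jiaxing University, Jiaxing, Zhejiang 314000
  • 3. Jiaxing Institute of Metrology Verification and Testing, Jiaxing, Zhejiang 314000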

Abstract

The large-scale pre-trained vision-language model CLIP aligns images and texts in a shared semantic space, demonstrating robust generalization across diverse downstream tasks. However, existing prompt learning methods often insert learnable prompt vectors independently into each layer of CLIP's visual and text encoders. This results in limited cross-modal interaction, and because the prompts in different layers are independent of one another, they fail to guide the encoders effectively toward task-relevant information. To address these issues, this paper proposes ProgCoPL (progressive co-prompting learning). The method introduces text-guided prompt vectors into the visual encoder layers and vision-guided prompt vectors into the text encoder layers, thereby enhancing cross-modal interaction and alignment. Furthermore, ProgCoPL adds information transmission channels between the prompt vectors of different layers, enabling hierarchical and progressive integration of task-specific information. Experiments on 11 datasets show that ProgCoPL efficiently adapts CLIP to downstream tasks and significantly improves its cross-dataset generalization ability. ProgCoPL outperforms existing methods in multiple generalization tests, with the most notable gains in cross-dataset scenarios.
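
The two ideas in the abstract (prompts in each encoder guided by the other modality, and a channel that passes prompt information from one layer to the next) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract gives no formulas, so the single prompt vector per layer, the linear cross-modal projections, the fixed mixing weight `alpha`, and every name below (`build_prompts`, `t2v`, `v2t`) are assumptions made purely for illustration. Random values stand in for parameters that would be learned.

```python
import random


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_vector(dim, rng, scale=0.02):
    return [rng.gauss(0.0, scale) for _ in range(dim)]


def rand_matrix(rows, cols, rng, scale=0.02):
    return [rand_vector(cols, rng, scale) for _ in range(rows)]


def build_prompts(depth, d_text, d_vis, alpha=0.5, seed=0):
    """Return one (text_prompt, visual_prompt) pair per encoder layer.

    Hypothetical simplifications, none taken from the paper:
    one prompt vector per layer, linear cross-modal projections,
    and a fixed mixing weight `alpha` for the layer-to-layer channel.
    """
    rng = random.Random(seed)
    prompts = []
    prev_text, prev_vis = None, None
    for _ in range(depth):
        # Stand-ins for this layer's learnable parameters.
        text = rand_vector(d_text, rng)
        vis = rand_vector(d_vis, rng)
        t2v = rand_matrix(d_vis, d_text, rng)  # text -> visual projection
        v2t = rand_matrix(d_text, d_vis, rng)  # visual -> text projection

        # Progressive channel: blend in the previous layer's prompts.
        if prev_text is not None:
            text = [alpha * p + (1 - alpha) * c for p, c in zip(prev_text, text)]
            vis = [alpha * p + (1 - alpha) * c for p, c in zip(prev_vis, vis)]

        # Co-prompting: each modality's prompt is shifted by a
        # projection of the other modality's prompt.
        text_prompt = [a + b for a, b in zip(text, matvec(v2t, vis))]
        vis_prompt = [a + b for a, b in zip(vis, matvec(t2v, text))]

        prompts.append((text_prompt, vis_prompt))
        prev_text, prev_vis = text_prompt, vis_prompt
    return prompts
```

In a real setup these per-layer vectors would be inserted into the token sequences of CLIP's text and visual Transformer layers and trained on the downstream task; the sketch only shows how the two kinds of coupling could be wired together.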

Key words

multimodal; prompt learning; vision-language model; Transformer encoder

Classification

Computer science and automation

Cite this article

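Tao Junjie, Zhang Weifeng, Wang Yuxia, Miao Yi, Xu Ling. Progressive co-prompting learning for vision-language models[J]. Application Research of Computers, 2025, 42(6): 1648-1655, 8.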

Funding

China Postdoctoral Science Foundation (2022M720569)

Natural Science Foundation of Zhejiang Province (LQ21F020022)

Application Research of Computers

Open access; Peking University Core Journal

ISSN 1001-3695
