
The Classification, Applications, and Prospects of Prompt Learning in Computer Vision

Acta Automatica Sinica, 2025, Vol. 51, Issue (5): 1021-1040, 20. DOI: 10.16383/j.aas.c240177


LIU Yuanyuan¹, LIU Shuyang¹, LIU Yunjiao¹, YUAN Yuchen¹, TANG Chang¹, LUO Wei²

Author information

  • 1. School of Computer Science, China University of Geosciences (Wuhan), Wuhan 430074
  • 2. China Ship Development and Design Center, Wuhan 430064

Abstract

With the rapid development of computer vision (CV), the growing demand for better performance and generalization in visual tasks has further increased model complexity and resource requirements. Prompt learning (PL), a method that effectively improves model performance and generalization, reuses pre-trained models, and reduces computational cost, has attracted extensive attention and research across a range of downstream visual tasks. However, existing PL surveys lack a comprehensive classification and discussion of PL methods, as well as an in-depth analysis of existing experimental results that would reveal the strengths and weaknesses of current methods. Therefore, this paper provides a comprehensive overview of the classification, applications, and performance of PL in CV. First, the research background and definition of PL are introduced, followed by a brief review of recent PL progress in CV. Second, PL methods in CV are categorized into text prompts, visual prompts, and vision-language joint prompts; each category is elaborated in detail, and its strengths and weaknesses are discussed. Next, recent advances of PL in ten common downstream visual tasks are reviewed. In addition, experimental results from three CV applications are presented, summarized, and analyzed to comprehensively compare the performance of different PL methods in CV. Finally, based on these discussions, the challenges and opportunities facing PL in CV are analyzed, offering forward-looking insights to further advance PL in the CV domain.

Key words

Computer vision/prompt learning/vision-language large model/pre-trained model

Cite this article

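LIU Yuanyuan, LIU Shuyang, LIU Yunjiao, YUAN Yuchen, TANG Chang, LUO Wei. The classification, applications, and prospects of prompt learning in computer vision[J]. Acta Automatica Sinica, 2025, 51(5): 1021-1040, 20.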

Funding

Supported by the National Natural Science Foundation of China (62076227, U2341228), the Natural Science Foundation of Hubei Province (2023AFB572), and the Hubei Key Laboratory of Intelligent Geo-information Processing (KLIGIP-2022-B10)

Acta Automatica Sinica

Open access; Peking University core journal

ISSN 0254-4156
