Acta Automatica Sinica (自动化学报), 2025, Vol. 51, Issue 5: 1021-1040, 20. DOI: 10.16383/j.aas.c240177
The Classification, Applications, and Prospects of Prompt Learning in Computer Vision
Abstract
With the rapid development of computer vision (CV), the growing demand for better performance and generalization on visual tasks has further increased model complexity and resource requirements. Prompt learning (PL), as a method to effectively enhance model performance and generalization, reuse pre-trained models, and reduce computational costs, has gained extensive attention and research in a series of downstream visual tasks. However, existing PL surveys lack a comprehensive classification and discussion of PL methods, as well as an in-depth analysis of existing experimental results to evaluate the strengths and weaknesses of current methods. Therefore, this paper provides a comprehensive overview of the classification, application, and performance of PL in the field of CV. Firstly, the research background and definition of PL are introduced, followed by a brief review of recent PL progress in CV. Secondly, PL methods in CV are categorized into text prompt, visual prompt, and vision-language joint prompt, with each category elaborated in detail and its strengths and weaknesses discussed. Next, recent advances of PL in ten common downstream visual tasks are reviewed. Additionally, experimental results from three CV applications are provided, summarized, and analyzed to comprehensively discuss the performance of different PL methods in CV. Finally, based on the above discussions, the challenges and opportunities faced by PL in CV are analyzed, offering forward-looking insights to further advance the development of PL in the CV domain.
Key words
Computer vision / prompt learning / vision-language large model / pre-trained model
Cite this article
刘袁缘, 刘树阳, 刘云娇, 袁雨晨, 唐厂, 罗威. The Classification, Applications, and Prospects of Prompt Learning in Computer Vision [J]. Acta Automatica Sinica, 2025, 51(5): 1021-1040, 20.
Funding
Supported by National Natural Science Foundation of China (62076227, U2341228), Natural Science Foundation of Hubei Province (2023AFB572), and Hubei Key Laboratory of Intelligent Geo-information Processing (KLIGIP-2022-B10).