

Computer Technology and Development (计算机技术与发展), 2026, Vol. 36, Issue 3: pp. 1-10 (10 pages). DOI: 10.20165/j.cnki.ISSN1673-629X.2025.0256

A Survey of Visual-language Models

马翌硕¹, 张光南¹, 刘亚婷¹, 闫迪¹, 陈冬¹, 刘星愿¹, 郭帅¹

Author information

1. School of Computer Science, Baoji University of Arts and Sciences, Baoji, Shaanxi 721016

Abstract

In recent years, with the rapid development of multimodal learning, Visual-Language Models (VLMs) have demonstrated significant performance advantages in cross-modal tasks such as image captioning and visual question answering. By combining visual and linguistic information and leveraging large-scale image-text pairs from the internet for pretraining, VLMs have become a research hotspot in the field. However, systematic reviews of VLMs remain scarce, especially those that include performance comparisons, analysis, and comprehensive coverage of end-to-end training processes. We therefore provide a comprehensive overview of advances in VLMs as of 2025, covering: classification and discussion of methods for processing raw text and image features; classification and review of mainstream modal interaction strategies; review and discussion of classic and cutting-edge model architectures; a systematic summary of popular VLMs; and benchmarking and discussion of current transfer learning methods in terms of performance and domain generalization. Finally, three future research directions are proposed.

Key words

visual-language models/image-text pretraining/visual-language learning/multimodal/transfer learning

Classification

Information Technology and Security Science

Cite this article

马翌硕, 张光南, 刘亚婷, 闫迪, 陈冬, 刘星愿, 郭帅. A Survey of Visual-language Models [J]. Computer Technology and Development, 2026, 36(3): 1-10.

Funding

Key Research and Development Program of Shaanxi Province (2024GX-YBXM-104)

Computer Technology and Development

ISSN: 1673-629X
