
A Survey of Visual-Language Models

Ma Yishuo¹, Zhang Guangnan¹, Liu Yating¹, Yan Di¹, Chen Dong¹, Liu Xingyuan¹, Guo Shuai¹

Computer Technology and Development, 2026, Vol. 36, Issue 3: 1-10, 10. DOI: 10.20165/j.cnki.ISSN1673-629X.2025.0256



Author information

  • 1. School of Computer Science, Baoji University of Arts and Sciences, Baoji, Shaanxi 721016, China


Abstract

In recent years, with the rapid development of multimodal learning, Visual-Language Models (VLMs) have demonstrated significant performance advantages in cross-modal tasks such as image captioning and visual question answering. By combining visual and linguistic information and pretraining on large-scale image-text pairs available from the internet, VLMs have become a research hotspot in the field. However, systematic reviews of VLMs remain scarce, especially those that include performance comparisons, analysis, and comprehensive coverage of end-to-end training processes. We therefore provide a comprehensive overview of advances in VLMs as of 2025, covering: (1) classification and discussion of methods for processing raw text and image features; (2) classification and review of mainstream modal interaction strategies; (3) review and discussion of classic and cutting-edge model architectures; (4) a systematic summary of popular VLMs; and (5) benchmarking and discussion of current transfer learning methods in terms of performance and domain generalization. Finally, three future research directions are proposed.


Keywords

visual-language models / image-text pretraining / visual-language learning / multimodal / transfer learning

Classification

Information Technology and Security Science

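Cite this article

Ma Yishuo, Zhang Guangnan, Liu Yating, Yan Di, Chen Dong, Liu Xingyuan, Guo Shuai. A Survey of Visual-Language Models [J]. Computer Technology and Development, 2026, 36(3): 1-10, 10.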

Funding

Key Research and Development Program of Shaanxi Province (2024GX-YBXM-104)

Computer Technology and Development

ISSN 1673-629X
