Survey of Text-based Visual Question Answering

Zhu Guide (朱贵德), Huang Hai (黄海)

Computer Engineering (计算机工程), 2024, Vol. 50, Issue 2, pp. 1-14 (14 pages). DOI: 10.19678/j.issn.1000-3428.0067514

Author information

  • 1. School of Computer Science and Technology (School of Artificial Intelligence), Zhejiang Sci-Tech University, Hangzhou, Zhejiang 310018
Abstract

Traditional Visual Question Answering (VQA) focuses only on the visual objects in an image and ignores the text that appears in it. Text-based Visual Question Answering (TextVQA) additionally exploits this text, which allows it to answer questions more accurately and efficiently. In recent years, TextVQA has become a research focus in the multimodal field, with important application prospects in text-rich scenarios such as autonomous driving and scene understanding. This paper introduces the concept of TextVQA together with its open problems and challenges, and systematically analyzes the task in terms of methods, datasets, and future research directions. The analysis centers on existing TextVQA methods, which are decomposed into three stages: feature extraction, feature fusion, and answer prediction. According to the technique used in the fusion stage, the methods are grouped into three categories: simple attention-based, Transformer-based, and pre-training-based methods. The advantages and disadvantages of each category are summarized, and the performance of existing methods on public datasets is analyzed and compared. Four commonly used public datasets are introduced, and their characteristics and evaluation metrics are analyzed. Finally, the paper discusses the problems and challenges facing TextVQA and outlines future research directions.
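The three-stage decomposition named in the abstract (feature extraction, feature fusion, answer prediction) can be illustrated with a deliberately tiny sketch. It assumes hand-written 3-dimensional vectors in place of real object-detector, OCR, and text-encoder outputs, uses plain dot-product attention as a stand-in for the "simple attention" family of fusion methods, and lets OCR tokens compete with a fixed answer vocabulary so that text read from the image can itself be returned as the answer. All function names and values are hypothetical and are not taken from any specific method covered by the survey.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the max score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Sequence[float], keys: List[Vector]) -> Tuple[Vector, List[float]]:
    """Fusion stage (simple attention, hypothetical): weight each feature by
    softmax(query . key) and return the weighted sum plus the weights."""
    weights = softmax([dot(query, k) for k in keys])
    fused = [sum(w * k[i] for w, k in zip(weights, keys)) for i in range(len(query))]
    return fused, weights


def predict_answer(
    question: Vector,
    visual_feats: List[Vector],
    ocr_tokens: List[str],
    ocr_feats: List[Vector],
    vocab: List[str],
    vocab_feats: List[Vector],
) -> str:
    """Answer-prediction stage (hypothetical): score fixed-vocabulary answers and
    the image's OCR tokens against a joint question/image vector."""
    # Attend jointly over visual objects and scene-text (OCR) tokens.
    fused, _ = attend(question, visual_feats + ocr_feats)
    joint = [q + f for q, f in zip(question, fused)]
    # OCR tokens are candidates too, so text seen in the image can be "copied" out.
    candidates = list(zip(vocab, vocab_feats)) + list(zip(ocr_tokens, ocr_feats))
    scores = [dot(joint, feat) for _, feat in candidates]
    best = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best][0]


# Feature-extraction stage, faked with hand-written vectors (assumed values).
# A real system would obtain these from a detector, an OCR engine and a text encoder.
question = [1.0, 0.0, 0.0]                      # e.g. "What brand is the drink?"
visual_feats = [[0.1, 0.9, 0.0]]                # one detected object (a bottle)
ocr_tokens = ["pepsi", "500ml"]                 # scene text read from the image
ocr_feats = [[0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
vocab = ["yes", "no"]                           # a tiny fixed answer vocabulary
vocab_feats = [[0.0, 0.2, 0.1], [0.0, 0.1, 0.2]]

print(predict_answer(question, visual_feats, ocr_tokens, ocr_feats, vocab, vocab_feats))
# prints: pepsi
```

With these toy vectors the attention places most of its weight on the OCR token "pepsi", which then outscores both vocabulary answers. Transformer-based methods would typically replace the single attention step with stacked multi-head self-attention across all modalities; this sketch does not attempt that.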

Keywords

Text-based Visual Question Answering (TextVQA); text information; natural language processing; computer vision; multimodal fusion

Classification

Information Technology and Security Science

Cite this article

ZHU Guide, HUANG Hai. Survey of Text-based Visual Question Answering[J]. Computer Engineering, 2024, 50(2): 1-14.

Funding

General Program of the National Natural Science Foundation of China (62272416).

Computer Engineering

Indexing: OA, Peking University Core Journal (北大核心), CSTPCD

ISSN 1000-3428
