Review of Research on Chart Question Answering
[Objective] This paper comprehensively reviews research progress in Chart Question Answering (CQA), analyzes existing models and methods, and discusses future development directions. [Methods] CQA models are first divided into two categories: those based on deep learning and those based on multimodal large models. The deep learning-based methods are further subdivided into end-to-end models and two-stage models. The three core processes of deep learning-based CQA are then analyzed in depth, and the existing methods for each process are classified and examined in detail. CQA models based on multimodal large models are also discussed, including their advantages, limitations, and future development directions. [Results] This paper summarizes the current state of CQA research and analyzes existing models and methods in depth. Deep learning-based CQA models perform well on standard chart types and simple tasks, but fall short on complex or non-standardized charts and on tasks that require deep reasoning. CQA models based on multimodal large models show great potential, but their performance gains usually come with larger model sizes and higher computational complexity. Future research should focus on developing more lightweight question answering models and on improving model interpretability.
Ma Qiuping (马秋平); Zhang Qi (张琪); Zhao Xiaofan (赵晓凡)
School of Information and Cyber Security, People's Public Security University of China, Beijing 100038, China (all three authors)
Keywords: chart question answering; visual question answering; deep learning; multimodal large language models
Frontiers of Data & Computing (《数据与计算发展前沿》), 2025, No. 1, pp. 19-37
Funding: Fundamental Research Funds for the Central Universities (2024JKF18)