Journal of Shanxi University (Natural Science Edition), 2026, Vol. 49, Issue 2: 263-271, 9. DOI: 10.13451/j.sxu.ns.2024104
A Textbook Visual Question Answering Method Based on Image-Text Comprehension Enhancement
Enhancing Image and Text Comprehension for Textbook Visual Question Answering
Abstract
Textbook Visual Question Answering is a multimodal task in the field of smart education that requires a deep understanding of textbook images, text, and questions to infer the correct answers. However, existing generic Visual Question Answering methods perform poorly on this task, for two main reasons. First, these methods can only recognize simple object attributes, lack disciplinary information, and are susceptible to interference from redundant information unrelated to the questions. Second, they struggle to capture key information in the text. To solve these problems, a textbook visual question answering method based on image description enhancement is proposed, which comprises three modules: (1) Text encoding and understanding: large language models are used to extract keywords from the question and to retrieve the statements in the text related to those keywords, enhancing text understanding and eliminating interference from redundant information. (2) Image encoding and description: a question-image attention mechanism is employed in image description to generate fine-grained, question-constrained image descriptions based on the question keywords, thereby enhancing image understanding. (3) Answer prediction: a pre-trained vision-language model fuses text information with visual information to improve the model's reasoning ability. Experimental results on relevant datasets demonstrate that the proposed method effectively improves the understanding of textbook information, thereby enhancing answer prediction accuracy. Accuracy on the test set and the validation set improved by 1.82% and 1.72%, respectively.
Keywords
visual question answering / intelligent education / image caption / image-text comprehension enhancement
Classification
Information Technology and Security Science
Cite this article
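HU Jingchang, QIANG Pengpeng, TAN Hongye, WANG Hongyu, MU Yongli. Enhancing Image and Text Comprehension for Textbook Visual Question Answering[J]. Journal of Shanxi University (Natural Science Edition), 2026, 49(2): 263-271, 9.
Funding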
National Natural Science Foundation of China (62076155)
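Xiaodian District of Taiyuan-Shanxi University Industry-University-Research Cooperation Project "Promotion and Application of Automatic Short-Answer Scoring Technology in Comprehensive Evaluation Systems" (202301S06)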