Computer and Modernization (计算机与现代化), Issue (2): 32-38, 7. DOI: 10.3969/j.issn.1006-2475.2026.02.004
用于知识视觉问答的问题增强知识检索网络
Question-augmented Knowledge Retrieval for Knowledge-based Visual Question Answering
Abstract
Knowledge-Based Visual Question Answering (KB-VQA) requires answering questions with external knowledge in addition to the content of images. Many recent works transform everything into the textual space and retrieve knowledge with a textual-space retriever, but this paradigm has two major limitations for KB-VQA: 1) the query obtained via image-to-text transformation can be inaccurate and redundant due to the absence of the question; 2) the relevance between queries and supporting knowledge is computed from their semantic similarity, which can be insufficient for question answering. To this end, this paper proposes a Question-augmented Knowledge Retrieval Network (QKRN) for knowledge-based visual question answering, which consists of Question-augmented Query Construction (QQC) and Reverse Inference-based Re-ranking Retriever (RIR) modules. More specifically, the QQC module uses a cross-attention mechanism to localize question-related visual regions and construct question-augmented queries. Furthermore, the RIR module re-ranks the retrieved knowledge by computing the likelihood of generating the question conditioned on the knowledge. Extensive experiments conducted on the OK-VQA and FVQA datasets verify the superior performance of the proposed QKRN.
Keywords
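artificial intelligence / neural network models / deep learning / knowledge-based VQA
Classification
Information Technology and Safety Science
Cite this article
Zhao Yongchao, Yang Zhenguo. Question-augmented Knowledge Retrieval for Knowledge-based Visual Question Answering [J]. Computer and Modernization, 2026, (2): 32-38, 7.
Funding
General Program of the Natural Science Foundation of Guangdong Province (2024A1515010237)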