通信与信息技术 (Communication & Information Technology), Issue (z1): 45-48, 4.
Multi-granularity hierarchical knowledge for visual question answering
GAO Yalong¹, ZHU Jinlong¹
Author information
- 1. China Unicom (Sichuan) Industrial Internet Co., Ltd., Chengdu 610000, China
Abstract
Visual Question Answering (VQA) requires a model to combine image content with a natural-language question to generate an accurate answer. This demands not only strong visual perception but also the ability to reason by integrating knowledge. Existing models often exploit knowledge only at a shallow level: they neglect the hierarchical structure and granularity differences of entities, or they rely on fixed-pattern feature fusion, which makes it difficult to handle dynamically changing knowledge relationships. To address these issues, we propose a VQA method based on multi-granularity hierarchical knowledge. By constructing a hierarchical knowledge framework, including a hierarchical scene graph with intra-layer associations and inter-layer mappings, the method explicitly models the multi-granularity structure of concepts and thereby parses entity concepts at multiple granularities. Two attention mechanisms are designed: one handles knowledge propagation across granularity layers, and the other focuses knowledge within each layer. Neural network modules perform reasoning over knowledge at each granularity and propagate it across layers. Experimental results on two datasets demonstrate the superior performance of this method.
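The abstract describes the architecture only at a high level. The following is a minimal sketch of how a two-layer hierarchical scene graph with intra-layer edges and an inter-layer mapping might be wired to the two attention mechanisms it names. Everything concrete here is an assumption, not the authors' method: the names `Layer`, `HierSceneGraph`, `intra_layer_message`, `cross_layer_propagate`, `intra_layer_focus`, and `answer_context` are hypothetical; attention is a parameter-free dot-product softmax against a question vector; intra-layer association is one round of neighbour averaging; and the final context is a plain concatenation of the per-layer summaries.

```python
# Hypothetical sketch of cross-layer propagation and intra-layer focusing;
# not the paper's implementation. Learned parameters are deliberately omitted.
import math
from dataclasses import dataclass, field

Vector = list[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: list[float]) -> list[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights: list[float], vectors: list[Vector]) -> Vector:
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]


@dataclass
class Layer:
    """One granularity layer: node features plus undirected intra-layer edges."""
    features: list[Vector]
    edges: list[tuple[int, int]] = field(default_factory=list)


@dataclass
class HierSceneGraph:
    """Assumed two-layer graph: fine nodes (e.g. object parts) and coarse nodes (e.g. objects)."""
    fine: Layer
    coarse: Layer
    fine_to_coarse: list[int]  # inter-layer mapping: fine node i -> coarse node fine_to_coarse[i]


def intra_layer_message(layer: Layer) -> list[Vector]:
    """Average each node with its intra-layer neighbours (one message-passing round)."""
    neighbours = {i: [i] for i in range(len(layer.features))}
    for a, b in layer.edges:
        neighbours[a].append(b)
        neighbours[b].append(a)
    out = []
    for i in range(len(layer.features)):
        idx = neighbours[i]
        out.append(weighted_sum([1 / len(idx)] * len(idx), [layer.features[j] for j in idx]))
    return out


def cross_layer_propagate(g: HierSceneGraph, question: Vector) -> list[Vector]:
    """Add to each coarse node a question-weighted summary of the fine nodes mapped to it."""
    updated = []
    for c, coarse_feat in enumerate(g.coarse.features):
        children = [i for i, p in enumerate(g.fine_to_coarse) if p == c]
        if not children:  # no fine nodes map here: keep the coarse feature unchanged
            updated.append(list(coarse_feat))
            continue
        child_feats = [g.fine.features[i] for i in children]
        weights = softmax([dot(f, question) for f in child_feats])
        summary = weighted_sum(weights, child_feats)
        updated.append([a + b for a, b in zip(coarse_feat, summary)])  # residual update
    return updated


def intra_layer_focus(features: list[Vector], question: Vector) -> Vector:
    """Question-guided attention that pools one layer into a single vector."""
    weights = softmax([dot(f, question) for f in features])
    return weighted_sum(weights, features)


def answer_context(g: HierSceneGraph, question: Vector) -> Vector:
    """Assumed pipeline: intra-layer messages, cross-layer propagation, per-layer focusing."""
    fine = Layer(intra_layer_message(g.fine), g.fine.edges)
    coarse = Layer(intra_layer_message(g.coarse), g.coarse.edges)
    coarse_feats = cross_layer_propagate(HierSceneGraph(fine, coarse, g.fine_to_coarse), question)
    # List concatenation of the fine-layer and coarse-layer summaries.
    return intra_layer_focus(fine.features, question) + intra_layer_focus(coarse_feats, question)


if __name__ == "__main__":
    graph = HierSceneGraph(
        fine=Layer([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], edges=[(0, 1)]),
        coarse=Layer([[0.5, 0.5], [0.0, 0.0]]),
        fine_to_coarse=[0, 0, 1],
    )
    print(answer_context(graph, question=[1.0, 0.0]))
```

In a trained model the dot products would typically be computed after learned projections and the concatenated context fed to an answer classifier; the sketch keeps them parameter-free so that the data flow (within a layer, across layers, then focusing) stays visible.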
Key words
Visual Question Answering / Multimodal reasoning / Multi-granularity knowledge
Classification
Information technology and security science
Cite this article
GAO Yalong, ZHU Jinlong. Multi-granularity hierarchical knowledge for visual question answering [J]. Communication & Information Technology, 2025, (z1): 45-48, 4.