Computer Applications and Software (计算机应用与软件), 2025, Vol. 42, Issue 10: 191-197, 238 (8 pages). DOI: 10.3969/j.issn.1000-386x.2025.10.026
Visual Commonsense Reasoning Method Based on a Graph Attention Network (基于图注意力网络的视觉常识推理方法)
A GRAPH ATTENTION NETWORK FOR VISUAL COMMONSENSE REASONING
Abstract
Visual commonsense reasoning (VCR) is a challenging multimodal task proposed in recent years. To reason about semantic relationships in images and improve performance on the VCR task, a graph attention network for visual commonsense reasoning is proposed. The method encodes the visual objects in each image as visual nodes and uses a graph attention network to model the features of each visual node and its adjacent nodes, obtaining the internal associations between objects. In addition, the method captures the dynamic interactions between visual objects, which further improves the understanding of image semantics. Experiments on the VCR dataset show that the method improves performance on the three VCR sub-tasks.
Keywords (Chinese)
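Visual commonsense reasoning (视觉常识推理) / Multimodal (多模态) / Graph attention network (图注意力网络) / Visual relationship (视觉关系)
Keywords (English)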
Visual commonsense reasoning / Multimodal / Graph attention network / Cognitive reasoning
Classification
Computers and Automation
Cite this article
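Zhang Wenqi (张文琪), Gao Yongchao (高永超), Qian Heng (钱恒), Lü Hongli (吕红丽). A graph attention network for visual commonsense reasoning [J]. Computer Applications and Software, 2025, 42(10): 191-197, 238.
Funding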
National Key Research and Development Program of China (2021YFF0601603).
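
Illustrative sketch
The abstract describes encoding detected objects as graph nodes and letting each node attend to its adjacent nodes. The block below is a minimal sketch of one generic single-head graph attention layer, written in plain Python. It is not the authors' implementation: the function name `gat_layer`, the toy adjacency list, the feature sizes, and the random weights are all assumptions made for illustration, and the paper's actual node encoder and graph construction are not given in this abstract.

```python
import math
import random


def leaky_relu(x, slope=0.2):
    # Standard leaky ReLU; the slope value is an assumption.
    return x if x > 0 else slope * x


def matvec(W, h):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def gat_layer(features, adjacency, W, a):
    """One single-head graph attention step over object nodes.

    features:  N feature vectors, one per visual object (hypothetical inputs)
    adjacency: adjacency[i] lists the neighbor indices of node i
    W:         shared projection matrix (out_dim x in_dim)
    a:         attention vector of length 2 * out_dim
    """
    projected = [matvec(W, h) for h in features]
    out_dim = len(W)
    updated, attention = [], []
    for i, neighbors in enumerate(adjacency):
        nbrs = sorted(set(neighbors) | {i})  # add a self-loop
        # Unnormalized score for each (node, neighbor) pair.
        scores = [
            leaky_relu(sum(w * x for w, x in zip(a, projected[i] + projected[j])))
            for j in nbrs
        ]
        # Softmax over the neighborhood (shifted by the max for stability).
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        # New node feature: attention-weighted sum of projected neighbors.
        updated.append([
            sum(alpha * projected[j][k] for alpha, j in zip(alphas, nbrs))
            for k in range(out_dim)
        ])
        attention.append(dict(zip(nbrs, alphas)))
    return updated, attention


# Toy example: three objects in a chain 0 - 1 - 2 (assumed graph).
random.seed(0)
features = [[random.gauss(0, 1) for _ in range(4)] for _ in range(3)]
adjacency = [[1], [0, 2], [1]]
W = [[random.gauss(0, 0.5) for _ in range(4)] for _ in range(2)]
a = [random.gauss(0, 0.5) for _ in range(4)]
updated, attention = gat_layer(features, adjacency, W, a)
```

Here `attention[i]` maps each neighbor of object `i` (including itself) to a weight, and the weights for each object sum to 1. A learned model would train `W` and `a` rather than sample them at random.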