Journal of Guilin University of Electronic Technology, 2025, Vol. 45, Issue (1): 20-26, 7. DOI: 10.16725/j.1673-808X.202319
Visual question answering model enhanced with image emotional information
Abstract
Visual question answering (VQA) is a multimedia understanding task in which a computer is given an image and a natural language question about the image content and must provide a correct answer. Early VQA models often overlooked the emotional information in images, so they performed poorly on emotion-related questions. Existing emotion-integrated VQA models, in turn, do not make full use of key regions in images and keywords in text, which leads to a shallow understanding of fine-grained questions and low overall answer accuracy. To fully incorporate image emotional information into VQA models and use it to improve their ability to answer questions, we propose an emotion-enhanced visual question answering model (IEVQA). The model builds on a large-scale pre-trained model framework and uses an emotion module to improve its ability to answer emotion-related questions. Experiments were conducted on a VQA benchmark dataset. The results show that IEVQA outperforms the comparison methods on comprehensive metrics, validating the effectiveness of using emotional information to assist VQA models.
Keywords
visual question answering / natural language / multimedia understanding / emotion / fine-grained
Classification
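Computer Science and Automation
Cite this article
Cai Jin, Cai Guoyong. Visual question answering model enhanced with image emotional information [J]. Journal of Guilin University of Electronic Technology, 2025, 45(1): 20-26, 7.
Funding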
National Natural Science Foundation of China (61763007)
Innovation Project of Guangxi Graduate Education (YCSW2022285)
Guangxi Key Laboratory of Trusted Software Fund (kx202060)