Application Research of Computers (计算机应用研究), 2024, Vol. 41, Issue 1: 10-20,11. DOI: 10.19734/j.issn.1001-3695.2023.05.0181
Research advances in explainable visual question answering
Abstract
In the context of visual question answering (VQA) tasks, "explainability" refers to the various ways in which researchers can explain why a model works in a given task. The lack of explainability in some existing VQA models means there is no assurance that they can be used safely in real-life applications, especially in fields such as autonomous driving and healthcare; this raises ethical and moral issues that hinder their adoption in industry. This paper introduced various implementations for enhancing explainability in VQA tasks and categorized them into five main categories: image interpretation, text interpretation, multi-modal interpretation, modular interpretation, and graph interpretation. This paper discussed the characteristics of each approach and further presented the subdivisions for some of them. Furthermore, it presented several VQA datasets that aim to enhance explainability. These datasets primarily focus on incorporating external knowledge bases and annotating image information to improve explainability. In summary, this paper provided an overview of existing commonly used interpretable methods for VQA tasks and proposed future research directions based on the identified shortcomings of the current approaches.
Keywords: visual question answering; visual reasoning; explainability; artificial intelligence; natural language processing; computer vision
Classification: Information Technology and Security Science
Citation: 张一飞, 孟春运, 蒋洲, 栾力, Ernest Domanaanmwi Ganaa. Research advances in explainable visual question answering [J]. Application Research of Computers, 2024, 41(1): 10-20,11.
Funding
National Social Science Fund of China key project (16AJL008)
Jiangsu Provincial Social Science Fund youth project (22EYC001)
General project of philosophy and social science research in Jiangsu universities (2019SJA1927)