Journal of East China Normal University (Natural Science), Issue (6): 46-52, 7. DOI: 10.3969/j.issn.1000-5641.2025.06.006
Research on video question answering for the development of theory of mind (面向心智理论发展的视频问答研究)
Abstract
In recent years, with the continuous development of machine theory of mind (ToM), research has found that the development of machine ToM differs significantly from the triangular model of children's ToM development. Consequently, we propose a machine-oriented triangular model of theory of mind. This model elucidates the relationships among the various tools involved in developing machine ToM. Additionally, we introduce an evaluation dataset suitable for the dynamic assessment of machine ToM. Finally, this paper designs a VideoQA (video question answering) model, named FOMemNet (fact and observer memory network), specifically tailored for cognitive reasoning, that is, reasoning about belief, desire, and intention. Because models in cognitive reasoning tasks need to infer from the observer's perspective, we incorporate the FOEM (vision fact and observer perception encoder module) in FOMemNet to fuse multimodal features, thereby obtaining visual factual features and observer features. Subsequently, the model uses the FOF (fact and observer fusion) module and two memory modules to integrate features from both perspectives into a global representation. FOMemNet achieves a 2.27% improvement on BDIQA. Our experiments demonstrate the effectiveness of the concept of fact and observer perception in enhancing cognitive reasoning abilities in VideoQA.
Keywords
artificial intelligence / machine cognition evaluation / multimodality
Classification
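Information technology and security science
Cite this article
Mao Yuanyuan (毛媛媛), Lin Xin (林欣), Ni Qin (倪琴), Deng Ciping (邓赐平), Ma Yiming (马毅鸣). Research on video question answering for the development of theory of mind [J]. Journal of East China Normal University (Natural Science), 2025, (6): 46-52, 7.
Funding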
National Natural Science Foundation of China (2021ZD0111000, 2021ZD0111004)
Shanghai Municipal Science and Technology Commission Project (21511100101, 22511105901, 22DZ2229004)