Acta Psychologica Sinica, 2025, Vol. 57, Issue (11): 1988-2000, insert pages 22-32, 24. DOI: 10.3724/SP.J.1041.2025.1988
Emotional capabilities evaluation of multimodal large language model in dynamic social interaction scenarios
Abstract
Multimodal Large Language Models (MLLMs) can process and integrate multimodal data, such as images and text, providing a powerful tool for understanding human psychology and behavior. Combining classic emotional behavior experimental paradigms, this study compares the emotion recognition and prediction abilities of human participants and two mainstream MLLMs in dynamic social interaction contexts, aiming to disentangle the distinct roles of visual features of conversational characters (images) and conversational content (text) in emotion recognition and prediction. The results indicate that the emotion recognition and prediction performance of MLLMs, based on character images and conversational content, exhibits moderate or lower correlations with human participants. Despite a notable gap, MLLMs have begun to demonstrate preliminary capabilities in emotion recognition and prediction similar to human participants in dyadic interactions. Using human performance as a benchmark, the study further compares MLLMs under different conditions: integrating both character images and conversational content, using only character images, or relying solely on conversational content. The results suggest that visual features of character interactions somewhat constrain MLLMs' basic emotion recognition but effectively facilitate the recognition of complex emotions, while having no significant impact on emotion prediction. Additionally, by comparing the emotion recognition and prediction performance of two mainstream MLLMs and different versions of GPT-4, the study finds that, rather than merely increasing the scale of training data, innovations in the underlying technical framework play a more crucial role in enhancing MLLMs' emotional capabilities in dynamic social interaction contexts. Overall, this study deepens the understanding of the interaction between human visual features and conversational content, fosters interdisciplinary integration between psychology and artificial intelligence, and provides valuable theoretical and practical insights for developing explainable affective computing models and general artificial intelligence.
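The benchmark comparison described above (model ratings correlated against human ratings under image+text, image-only, and text-only conditions) can be illustrated with a minimal, purely hypothetical sketch; the file name, column names, and rating format below are assumptions for illustration only and are not the authors' materials or code.

```python
# Illustrative sketch only (not the authors' code): correlating MLLM emotion
# ratings with a human benchmark under three input conditions.
# The CSV file and all column names are hypothetical assumptions.
import pandas as pd
from scipy.stats import pearsonr

# Hypothetical data: one row per dialogue item, with the mean human emotion
# rating and the model's rating obtained under each input condition.
ratings = pd.read_csv("emotion_ratings.csv")

conditions = ["image_and_text", "image_only", "text_only"]
for cond in conditions:
    # Pearson correlation between human ratings and model ratings for this condition.
    r, p = pearsonr(ratings["human_rating"], ratings[f"mllm_{cond}"])
    print(f"{cond}: r = {r:.2f}, p = {p:.3f}")
```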
Keywords
multimodal large language model / social interaction / emotion recognition / emotion prediction
Classification
Psychology
Cite this article
周子森, 黄琪, 谭泽宏, 刘睿, 曹子亨, 母芳蔓, 樊亚春, 秦绍正. Emotional capabilities evaluation of multimodal large language model in dynamic social interaction scenarios [J]. Acta Psychologica Sinica, 2025, 57(11): 1988-2000, insert pages 22-32, 24.
Funding
Key Program of the National Natural Science Foundation of China (32130045)
Inter-organizational Cooperation Project (32361163611)