
多模态大语言模型动态社会互动情景下的情感能力测评


心理学报 2025, Vol. 57, Issue 11: 1988-2000, 中插22-中插32, 24. DOI: 10.3724/SP.J.1041.2025.1988


Emotional capabilities evaluation of multimodal large language model in dynamic social interaction scenarios

周子森¹, 黄琪¹, 谭泽宏², 刘睿³, 曹子亨⁴, 母芳蔓⁵, 樊亚春², 秦绍正¹

Author Information

  • 1. State Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University
  • 2. School of Artificial Intelligence, Beijing Normal University, Beijing 100875
  • 3. School of Business Administration, Inner Mongolia University of Finance and Economics, Hohhot 010070
  • 4. Alibaba Group, Hangzhou 310020
  • 5. School of Mathematics and Computer Science, Chuxiong Normal University, Chuxiong, Yunnan 675000


Abstract

Multimodal Large Language Models (MLLMs) can process and integrate multimodal data, such as images and text, providing a powerful tool for understanding human psychology and behavior. Combining classic emotional behavior experimental paradigms, this study compares the emotion recognition and prediction abilities of human participants and two mainstream MLLMs in dynamic social interaction contexts, aiming to disentangle the distinct roles of visual features of conversational characters (images) and conversational content (text) in emotion recognition and prediction. The results indicate that the emotion recognition and prediction performance of MLLMs, based on character images and conversational content, exhibits moderate or lower correlations with human participants. Despite a notable gap, MLLMs have begun to demonstrate preliminary capabilities in emotion recognition and prediction similar to human participants in dyadic interactions. Using human performance as a benchmark, the study further compares MLLMs under different conditions: integrating both character images and conversational content, using only character images, or relying solely on conversational content. The results suggest that visual features of character interactions somewhat constrain MLLMs' basic emotion recognition but effectively facilitate the recognition of complex emotions, while having no significant impact on emotion prediction.

Additionally, by comparing the emotion recognition and prediction performance of two mainstream MLLMs and different versions of GPT-4, the study finds that, rather than merely increasing the scale of training data, innovations in the underlying technical framework play a more crucial role in enhancing MLLMs' emotional capabilities in dynamic social interaction contexts. Overall, this study deepens the understanding of the interaction between human visual features and conversational content, fosters interdisciplinary integration between psychology and artificial intelligence, and provides valuable theoretical and practical insights for developing explainable affective computing models and general artificial intelligence.

关键词

多模态大语言模型/社会互动/情绪识别/情绪推理

Key words

multimodal large language model / social interaction / emotion recognition / emotion prediction

Classification

Psychology

Cite this article

周子森, 黄琪, 谭泽宏, 刘睿, 曹子亨, 母芳蔓, 樊亚春, 秦绍正. 多模态大语言模型动态社会互动情景下的情感能力测评[J]. 心理学报, 2025, 57(11): 1988-2000, 中插22-中插32, 24.

Funding

Key Program of the National Natural Science Foundation of China (32130045)

Inter-organizational Cooperation Project (32361163611)

心理学报 (Acta Psychologica Sinica), ISSN 0439-755X
