
Multimodal Sentiment Analysis Model Based on CLIP and Cross-attention

Abstract

In response to the problems of limited annotated data, insufficient fusion between modalities, and information redundancy in multimodal sentiment analysis, a multimodal sentiment analysis (MSA) model named CLIP-CA-MSA, based on contrastive language-image pretraining (CLIP) and cross-attention (CA), was proposed in this study. First, the model used the CLIP pre-trained BERT model and the PIFT model to extract video feature vectors and text features. Second, a cross-attention mechanism was applied to let image feature vectors and text feature vectors interact, strengthening information exchange between modalities. Finally, an uncertainty loss was used to fuse the features and compute the final sentiment classification result. Experimental results showed that the model improved accuracy by 5 to 14 percentage points and F1 score by 3 to 12 percentage points over other multimodal models, verifying its superiority, and ablation experiments verified the effectiveness of each module. The model can effectively exploit the complementarity and correlation of multimodal data, while using the uncertainty loss to improve its robustness and generalization ability.
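The two fusion steps described above can be sketched in miniature: queries from one modality attend to keys/values from the other (cross-attention), and the per-task losses are then combined with learned log-variance weights (an uncertainty-weighted loss in the style of Kendall et al.). This is a minimal NumPy illustration, not the authors' implementation; the random arrays stand in for BERT text features and CLIP/PIFT image features, and all dimensions are placeholder assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_feats, context_feats, d_k):
    """Single-head cross-attention: rows of query_feats (one modality)
    attend over rows of context_feats (the other modality)."""
    scores = query_feats @ context_feats.T / np.sqrt(d_k)  # (Nq, Nc)
    weights = softmax(scores, axis=-1)                     # rows sum to 1
    return weights @ context_feats                         # (Nq, d)

def uncertainty_weighted_loss(losses, log_vars):
    """Fuse task losses with learned log-variances s_i:
    sum_i exp(-s_i) * L_i + s_i."""
    return sum(np.exp(-s) * loss + s for loss, s in zip(losses, log_vars))

rng = np.random.default_rng(0)
text = rng.normal(size=(4, 8))   # 4 text tokens, dim 8 (placeholder for BERT output)
image = rng.normal(size=(6, 8))  # 6 image patches, dim 8 (placeholder for CLIP output)

fused_text = cross_attention(text, image, d_k=8)   # text attends to image
fused_image = cross_attention(image, text, d_k=8)  # image attends to text
print(fused_text.shape, fused_image.shape)         # (4, 8) (6, 8)

# With both log-variances at 0 the fused loss is just the plain sum.
total = uncertainty_weighted_loss([0.7, 1.2], log_vars=[0.0, 0.0])
print(round(total, 1))  # 1.9
```

In a trained model the `log_vars` would be learnable parameters, so the optimizer can down-weight the noisier modality automatically rather than relying on hand-tuned loss weights.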

CHEN Yan; LAI Yubin; XIAO Ao; LIAO Yuxiang; CHEN Ningjiang

School of Computer, Electronics and Information, Guangxi University, Nanning 530000, China; Guangxi Key Laboratory of Multimedia Communications and Network Technology, Guangxi University, Nanning 530000, China

Computer and Automation

sentiment analysis; multimodal learning; cross-attention; CLIP model; Transformer; feature fusion

Journal of Zhengzhou University (Engineering Science), 2024(002)

Pages 42-50 (9 pages)

Supported by the Science Research and Technology Development Plan of Guangxi Zhuang Autonomous Region (Guike AA20302002-3) and the Natural Science Foundation of Guangxi Zhuang Autonomous Region (2020GXNSFAA159090)

DOI: 10.13705/j.issn.1671-6833.2024.02.003
