郑州大学学报(工学版)2024,Vol.45Issue(2):42-50,9.DOI:10.13705/j.issn.1671-6833.2024.02.003
基于CLIP和交叉注意力的多模态情感分析模型
Multimodal Sentiment Analysis Model Based on CLIP and Cross-attention
摘要
Abstract
In response to the issues of limited annotated data, insufficient fusion between modalities, and information redundancy in multimodal sentiment analysis, a multimodal sentiment analysis model called CLIP-CA-MSA, based on contrastive language-image pre-training (CLIP) and a cross-attention mechanism, was proposed in this study. The model employed encoders such as BERT and PIFT, pre-trained by CLIP, to extract feature vectors from videos and textual content. Subsequently, a cross-attention mechanism was applied to facilitate interaction between the image feature vectors and the text feature vectors, enhancing information exchange across modalities. Finally, an uncertainty loss was used to combine the fused features, and the final sentiment classification results were generated from the outputs. The experimental results showed that the model improved accuracy by 5 to 14 percentage points and F1 score by 3 to 12 percentage points over other multimodal models, which verified the superiority of the proposed model; ablation experiments further verified the validity of each module. The model could effectively exploit the complementarity and correlation of multimodal data, and use the uncertainty loss to improve its robustness and generalization ability.
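The two mechanisms named in the abstract — cross-attention between image and text features, and uncertainty-weighted loss combination — can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the paper's implementation: single-head attention without learned Q/K/V projections, and Kendall-style homoscedastic uncertainty weighting (learnable log-variances), which the abstract does not specify in detail.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values, d_k):
    # queries from one modality attend over the other modality's features;
    # learned W_q/W_k/W_v projections are omitted for brevity
    scores = queries @ keys_values.T / np.sqrt(d_k)
    return softmax(scores, axis=-1) @ keys_values

def uncertainty_loss(losses, log_vars):
    # Kendall-style weighting: sum_i exp(-s_i) * L_i + s_i,
    # where s_i is a learnable log-variance per task/modality (assumption)
    losses, log_vars = np.asarray(losses), np.asarray(log_vars)
    return float(np.sum(np.exp(-log_vars) * losses + log_vars))

rng = np.random.default_rng(0)
d = 64
text_feats = rng.standard_normal((8, d))   # e.g. 8 text token features
image_feats = rng.standard_normal((4, d))  # e.g. 4 image patch features

# each modality attends to the other, enabling cross-modal exchange
text_attended = cross_attention(text_feats, image_feats, d)    # (8, d)
image_attended = cross_attention(image_feats, text_feats, d)   # (4, d)

# per-modality losses combined with uncertainty weighting
total = uncertainty_loss([1.0, 2.0], [0.0, 0.0])  # exp(0)*1 + exp(0)*2 = 3.0
```

In a trained model the log-variances would be parameters updated by backpropagation, so the network learns how much to trust each modality's loss term.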
关键词
Key words
sentiment analysis / multimodal learning / cross-attention / CLIP model / Transformer / feature fusion
分类
Classification
Information Technology and Security Science
引用本文
Cite this article
陈燕, 赖宇斌, 肖澳, 廖宇翔, 陈宁江. 基于CLIP和交叉注意力的多模态情感分析模型[J]. 郑州大学学报(工学版), 2024, 45(2): 42-50, 9.
基金项目
Fund projects
Science Research and Technology Development Program of Guangxi Zhuang Autonomous Region (桂科AA20302002-3)
Natural Science Foundation of Guangxi Zhuang Autonomous Region (2020GXNSFAA159090)