Application Research of Computers, 2025, Vol. 42, Issue 9: 2590-2598 (9 pages). DOI: 10.19734/j.issn.1001-3695.2025.02.0043
Multimodal dialogue model and applications based on modality-sensitive attention mechanism
Abstract
Multimodal dialogue systems use Transformers, cross-attention mechanisms, and pre-trained models to fuse text, speech, and video modalities of different granularities and to extract cross-modal features. However, existing research ignores the fact that features from different modalities differ in their sensitivity to the classification task, which leads to excessive fusion and information redundancy. To address the influence of the order in which modalities are fused on classification results, this paper proposes MDM-MSAM (multimodal dialogue model based on modality-sensitive attention mechanism). The model consists of three parts: master-slave modality screening, dual-modal cross-modal fusion, and tri-modal cross-modal fusion. After determining the master and slave modalities and extracting cross-dual-modal features, the model fuses these again with the tri-modal fusion features to form modality-sensitive hierarchical cross-multimodal features. On the MIntRec and CMU-MOSI datasets, classification accuracy increases by 3.15% and 3.5%, respectively, over the best-performing existing model. MDM-MSAM has also been deployed in a flow-engine-based multi-turn dialogue system, where it achieved good results.
Key words
multimodal dialogue system / cross-modal features / sensitivity differences / modality-sensitive attention mechanism / master-slave modality
Classification
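Information technology and security science
Cite this article
杜维, 朱晓瑛, 许方敏, 郑建生, 朱福喜, 龚鸣敏, 李紫玉. Multimodal dialogue model and applications based on modality-sensitive attention mechanism [J]. Application Research of Computers, 2025, 42(9): 2590-2598.
Funding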
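National Natural Science Foundation of China (42374013)
Beijing Natural Science Foundation (L234080)
Wuhan College Scientific Research Fund, Annual Plan Project (JJA202304)
China University Industry-University-Research Innovation Fund, Tencent Technology Innovation Education Special Project (2022TX007)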