Survey of Multimodal Data Fusion Research

Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2024, Vol. 18, Issue 10: 2501-2520, 20. DOI: 10.3778/j.issn.1673-9418.2403083

Zhang Hucheng ¹, Li Leixiao ², Liu Dongjiang ²

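Author information

1. College of Data Science and Application, Inner Mongolia University of Technology, Hohhot 010080
2. College of Data Science and Application, Inner Mongolia University of Technology, Hohhot 010080; Inner Mongolia Autonomous Region Engineering Technology Research Center of Big Data-Based Software Service, Hohhot 010080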

Abstract

Although deep learning has achieved excellent results in single-modal applications thanks to its powerful learning ability, the feature representation of a single modality rarely captures the complete information about a phenomenon. To overcome the limits of single-modal feature representation and make fuller use of the information contained in multiple modalities, researchers have proposed multimodal fusion to improve model learning performance. Multimodal fusion enables a machine to exploit the correlation and complementarity between modalities such as text, speech, image, and video, and to fuse them into a better feature representation that provides a basis for model training. Research on multimodal fusion is still at an early stage of development. Starting from the active research areas of multimodal fusion in recent years, this paper describes multimodal fusion methods and the multimodal alignment techniques used in the fusion process. First, it analyzes the applications, advantages, and disadvantages of the joint fusion, cooperative fusion, encoder fusion, and split fusion methods. It then discusses the problem of multimodal alignment during fusion, covering explicit and implicit alignment together with their applications, advantages, and disadvantages. Second, it describes how popular datasets have been applied to multimodal fusion in different fields in recent years. Finally, it discusses the challenges and research prospects of multimodal fusion, with the aim of further promoting its development and application.

Keywords

deep learning/multimodal fusion/modal alignment/multimodal applications

Classification

Information Technology and Security Science

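Cite this article

Zhang Hucheng, Li Leixiao, Liu Dongjiang. Survey of Multimodal Data Fusion Research[J]. Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2024, 18(10): 2501-2520, 20.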

Funding

This work was supported by the National Natural Science Foundation of China (62362055), the Key Research and Development and Achievement Transformation Program of Inner Mongolia Autonomous Region (2022YFSJ0013, 2023YFHH0052), the Support Program for Young Scientific and Technological Talents in Higher Education Institutions of Inner Mongolia Autonomous Region (NJYT22084, NJYT24035), the Natural Science Foundation of Inner Mongolia (2023MS06008), the Research Projects of Universities Directly under Inner Mongolia Autonomous Region (JY20220061, JY20222077, JY20230119, JY20230019), and the Key Research and Development Program of Ordos (YF20232328).

Journal of Frontiers of Computer Science and Technology (计算机科学与探索)

Indexing: OA, Peking University Core (北大核心), CSTPCD

ISSN: 1673-9418
