Computer Applications and Software, 2025, Vol. 42, Issue 5: 108-115, 129. DOI: 10.3969/j.issn.1000-386x.2025.05.016
MODAL JOINT REPRESENTATION LEARNING BASED ON VARIATIONAL DISTILLATION
Abstract
In recent years, deep learning for multimodal interaction has attracted extensive research attention, in which multimodal pre-training models play an indispensable role. However, experiments show that most of these large models are poorly suited to single-modality scenarios, require large amounts of aligned multimodal training corpora that are difficult to obtain, and have too many parameters to deploy easily. Therefore, this paper proposes MJBERT, a lightweight modal joint encoder that requires no aligned multimodal corpora and focuses on single-modality scenarios. To train MJBERT, a distillation method named MJ-KD was designed: the pre-trained models BERT-large and ResNet-152 served as teacher models, and their knowledge was transferred to MJBERT through MJ-KD. Experimental results show that MJBERT performs on par with or better than the benchmark models on multiple tasks in both image and text single-modality scenarios.
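The abstract names MJ-KD but does not detail it here. As general background, classical knowledge distillation trains a student to match the teacher's temperature-softened output distribution via a KL-divergence term. The NumPy sketch below illustrates that generic objective only; it is not the paper's MJ-KD, whose variational mutual-information formulation is given in the full text, and the function names are illustrative.

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; higher T yields a softer distribution.
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # KL(teacher || student) on temperature-softened outputs,
    # scaled by T^2 as in the classical formulation.
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    kl = (p * (np.log(p) - np.log(q))).sum(axis=-1)
    return float(kl.mean() * T * T)

# Example: a student that matches the teacher incurs (near-)zero loss.
teacher = np.array([[2.0, 0.5, -1.0]])
student = np.array([[0.1, 0.2, 0.3]])
loss = distillation_loss(student, teacher)
```

In practice this soft-target term is combined with the ordinary hard-label loss on the student, weighted by a mixing coefficient.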
Keywords: Knowledge distillation; Multimodal; Variational mutual information
Classification: Information Technology and Security Science
张亚伟, 王晶晶, 李嘉贤, 周萌南. Modal Joint Representation Learning Based on Variational Distillation [J]. Computer Applications and Software, 2025, 42(5): 108-115, 129.
Funding
National Natural Science Foundation of China (62006166, 62076175, 62076176)
China Postdoctoral Science Foundation (2019M661930)
A project funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD)