Journal of Anhui University (Natural Science Edition), 2026, Vol. 50, Issue 2, pp. 25-33 (9 pages). DOI: 10.3969/j.issn.1000-2162.2026.02.004
Multi-modal music emotion joint learning recognition method integrating audio text
Abstract
To address multi-modal emotion recognition in music, a joint learning framework that takes the characteristics of multi-modal data into account was proposed. Multi-modal emotion recognition was taken as the main task and multi-modal emotion category recognition as the auxiliary task. With the emotion category recognition task assisting the emotion recognition task, the performance of both music emotion recognition and emotion category recognition was improved. First, private-layer networks encoded the audio and the text information of the main task separately, learning the intrinsic emotional features of each single modality. Then, a shared network layer learned the emotion information of the main task and the emotion category information of the auxiliary task, and its output was combined with the modality-specific features of the main task to obtain the unimodal representations used by the main task. Finally, a self-attention mechanism captured multi-modal feature information to produce the final emotion and emotion category recognition results. Experiments showed that the proposed joint learning framework improves emotion recognition performance and, to some extent, also improves emotion category recognition performance.
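A minimal, dependency-free sketch of the shared-private joint learning pipeline summarized above follows. Everything in it is an assumption for illustration, not the authors' implementation: the feature dimensions, the class counts, the auxiliary-loss weight `LAMBDA_AUX`, random untrained weights, single-layer tanh encoders standing in for the private and shared networks, concatenation as the way private and shared features are combined, and single-head self-attention without learned query/key/value projections. It computes only the forward pass and a weighted joint loss.

```python
import math
import random

random.seed(0)

# Hypothetical sizes; the paper's real dimensions and label counts are not given here.
AUDIO_DIM, TEXT_DIM, HIDDEN = 8, 6, 4
N_EMOTIONS, N_CATEGORIES = 4, 2  # main-task / auxiliary-task class counts (assumed)
LAMBDA_AUX = 0.5                 # weight of the auxiliary-task loss (assumed)


def rand_matrix(rows, cols):
    """Random (untrained) weight matrix of shape (rows, cols)."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(w, x):
    """Multiply a (rows, cols) matrix by a vector of length cols."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def dense_tanh(w, x):
    """One dense layer with tanh activation and no bias."""
    return [math.tanh(v) for v in matvec(w, x)]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]


# Private layers: one encoder per modality, weights not shared.
W_audio_private = rand_matrix(HIDDEN, AUDIO_DIM)
W_text_private = rand_matrix(HIDDEN, TEXT_DIM)
# Shared layer: each modality is projected to HIDDEN, then the SAME W_shared is applied.
W_audio_in = rand_matrix(HIDDEN, AUDIO_DIM)
W_text_in = rand_matrix(HIDDEN, TEXT_DIM)
W_shared = rand_matrix(HIDDEN, HIDDEN)
# Task heads on the fused vector (private + shared concatenated, length 2 * HIDDEN).
W_emotion = rand_matrix(N_EMOTIONS, 2 * HIDDEN)
W_category = rand_matrix(N_CATEGORIES, 2 * HIDDEN)


def self_attention(tokens):
    """Single-head scaled dot-product self-attention with Q = K = V = tokens."""
    d = len(tokens[0])
    outputs, weights = [], []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        attn = softmax(scores)
        weights.append(attn)
        outputs.append([sum(attn[j] * tokens[j][i] for j in range(len(tokens)))
                        for i in range(d)])
    return outputs, weights


def forward(audio, text):
    # 1) Private layers: modality-specific emotional features.
    a_priv = dense_tanh(W_audio_private, audio)
    t_priv = dense_tanh(W_text_private, text)
    # 2) Shared layer: common weights applied to both modalities.
    a_shared = dense_tanh(W_shared, matvec(W_audio_in, audio))
    t_shared = dense_tanh(W_shared, matvec(W_text_in, text))
    # 3) Combine private and shared features: one token per modality (list concatenation).
    tokens = [a_priv + a_shared, t_priv + t_shared]
    # 4) Self-attention across the two modality tokens, then mean-pool.
    attended, attn = self_attention(tokens)
    fused = [sum(col) / len(attended) for col in zip(*attended)]
    # 5) Main-task emotion head and auxiliary emotion-category head.
    return softmax(matvec(W_emotion, fused)), softmax(matvec(W_category, fused)), attn


def joint_loss(p_emotion, p_category, y_emotion, y_category, lam=LAMBDA_AUX):
    """Main-task cross-entropy plus lam times auxiliary-task cross-entropy."""
    return -math.log(p_emotion[y_emotion]) - lam * math.log(p_category[y_category])


audio = [random.gauss(0, 1) for _ in range(AUDIO_DIM)]
text = [random.gauss(0, 1) for _ in range(TEXT_DIM)]
p_emo, p_cat, attn = forward(audio, text)
loss = joint_loss(p_emo, p_cat, y_emotion=1, y_category=0)
print(len(p_emo), len(p_cat), round(sum(p_emo), 6), loss > 0)  # prints: 4 2 1.0 True
```

Because both heads read the same fused vector, gradients of both loss terms would reach `W_shared` during training; under these assumptions, that is how the auxiliary category task can shape the representation the main emotion task relies on.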
Key words
music/multi-modal/emotion recognition/joint learning
Classification
Information Technology and Security Science
Cite this article
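Liu Lingyao (刘羚瑶). Multi-modal music emotion joint learning recognition method integrating audio text[J]. Journal of Anhui University (Natural Science Edition), 2026, 50(2): 25-33.
Funding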
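Natural Science Foundation of Shanxi Province, General Program (202403021221193)
Humanities and Social Sciences Research Planning Fund of the Ministry of Education (23YJAZH070)
Shanxi Province Art Science Planning Project (22BA152)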