自动化与信息工程 (Automation & Information Engineering), 2024, Vol. 45, Issue 4: 36-41, 49 (7 pages). DOI: 10.3969/j.issn.1674-2605.2024.04.006
基于多尺度卷积和多头自注意力的语音情感识别模型
Speech Emotion Recognition Model Based on Multi-scale Convolution and Multi-head Self-attention
Abstract
A speech emotion recognition model based on multi-scale convolution and multi-head self-attention (MCNN-MHA) is proposed to address the inability of traditional convolutional neural networks to fully capture time-domain and frequency-domain details in speech emotion recognition. First, a multi-scale convolutional neural network convolves the input at different scales to obtain features across different time and frequency ranges. Then, a multi-head self-attention mechanism is introduced to automatically learn the relevant and important features in the speech signal and to attend to different feature subspaces, strengthening the model's perception of important features. In addition, the frequency-domain and time-domain masks from SpecAugment are used to augment the data samples and improve the generalization and robustness of the model. Experimental results show that the MCNN-MHA model achieves an accuracy of 90.35% on the RAVDESS dataset.
Keywords
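speech emotion recognition / multi-scale convolutional neural network / multi-head self-attention mechanism / SpecAugment
Classification
Information Technology and Security Science
Cite this article
钟善机, 张学习, 陈楚嘉, 高学秋, 陶杰. Speech Emotion Recognition Model Based on Multi-scale Convolution and Multi-head Self-attention [J]. 自动化与信息工程 (Automation & Information Engineering), 2024, 45(4): 36-41, 49.
Funding
National Natural Science Foundation of China (62276069)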
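
Illustrative note: the frequency-domain and time-domain masking from SpecAugment that the abstract mentions can be sketched roughly as below. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the function name `spec_augment`, the maximum mask widths, the use of a single mask per axis, and the zero fill value are all hypothetical choices that the abstract does not specify.

```python
import numpy as np


def spec_augment(spec, max_freq_width=8, max_time_width=16, rng=None):
    """Return a masked copy of a spectrogram shaped (freq_bins, time_frames).

    One band of consecutive frequency bins and one span of consecutive time
    frames are set to zero. The widths and the zero fill value are
    assumptions made for illustration only.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = spec.copy()  # leave the caller's array untouched
    n_freq, n_time = out.shape

    # Frequency mask: pick a width f in [0, max_freq_width], then a start
    # bin f0 such that the band [f0, f0 + f) fits inside the spectrogram.
    f = int(rng.integers(0, min(max_freq_width, n_freq) + 1))
    f0 = int(rng.integers(0, n_freq - f + 1))
    out[f0:f0 + f, :] = 0.0

    # Time mask: same idea along the time axis.
    t = int(rng.integers(0, min(max_time_width, n_time) + 1))
    t0 = int(rng.integers(0, n_time - t + 1))
    out[:, t0:t0 + t] = 0.0

    return out


if __name__ == "__main__":
    # Hypothetical input: 40 frequency bins by 100 time frames, all ones.
    spec = np.ones((40, 100))
    augmented = spec_augment(spec, rng=np.random.default_rng(0))
    print(augmented.shape, int((augmented == 0).sum()), "masked cells")
```

Because each call draws a new random band and span, applying the function on the fly during training yields a different masked view of the same utterance in every epoch, which is how this kind of augmentation is usually expected to help generalization.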