Incorporating Dynamic Convolution and Attention Mechanism in Multilayer Perceptron for Speech Emotion Recognition

ZHANG Yumeng¹, ZHANG Xin², GAO Mou³, ZHAO Hulin³

Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2025, Vol. 19, Issue (4): 1065-1075 (11 pages). DOI: 10.3778/j.issn.1673-9418.2406008

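Author information

1. School of Foreign Studies, University of International Business and Economics, Beijing 100105, China
2. Department of Statistics and Epidemiology, Graduate School, Chinese PLA General Hospital, Beijing 100853, China
3. Department of Neurosurgery, First Medical Center of Chinese PLA General Hospital, Beijing 100853, China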

Abstract

Speech emotion recognition technology infers a speaker's emotions by analyzing vocal signals, making human-computer interaction more natural and intelligent. However, existing models often overlook the semantic information carried along the time and frequency dimensions, which limits recognition accuracy. To address this problem, this paper proposes a multilayer perceptron model that integrates dynamic convolution and attention mechanisms, improving both emotion recognition accuracy and the efficiency with which information is used. First, the input speech signal is transformed into a Mel-spectrogram, which captures fine-grained signal variations and more closely reflects how humans perceive sound, laying the foundation for subsequent feature extraction. The Mel-spectrogram is then tokenized to reduce data complexity. Next, dynamic convolution and a split attention mechanism are used to extract key time-frequency features efficiently. Dynamic convolution adapts to scale changes along the time and frequency dimensions, making feature capture more efficient, while split attention strengthens the model's ability to focus on crucial information, improving the expressiveness of the features. By combining the advantages of these two components, the proposed model can fully extract the crucial acoustic features and achieve more efficient and accurate emotion recognition. Experiments on the RAVDESS, EmoDB, and CASIA speech emotion databases show that the proposed model reaches recognition accuracies of 86.11%, 95.33%, and 82.92%, respectively, significantly surpassing existing methods. These results verify the effectiveness of the proposed model on complex emotion recognition tasks, as well as the efficacy of dynamic convolution and attention mechanisms.
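The abstract names the two feature-extraction modules without giving their equations. As a rough illustration only, the pure-Python sketch below assumes the common formulations of both: dynamic convolution as an input-dependent softmax attention over K candidate kernels that are aggregated into a single kernel before convolving, and ResNeSt-style split attention, where R feature splits are fused by summation, pooled over time, and re-weighted with a softmax across the splits. The gating uses one hypothetical scalar per kernel (or per split and channel) in place of the small fully connected layers such modules normally use, and all function names are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def conv1d_same(x: Vector, kernel: Vector) -> Vector:
    """1-D cross-correlation with zero padding; output length equals input length (odd kernel size)."""
    pad = len(kernel) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t + j - pad
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


def dynamic_conv1d(x: Vector, kernels: List[Vector], gate: Vector) -> Tuple[Vector, Vector]:
    """Dynamic convolution sketch: input-conditioned attention over K candidate kernels.

    `gate` is a hypothetical scalar per kernel standing in for FC layers.
    """
    pooled = sum(x) / len(x)                    # global average pooling of the input
    attn = softmax([g * pooled for g in gate])  # kernel attention, sums to 1
    size = len(kernels[0])
    # Aggregate the K kernels into one kernel, then convolve once.
    mixed = [sum(a * k[j] for a, k in zip(attn, kernels)) for j in range(size)]
    return conv1d_same(x, mixed), attn


def split_attention(splits: List[List[Vector]], gate: List[Vector]) -> List[Vector]:
    """Split attention sketch over R feature splits, each of shape C x T.

    `gate[r][c]` is a hypothetical per-split, per-channel scalar standing in for FC layers.
    """
    radix, channels, steps = len(splits), len(splits[0]), len(splits[0][0])
    out = []
    for c in range(channels):
        # Fuse the splits by element-wise sum, then average over time.
        fused = [sum(splits[r][c][t] for r in range(radix)) for t in range(steps)]
        pooled = sum(fused) / steps
        # Softmax across the R splits gives this channel's split weights.
        weights = softmax([gate[r][c] * pooled for r in range(radix)])
        out.append([sum(weights[r] * splits[r][c][t] for r in range(radix))
                    for t in range(steps)])
    return out
```

With every gate value set to zero, the attention weights become uniform, so dynamic convolution reduces to an ordinary convolution with the average of the candidate kernels and split attention reduces to averaging the splits; non-zero gates let the input decide which kernel or split dominates.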

Keywords

speech emotion recognition/Mel-spectrogram/multilayer perceptron/dynamic convolution/attention mechanism

Classification

Information Technology and Security Science

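Cite this article

ZHANG Yumeng, ZHANG Xin, GAO Mou, ZHAO Hulin. Incorporating Dynamic Convolution and Attention Mechanism in Multilayer Perceptron for Speech Emotion Recognition[J]. Journal of Frontiers of Computer Science and Technology, 2025, 19(4): 1065-1075.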

Funding

This work was supported by the National Natural Science Foundation of China (82271397) and the Youth Fund of the National Natural Science Foundation of China (82001293).

Journal of Frontiers of Computer Science and Technology (计算机科学与探索)

Open access; Peking University Core Journal

ISSN: 1673-9418