Abstract
Mel-frequency cepstral coefficients (MFCC) and their first-order differential features represent the static and dynamic information of speech, respectively, and are often used as emotional features in speech emotion recognition (SER). In the traditional MFCC feature extraction process, balancing the signal-to-noise ratio of speech through manual parameter tuning can easily lead to overcompensation. This article proposes two improvements to that process, which yield the EMFCC and AMFCC features, respectively. To achieve the best classification accuracy, an MLA model was constructed from a pooling layer, long short-term memory (LSTM), and an attention mechanism, which can effectively capture the emotional information in the features. A mixed feature consisting of MFCC, its first-order differential features, and the two improved MFCC features achieved an unweighted accuracy of 81.79% on the CASIA corpus. The results of the ablation experiments and of comparisons with other advanced recognition methods in the SER field indicate that the improved MFCC features perform better.
Keywords
Speech Emotion Recognition/Mel-Frequency Cepstral Coefficients (MFCC)/Long Short-Term Memory/Attention Mechanism
Classification
Information Technology and Security Science