Journal of Southeast University (English Edition), 2022, Vol. 38, Issue 2: 103-109, 7. DOI: 10.3969/j.issn.1003-7985.2022.02.001
Multi-head attention-based long short-term memory model for speech emotion recognition
Abstract
To make full use of information from different representation subspaces, a multi-head attention-based long short-term memory (LSTM) model is proposed in this study for speech emotion recognition (SER). The proposed model uses frame-level features and takes the temporal information of emotional speech as the input of the LSTM layer. A multi-head time-dimension attention (MHTA) layer is employed to linearly project the output of the LSTM layer into different subspaces, yielding reduced-dimension context vectors. To supply relatively vital information from other dimensions, the output of MHTA, the output of feature-dimension attention, and the last time-step output of the LSTM are combined to form multiple context vectors as the input of the fully connected layer. To improve the performance of the multiple vectors, feature-dimension attention is applied to the all-time output of the first LSTM layer. The proposed model was evaluated on the eNTERFACE and GEMEP corpora. The results indicate that the proposed model outperforms LSTM by 14.6% and 10.5% on eNTERFACE and GEMEP, respectively, demonstrating the effectiveness of the proposed model in SER tasks.
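The data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the dot-product scoring, the toy dimensions, and the random matrices standing in for learned parameters are all assumptions made for illustration. Feature-dimension attention is left out because the abstract does not specify its form.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def time_attention(frames, score_vec):
    """Attend over time: softmax the per-frame scores, return the weighted sum.

    The dot-product scoring is an assumption, not taken from the paper.
    """
    scores = [sum(a * b for a, b in zip(score_vec, f)) for f in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    context = [sum(w * f[i] for w, f in zip(weights, frames)) for i in range(dim)]
    return context, weights


def multi_head_time_attention(lstm_out, projections, score_vecs):
    """Project the LSTM output into one subspace per head, attend over time
    in each subspace, and concatenate the reduced-dimension context vectors."""
    contexts = []
    for proj, score_vec in zip(projections, score_vecs):
        projected = [matvec(proj, h) for h in lstm_out]
        context, _ = time_attention(projected, score_vec)
        contexts.extend(context)
    return contexts


def build_fc_input(lstm_out, projections, score_vecs):
    """Concatenate the MHTA context with the last time-step LSTM output."""
    mhta = multi_head_time_attention(lstm_out, projections, score_vecs)
    return mhta + lstm_out[-1]


# Toy setup (hypothetical sizes): 5 frames, 8 LSTM units, 4 heads of width 2.
rng = random.Random(0)
T, D, HEADS, D_HEAD = 5, 8, 4, 2
lstm_out = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(T)]
projections = [
    [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(D_HEAD)]
    for _ in range(HEADS)
]
score_vecs = [[rng.uniform(-1, 1) for _ in range(D_HEAD)] for _ in range(HEADS)]

fc_input = build_fc_input(lstm_out, projections, score_vecs)
print(len(fc_input))  # 4 heads * 2 dims + 8 LSTM units = 16
```

In a trained model the projection matrices and scoring vectors would be learned jointly with the LSTM, and the concatenated vector would feed the fully connected classification layer.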
Key words
speech emotion recognition / long short-term memory (LSTM) / multi-head attention mechanism / frame-level features / self-attention
Classification
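Information technology and security science
Cite this article
Zhao Yan, Zhao Li, Lu Cheng, Li Sunan, Tang Chuangao, Lian Hailun. Multi-head attention-based long short-term memory model for speech emotion recognition [J]. Journal of Southeast University (English Edition), 2022, 38(2): 103-109, 7.
Funding
The National Natural Science Foundation of China (No. 61571106, 61633013, 61673108, 81871444).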