Computer Engineering and Applications, 2024, Vol. 60, Issue 4: 122-132 (11 pages). DOI: 10.3778/j.issn.1002-8331.2209-0099
Speech Emotion Recognition for Imbalanced Datasets
Abstract
Sample balance is crucial for machine learning: in an imbalanced dataset, some classes may matter more than their sample counts suggest. This paper studies speech emotion recognition on imbalanced datasets. First, the imbalanced baseline datasets EMODB and IEMOCAP are augmented with noise at different signal-to-noise ratios to construct the datasets EMODBM and IEMOCAPM. Second, six techniques, namely SMOTE, RandomOverSampler, SMOTEENN, ADASYN, TomekLinks, and SMOTETomek, are used to resample the baseline datasets, producing augmented datasets with balanced classes. Third, 21-dimensional low-level descriptor (LLD) features are extracted from both the baseline and the augmented datasets. Finally, a novel model, MA-CapsNet, is proposed to validate the effectiveness of the resampling techniques. The results show that the samples of all emotion classes are roughly balanced after resampling, which makes MA-CapsNet's learning fairer across classes. In addition, MA-CapsNet is more robust on the resampled datasets.
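The first two data-side steps above, noise augmentation at a target SNR and SMOTE-style oversampling, can be sketched as follows. This is a minimal illustration in plain Python under stated assumptions, not the authors' implementation. The abstract does not name the noise type, so additive white Gaussian noise is assumed. The function names `add_noise_at_snr`, `measure_snr_db`, `smote`, and `balance_by_smote` are hypothetical, and the sketch uses the common "random base sample" variant of SMOTE. The six resamplers named in the abstract share their names with classes in the imbalanced-learn library, which the authors may have used instead.

```python
import math
import random


def add_noise_at_snr(signal, target_snr_db, rng):
    """Return signal plus noise whose power yields target_snr_db.

    SNR_dB = 10 * log10(P_signal / P_noise), so
    P_noise = P_signal / 10 ** (SNR_dB / 10).
    Assumption: additive white Gaussian noise (the abstract does not name
    the noise type used to build EMODBM/IEMOCAPM).
    """
    p_signal = sum(x * x for x in signal) / len(signal)
    p_noise = p_signal / 10 ** (target_snr_db / 10)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    # Rescale by the drawn noise's empirical power so the realised SNR is exact.
    p_drawn = sum(n * n for n in noise) / len(noise)
    scale = math.sqrt(p_noise / p_drawn)
    return [x + scale * n for x, n in zip(signal, noise)]


def measure_snr_db(clean, noisy):
    """SNR in dB of a noisy copy relative to its clean reference."""
    residual = [y - x for x, y in zip(clean, noisy)]
    p_signal = sum(x * x for x in clean) / len(clean)
    p_noise = sum(r * r for r in residual) / len(residual)
    return 10 * math.log10(p_signal / p_noise)


def smote(minority, n_new, k, rng):
    """Synthesise n_new points by SMOTE-style interpolation.

    Each new point lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority neighbours (Euclidean distance).
    Requires at least two minority samples when n_new > 0.
    """
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of base within the minority class, excluding base.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        partner = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (q - b) for b, q in zip(base, partner)])
    return synthetic


def balance_by_smote(X, y, k=5, seed=0):
    """Oversample every class up to the size of the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for features, label in zip(X, y):
        by_class.setdefault(label, []).append(list(features))
    target = max(len(samples) for samples in by_class.values())
    X_out, y_out = [], []
    for label, samples in by_class.items():
        k_eff = min(k, len(samples) - 1)
        extra = smote(samples, target - len(samples), k_eff, rng)
        for row in samples + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy 1-second, 440 Hz tone sampled at 16 kHz, degraded to 10 dB SNR.
    tone = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
    noisy = add_noise_at_snr(tone, 10.0, rng)
    print("measured SNR (dB):", round(measure_snr_db(tone, noisy), 3))

    # Toy 2-D "features": 4 neutral vs 2 angry samples.
    X = [[0, 0], [1, 0], [0, 1], [1, 1], [5, 5], [6, 5]]
    y = ["neutral"] * 4 + ["angry"] * 2
    Xb, yb = balance_by_smote(X, y, k=1)
    print({label: yb.count(label) for label in sorted(set(yb))})
```

The noise is scaled by its empirical power rather than its nominal variance. This keeps the realised SNR equal to the target, so noisy copies made at different SNRs remain directly comparable. TomekLinks differs from the sketch: it removes samples rather than synthesising new ones.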
Keywords
speech emotion recognition / resampling / capsule network / data augmentation
Classification
Information Technology and Security Science
Cite this article
ZHANG Huiyun, HUANG Heming. Speech emotion recognition for imbalanced datasets[J]. Computer Engineering and Applications, 2024, 60(4): 122-132.
Funding
National Natural Science Foundation of China (62066039)
Natural Science Foundation of Qinghai Province (2022-ZJ-925)