Journal of Sichuan University of Science & Engineering (Natural Science Edition), 2025, Vol. 38, Issue 4: 18-27, 10. DOI: 10.11863/j.suse.2025.04.03
基于多通道特征融合的CNN-SVM说话人识别研究
Speaker Recognition Based on Multi-channel Feature Fusion with CNN-SVM
Abstract
A single voiceprint feature has limited ability to represent speaker identity, and traditional convolutional neural networks (CNNs) cannot fully learn voiceprint features. To address these shortcomings, a new voiceprint feature, MGFCC (Mel-Gammatone-Frequency Cepstral Coefficients), is constructed by combining two voiceprint features, MFCC (Mel-Frequency Cepstral Coefficients) and GFCC (Gammatone-Frequency Cepstral Coefficients). MGFCC combines the advantages of the Mel filter bank and the Gammatone filter bank to obtain more voiceprint information with higher discriminability. To further improve speaker recognition accuracy, the softmax classifier in the CNN is replaced by an SVM classifier, which gives better classification performance, and a new speaker discrimination model, CNN-SVM, is proposed. Experiments show that, for the same number of recognition targets, the fused feature MGFCC improves accuracy on the open-source dataset by 1.13 and 1.33 percentage points compared with MFCC and GFCC, respectively; on the self-built dataset, the improvements are 4.87 and 0.82 percentage points, respectively. With the MGFCC feature, the CNN model using an SVM classifier is 2.04 and 1.79 percentage points more accurate on the open-source dataset than the traditional CNN and the CNN using a TREEB classifier, respectively. Compared with the current mainstream discriminant models RDNN and ResNet-18, the CNN-SVM model achieves higher accuracy, reaching 95.06% and 98.83% on the self-built and open-source datasets, respectively.
Keywords
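feature fusion / Fisher criterion / deep learning / machine learning / speaker recognition
Classification
Information Technology and Security Science
Cite this article
任俊红, 詹旭, 范涛, 唐青林. Speaker Recognition Based on Multi-channel Feature Fusion with CNN-SVM [J]. Journal of Sichuan University of Science & Engineering (Natural Science Edition), 2025, 38(4): 18-27, 10.
Funding
Key R&D Project of the Sichuan Provincial Department of Science and Technology (2022YFS0544)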