Computer Engineering and Applications, 2023, Vol. 59, Issue (24): 110-120, 11. DOI: 10.3778/j.issn.1002-8331.2208-0085
Multi-Scale End-to-End Speaker Recognition System Based on Improved Res2Net
Abstract
Lightweight convolutional neural networks in speaker recognition systems have weak feature extraction ability and poor recognition performance. To improve feature extraction, many methods use deeper, wider, and more complex network structures, which make the number of parameters and the inference time increase exponentially. This paper introduces Res2Net, originally used for object detection, into the speaker recognition task and verifies its effectiveness and robustness there. An improved variant, FullRes2Net, is then proposed; it has stronger multi-scale feature extraction capability without increasing the number of parameters and achieves a 17% performance improvement over Res2Net. Meanwhile, to address the shortcomings of existing attention methods, compensate for the limitations of convolution itself, and further enhance the feature extraction ability of convolutional neural networks, mixed time-frequency channel attention is proposed. Experiments on the VoxCeleb dataset show that the proposed method effectively improves the feature extraction ability and generalization ability of the system, with a 34% performance improvement over Res2Net, and that it outperforms advanced speaker recognition systems that use complex structures. The proposed system is an end-to-end structure with fewer parameters and higher efficiency, which makes it suitable for applications in realistic scenarios.
Keywords
speaker recognition / end-to-end / attention mechanisms
Classification
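Information Technology and Security Science
Cite this article
DENG Lihong, DENG Fei, ZHANG Gexiang, YANG Qiang. Multi-Scale End-to-End Speaker Recognition System Based on Improved Res2Net[J]. Computer Engineering and Applications, 2023, 59(24): 110-120, 11.
Funding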
National Natural Science Foundation of China (61972324)
Sichuan Science and Technology Program (2021YFS0313, 2021YFG0133).