End-to-End Speaker Recognition Based on Deep Neural Networks and Attention Mechanisms

Liu Pengzhan, Wang Huapeng

Forensic Science and Technology (刑事技术), 2025, Vol. 50, Issue 3: 235-242, 8. DOI: 10.16467/j.1008-3650.2024.0041

End-to-End Speaker Recognition Based on Deep Neural Networks and Attention Mechanisms

Liu Pengzhan¹, Wang Huapeng¹

Author information

  • 1. Criminal Investigation Police University of China, Shenyang 110854

Abstract

To further improve the accuracy of speaker recognition and avoid the laborious manual feature extraction required by traditional methods, this paper proposes an end-to-end speaker recognition method that combines the CBAM attention mechanism with a deep neural network. CBAM is a lightweight, general-purpose module that can be integrated seamlessly into a network architecture. Here it is added at the first convolutional layer of the deep neural network: speech features first pass through the CBAM channel attention module, which strengthens the model's attention to the channel dimension of the features, and then through the CBAM spatial attention module, which strengthens its attention to the spatial dimension. This further increases the model's sensitivity to important feature information, and the whole model is trained jointly with an end-to-end loss function. In addition, an embedding-based forensic speaker recognition method trained with the generalized end-to-end loss function is proposed: cosine similarity scores between embeddings produced by the improved network are converted into a likelihood ratio, which indicates directly whether two recordings come from the same speaker and gives the court intuitive, strong evidence. Finally, taking the BiLSTM and GRU deep neural networks as examples, the models were trained on the mainstream CN-Celeb dataset so that they perform well in complex and varied acoustic conditions. They were tested on Zhaishell, a subset of Zhvoice, and on audio from real cases collected by the authors, to verify that recognition works well for both Mandarin and dialects. The results show that the proposed method effectively improves recognition accuracy, allows the model to be built quickly, and improves generalization ability.
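The step from a cosine similarity score to a likelihood ratio can be illustrated with a minimal Python sketch. The abstract does not say how the likelihood ratio is computed, so the Gaussian score models below are an assumption chosen for simplicity, not the authors' method. The calibration scores, the embeddings, and every function name are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def fit_score_model(scores):
    """Fit a Gaussian to calibration scores (assumed score model)."""
    return NormalDist(mean(scores), stdev(scores))


def likelihood_ratio(score, same_model, diff_model):
    """LR = p(score | same speaker) / p(score | different speakers)."""
    return same_model.pdf(score) / diff_model.pdf(score)


# Hypothetical calibration scores, invented for illustration only.
same_speaker_scores = [0.82, 0.78, 0.90, 0.85, 0.74]
diff_speaker_scores = [0.10, 0.25, 0.05, 0.30, 0.18]

same_model = fit_score_model(same_speaker_scores)
diff_model = fit_score_model(diff_speaker_scores)

# Hypothetical embeddings of a questioned and a known recording.
questioned = [0.2, 0.9, 0.4]
known = [0.25, 0.85, 0.35]

score = cosine_similarity(questioned, known)
lr = likelihood_ratio(score, same_model, diff_model)
print(f"score={score:.3f}, LR={lr:.3g}")
```

A likelihood ratio above 1 supports the same-speaker hypothesis and a value below 1 supports the different-speaker hypothesis. In practice the score models would be fitted on a large calibration set of embeddings from the trained network, and other density estimates could replace the Gaussians.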

Key words

speaker recognition/generalized end-to-end loss function/attention mechanism/cosine similarity/likelihood ratio

Classification

Politics and Law

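Cite this article

Liu Pengzhan, Wang Huapeng. End-to-End Speaker Recognition Based on Deep Neural Networks and Attention Mechanisms[J]. Forensic Science and Technology, 2025, 50(3): 235-242, 8.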

Forensic Science and Technology (刑事技术)

ISSN 1008-3650
