Journal of Frontiers of Computer Science and Technology, 2025, Vol. 19, Issue (4): 1076-1086, 11. DOI: 10.3778/j.issn.1673-9418.2404088
End-to-End Synthetic Speech Detection Based on Improved Deep Residual Shrinkage Networks
Abstract
The misuse of synthetic speech has caused numerous real-world problems, and research on corresponding anti-counterfeiting techniques is important for protecting citizens' personal and property safety and for safeguarding social and national security. Traditional synthetic speech detection usually combines hand-crafted front-end features with back-end classifiers. Hand-crafted features depend on complex prior knowledge, and a model built on a single such feature yields unsatisfactory detection results, whereas fusing multiple features leads to a large number of model parameters. Moreover, most detection methods generalize poorly across datasets. To address these issues, an end-to-end synthetic speech detection method based on an improved deep residual shrinkage network is proposed. Firstly, a channel attention mechanism is integrated to redesign the adaptive threshold learning module, improving the accuracy of threshold learning. Secondly, a frame attention module is designed and introduced to assign different attention weights to different frames, enhancing the model's feature selection capability. Thirdly, an improved wavelet threshold function with two hyperparameters is designed and introduced to strengthen the thresholding module's ability to suppress irrelevant features. Finally, an end-to-end synthetic speech detection network based on the improved deep residual shrinkage network is designed, which determines whether raw input speech is synthetic. Comparative experiments on the ASVspoof2019 LA dataset show that the proposed method reduces the equal error rate (EER) and the minimum tandem detection cost function (min t-DCF) of the baseline model by 85% and 84%, respectively. Cross-database tests on the ASVspoof2015 LA dataset validate the generalization performance of the proposed method.
Keywords
synthetic speech detection/deep residual shrinkage networks/frame attention/wavelet threshold function
Classification
Information Technology and Security Science
Cite this article
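曾高俊, 芦天亮, 任英杰, 李御瑾, 彭舒凡. End-to-End Synthetic Speech Detection Based on Improved Deep Residual Shrinkage Networks[J]. Journal of Frontiers of Computer Science and Technology, 2025, 19(4): 1076-1086, 11.
Funding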
This work was supported by the Key Program of the National Social Science Foundation of China (20AZD114).
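Illustrative note
The abstract refers to a thresholding module with adaptively learned thresholds. As background only, the sketch below shows the standard soft-thresholding step used in deep residual shrinkage networks in general, with one threshold per channel. It is not the improved wavelet threshold function proposed in the paper, whose form and two hyperparameters are not given in the abstract. The names `soft_threshold`, `channel_thresholds`, and `shrink`, and the fixed `scales` values, are hypothetical; in a real network the scales would come from a learned sub-network ending in a sigmoid.

```python
def soft_threshold(x, tau):
    """Standard soft thresholding: zero out |x| <= tau, shrink the rest by tau."""
    if abs(x) <= tau:
        return 0.0
    return x - tau if x > 0 else x + tau


def channel_thresholds(feature_map, scales):
    """One threshold per channel: scale * mean(|x|) over that channel.

    Assumption: each scale lies in (0, 1). Here the scales are passed in
    directly as a stand-in for the output of a learned attention branch.
    """
    return [
        scale * sum(abs(v) for v in channel) / len(channel)
        for channel, scale in zip(feature_map, scales)
    ]


def shrink(feature_map, scales):
    """Apply channel-wise soft thresholding to a list of channels."""
    taus = channel_thresholds(feature_map, scales)
    return [
        [soft_threshold(v, tau) for v in channel]
        for channel, tau in zip(feature_map, taus)
    ]


# Toy feature map: 2 channels, 4 frames each (made-up values).
features = [[0.9, -0.1, 0.05, -1.2], [0.2, 0.3, -0.25, 0.1]]
scales = [0.5, 0.9]  # hypothetical learned scales
shrunk = shrink(features, scales)
print(shrunk)
```

Small-magnitude activations fall below the channel threshold and are set to zero, while larger ones are kept but shrunk toward zero, which is how such a module suppresses features it treats as irrelevant.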