
End-to-End Synthetic Speech Detection Based on Improved Deep Residual Shrinkage Networks

Zeng Gaojun¹, Lu Tianliang¹, Ren Yingjie², Li Yujin¹, Peng Shufan¹


Journal of Frontiers of Computer Science and Technology, 2025, Vol. 19, Issue 4, pp. 1076-1086 (11 pages). DOI: 10.3778/j.issn.1673-9418.2404088



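Author information

1. School of Information Network Security, People's Public Security University of China, Beijing 100038
2. Cybersecurity Bureau, Ministry of Public Security, Beijing 100741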


Abstract

The misuse of synthetic speech has caused numerous real-world problems, so research on corresponding anti-spoofing techniques is of great significance for protecting citizens' personal and property safety and for safeguarding social and national security. Traditional synthetic speech detection usually combines hand-crafted front-end features with a back-end classifier. Hand-crafted features require complex prior knowledge, and models built on a single such feature give unsatisfactory detection results, while fusing multiple features leads to a large number of model parameters. Moreover, most detection methods generalize poorly across datasets. To address these issues, an end-to-end synthetic speech detection method based on an improved deep residual shrinkage network is proposed. First, the adaptive threshold learning module is redesigned by integrating a channel attention mechanism, which improves the accuracy of threshold learning. Second, a frame attention module is designed and introduced to assign different attention weights to different frames, enhancing the model's feature selection capability. Third, an improved wavelet threshold function with two hyperparameters is designed and introduced to strengthen the ability of the thresholding module to suppress irrelevant features. Finally, an end-to-end synthetic speech detection network is built on the improved deep residual shrinkage network; it takes raw speech as input and determines whether it is synthetic. Comparative experiments on the ASVspoof2019 LA dataset show that the proposed method reduces the equal error rate (EER) and the minimum tandem detection cost function (min t-DCF) of the baseline model by 85% and 84%, respectively. Cross-database tests on the ASVspoof2015 LA dataset validate the generalization performance of the proposed method.
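The abstract names the building blocks but gives no formulas, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It shows the standard deep residual shrinkage network (DRSN) idea that the adaptive threshold module builds on: each channel gets a threshold tau_c = alpha_c * mean(|x_c|) with alpha_c in (0, 1), and features are soft-thresholded with it. Assumptions: the scalar parameters `w` and `b` are a hypothetical stand-in for DRSN's small fully connected scaling network; `frame_attention` is an illustrative softmax over per-frame energy, not the paper's frame attention module; and the paper's two-hyperparameter wavelet threshold function is not reproduced because its form is not given here (it would take the place of `soft_threshold`).

```python
import math
from typing import List

# Toy feature map laid out as feat[channel][frame].
Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(x: float, tau: float) -> float:
    """Standard soft thresholding: zero values with |x| <= tau, shrink the rest by tau.

    The paper replaces this with an improved two-hyperparameter wavelet
    threshold function whose exact form is not reproduced here.
    """
    return math.copysign(max(abs(x) - tau, 0.0), x)


def channel_thresholds(feat: Matrix, w: float = 0.0, b: float = 0.0) -> List[float]:
    """DRSN-style thresholds tau_c = alpha_c * mean(|x_c|), with alpha_c in (0, 1).

    ASSUMPTION: a per-channel scalar affine map (w, b) followed by a sigmoid
    stands in for DRSN's fully connected scaling network. Because alpha_c is
    at most 1, each threshold is non-negative and never exceeds its channel's
    mean absolute activation.
    """
    taus = []
    for channel in feat:
        mean_abs = sum(abs(v) for v in channel) / len(channel)
        alpha = sigmoid(w * mean_abs + b)
        taus.append(alpha * mean_abs)
    return taus


def shrink(feat: Matrix, w: float = 0.0, b: float = 0.0) -> Matrix:
    """Apply each channel's threshold to every frame of that channel."""
    taus = channel_thresholds(feat, w, b)
    return [[soft_threshold(v, tau) for v in ch] for ch, tau in zip(feat, taus)]


def frame_attention(feat: Matrix) -> Matrix:
    """HYPOTHETICAL frame attention: softmax over per-frame mean |x| across channels.

    Weights are scaled by the number of frames so they average to 1, which
    emphasizes high-energy frames and attenuates quiet ones.
    """
    n_frames = len(feat[0])
    energy = [sum(abs(ch[t]) for ch in feat) / len(feat) for t in range(n_frames)]
    peak = max(energy)  # subtract the max for a numerically stable softmax
    exps = [math.exp(e - peak) for e in energy]
    total = sum(exps)
    weights = [n_frames * e / total for e in exps]
    return [[v * weights[t] for t, v in enumerate(ch)] for ch in feat]


if __name__ == "__main__":
    x = [[1.0, -3.0, 0.2, 0.0],
         [0.5, 0.4, -0.1, 2.0]]
    # The ordering (frame attention, then shrinkage) is an illustrative choice.
    print(shrink(frame_attention(x)))
```

In a real network these operations would act on convolutional feature maps inside each residual block, with the thresholded output added back through the identity shortcut, and `w`, `b` (or the fully connected layers they replace) would be learned by backpropagation. Tying the threshold to the channel's mean absolute value keeps it on the same scale as the activations it filters.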


Key words

synthetic speech detection; deep residual shrinkage networks; frame attention; wavelet threshold function

Classification

Information Technology and Security Science

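Cite this article

Zeng Gaojun, Lu Tianliang, Ren Yingjie, Li Yujin, Peng Shufan. End-to-end synthetic speech detection based on improved deep residual shrinkage networks[J]. Journal of Frontiers of Computer Science and Technology, 2025, 19(4): 1076-1086.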

Funding

This work was supported by the Key Project of the National Social Science Foundation of China (20AZD114).

Journal of Frontiers of Computer Science and Technology

Open access; Peking University core journal

ISSN: 1673-9418
