Speech Deepfake Attribution: The State of the Art and Prospects

Journal of Data Acquisition and Processing (数据采集与处理), 2026, Vol. 41, Issue 2: 347-370 (24 pages). DOI: 10.16337/j.1004-9037.2026.02.005

Speech Deepfake Attribution: The State of the Art and Prospects

张雄伟¹, 张强¹, 孙蒙¹, 杨吉斌¹, 李毅豪¹, 葛晓义²

Author information

1. Army Engineering University, Nanjing 210007
2. Information Support Force Engineering University, Wuhan 430000

Abstract

With the rapid evolution of generative artificial intelligence, speech deepfake technologies have achieved unprecedented realism, enabling the synthesis of highly natural, speaker-specific speech from only a few seconds of reference audio. Traditional countermeasures have focused primarily on binary detection of whether speech is real or fake, an approach that is insufficient for forensic investigation, legal accountability, and security governance. In real-world adversarial scenarios, it is not enough to determine whether speech is fake; it is equally critical to identify how it was generated, whose voice characteristics were exploited, and which specific model instance may have been involved. This paradigm shift from "detection" to "attribution" marks a fundamental transformation in speech security research. This paper presents a comprehensive survey of speech deepfake attribution, organizing the field into a hierarchical forensic framework with three progressive tasks: forgery method attribution, source speaker attribution, and model inversion. Forgery method attribution aims to identify the generative architecture or vocoder family responsible for producing the fake speech by exploiting intrinsic "model fingerprints" embedded in the spectral, temporal, and phase domains. Source speaker attribution focuses on recovering or verifying the identity of the original speaker whose voice was converted, leveraging residual prosodic, behavioral, and physiological cues that survive imperfect disentanglement in voice conversion systems. Model inversion represents a deeper forensic objective: it attempts to infer specific model parameters or configurations from generated speech, thereby bridging the gap between class-level attribution and instance-level accountability. The core principles that make each subtask feasible are elaborated from two perspectives: the mechanisms of generative models and the physical acoustic characteristics of speech signals. Existing work is then organized along dimensions such as architectural framework and training strategy, covering the research status, mainstream methodologies, and technological evolution of each subtask. Furthermore, benchmark datasets and evaluation metrics for both closed-set and open-set scenarios are summarized. Finally, the paper discusses emerging challenges such as open-world generalization, robustness under complex channel distortions and neural codecs, adversarial attacks, and ethical constraints related to privacy and legal admissibility. Future directions are outlined toward proactive traceability, model-level reverse engineering, robust feature disentanglement, and the integration of active watermarking with passive forensic techniques. The survey aims to provide a structured roadmap for advancing speech deepfake attribution and fostering a trustworthy digital speech ecosystem.

Keywords

speech deepfake; speech forgery method attribution; source speaker attribution; model inversion; open-set recognition

Classification

Information technology and security science

Cite this article

张雄伟, 张强, 孙蒙, 杨吉斌, 李毅豪, 葛晓义. 语音深度伪造溯源技术研究现状及展望[J]. 数据采集与处理, 2026, 41(2): 347-370, 24.

Funding

National Natural Science Foundation of China (Nos. 62371469, 62071484).

Journal of Data Acquisition and Processing

ISSN: 1004-9037
