Application Research of Computers, 2024, Vol. 41, Issue (10): 3081-3086, 6 pages. DOI: 10.19734/j.issn.1001-3695.2024.02.0034
One-shot voice conversion integrating information perturbation and feature decoupling
Abstract
One-shot voice conversion changes the speaker identity of an utterance using only a single speech sample from the target speaker. However, acoustic features interact in intricate ways and vary dynamically, so existing methods struggle to fully disentangle the speaker's timbre from the other acoustic features, and the source speaker's timbre leaks into the converted audio. To tackle this problem, this paper proposed IPFD-VC, a model that integrates information perturbation and feature decoupling. An information perturbation module first applies three perturbation operations to the speech signal to remove redundant information from the input signal and from the prosody encoder's input; the processed signals are then fed into the respective encoders. The model adopts a mutual-information minimization strategy to further decouple the acoustic features, reducing their correlation with the speaker's timbre. Finally, the decoder and vocoder generate the converted audio. Experiments show that IPFD-VC achieves scores of 3.72 for speech naturalness and 3.68 for speaker similarity, and that, compared with the advanced UUVC model, it reduces the mel-cepstral distortion by 0.26 dB. IPFD-VC thus effectively decouples acoustic features, captures the target speaker's timbre, preserves the linguistic content and prosodic variations of the source speech, and mitigates speaker timbre leakage.
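The abstract does not state how the mutual information between features is estimated or minimized. As a hedged illustration only, the sketch below assumes the CLUB upper bound (Cheng et al., 2020), a common choice in voice-conversion disentanglement work such as VQMIVC, and replaces CLUB's learned variational network q(y|x) with a linear-Gaussian model fitted by least squares so that it runs with NumPy alone. The helper name `club_mi_estimate` and the toy data are hypothetical and are not the authors' implementation.

```python
import numpy as np


def club_mi_estimate(x: np.ndarray, y: np.ndarray) -> float:
    """Hypothetical CLUB-style upper-bound estimate of I(X; Y).

    Assumption: q(y|x) is a linear-Gaussian model N([x, 1] @ W, diag(var))
    fitted in closed form, standing in for the small neural network that a
    real VC model would train jointly with its encoders.
    x: (n, d_x) features (e.g. a speaker embedding); y: (n, d_y) features
    (e.g. a prosody embedding).
    """
    n = x.shape[0]
    # Fit the conditional mean mu(x) = [x, 1] @ W by least squares.
    x1 = np.hstack([x, np.ones((n, 1))])
    w, *_ = np.linalg.lstsq(x1, y, rcond=None)
    mu = x1 @ w                                # (n, d_y) predicted means
    var = np.var(y - mu, axis=0) + 1e-6        # (d_y,) residual variances

    # Matched pairs: log q(y_i | x_i), dropping the Gaussian normalizer,
    # which cancels because both terms share the same variance.
    positive = -0.5 * np.sum((y - mu) ** 2 / var, axis=1)
    # Mismatched pairs: log q(y_j | x_i) averaged over every j.
    diff = y[None, :, :] - mu[:, None, :]      # (n, n, d_y): y_j - mu_i
    negative = -0.5 * np.sum(diff ** 2 / var, axis=2).mean(axis=1)
    return float(np.mean(positive - negative))


rng = np.random.default_rng(0)
x = rng.standard_normal((500, 4))
y_indep = rng.standard_normal((500, 4))            # unrelated features
y_dep = x + 0.1 * rng.standard_normal((500, 4))    # strongly dependent
mi_indep = club_mi_estimate(x, y_indep)
mi_dep = club_mi_estimate(x, y_dep)
```

In an actual training loop, q would be refit to track the estimate while the encoders are updated to push it down; the two toy cases here only show that the estimate stays near zero for independent features and becomes large for strongly dependent ones.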
Key words
single-sample voice conversion/information perturbation/feature decoupling/speaker voice leakage
Classification
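Information Technology and Security Science
Cite this article
Wang Guang, Liu Zongze, Dong Hao, Jiang Yanji. One-shot voice conversion integrating information perturbation and feature decoupling[J]. Application Research of Computers, 2024, 41(10): 3081-3086.
Funding
Huludao Municipal Science and Technology Plan Project (2023JH(1)4/02b)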