
One-shot voice conversion integrating information perturbation and feature decoupling

Wang Guang (王光) 1, Liu Zongze (刘宗泽) 1, Dong Hao (董浩) 2, Jiang Yanji (姜彦吉) 1

Application Research of Computers (计算机应用研究), 2024, Vol. 41, Issue 10: 3081-3086 (6 pages). DOI: 10.19734/j.issn.1001-3695.2024.02.0034

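Author information

  • 1. School of Software, Liaoning Technical University, Huludao, Liaoning 125105
  • 2. Suzhou Automotive Research Institute, Tsinghua University, Suzhou, Jiangsu 215134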

Abstract

One-shot voice conversion transforms speaker identity using only a single speech sample from the target speaker. However, the intricate interactions and dynamic variations of acoustic features make it difficult for existing methods to fully disentangle the speaker's timbre from other acoustic features, so the source speaker's timbre leaks into the converted audio. To address this problem, this paper proposes the IPFD-VC model, which combines information perturbation and feature decoupling. An information perturbation module applies three perturbation operations to the speech signal to remove redundant information from the input and the prosody encoder, and the processed signal is then fed into each encoder. The model further decouples the acoustic features by minimizing mutual information, reducing their correlation with the speaker's timbre. The decoder and vocoder then output the converted audio. Experiments show that IPFD-VC achieves scores of 3.72 for speech naturalness and 3.68 for speaker similarity, and reduces Mel-cepstral distortion by 0.26 dB compared with the advanced UUVC model. IPFD-VC effectively decouples acoustic features, captures the target speaker's timbre, preserves the source linguistic content and rhythmic variations, and mitigates the risk of speaker timbre leakage.
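For readers unfamiliar with the Mel-cepstral distortion figure quoted in the abstract, the following is a minimal sketch of the commonly used definition, MCD = (10 / ln 10) · sqrt(2 · Σ_d (c_d − c'_d)²), averaged over frames. It is not taken from the paper: the function name `mel_cepstral_distortion`, the plain-list input format, and the assumptions that the two sequences are already time-aligned and exclude the 0th (energy) coefficient are all illustrative choices, and the authors' actual evaluation pipeline may differ.

```python
import math


def mel_cepstral_distortion(ref_frames, conv_frames):
    """Return the mean MCD in dB between two sequences of mel-cepstral vectors.

    Assumptions (hypothetical, not from the paper):
    - both sequences are already aligned frame by frame (for example by DTW);
    - the 0th (energy) coefficient has been dropped by the caller.
    """
    if not ref_frames or len(ref_frames) != len(conv_frames):
        raise ValueError("sequences must be non-empty and of equal length")

    scale = 10.0 / math.log(10.0)  # converts the natural-log distance to dB
    total = 0.0
    for ref, conv in zip(ref_frames, conv_frames):
        if len(ref) != len(conv):
            raise ValueError("frames must have the same number of coefficients")
        squared_diff = sum((r - c) ** 2 for r, c in zip(ref, conv))
        total += scale * math.sqrt(2.0 * squared_diff)

    # Average the per-frame distortion over the utterance.
    return total / len(ref_frames)
```

Under this definition a lower value means the converted spectrum is closer to the reference, which is why a reduction in MCD is reported as an improvement.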

Key words

single-sample voice conversion / information perturbation / feature decoupling / speaker voice leakage

Classification

Information technology and security science

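Cite this article

Wang Guang, Liu Zongze, Dong Hao, Jiang Yanji. One-shot voice conversion integrating information perturbation and feature decoupling[J]. Application Research of Computers, 2024, 41(10): 3081-3086, 6.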

Funding

Huludao Science and Technology Program (2023JH(1)4/02b)

Application Research of Computers

Open access; Peking University Core Journals; CSTPCD

ISSN 1001-3695
