Modern Information Technology, 2025, Vol. 9, Issue 12: 129-133, 140, 6. DOI: 10.19850/j.cnki.2096-4706.2025.12.025
High-performance Singing Voice Conversion Model Based on VITS
ZHOU Keru 1, JIN Wei 1
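Author information
- 1. School of Medical Technology and Information Engineering, Zhejiang Chinese Medical University, Hangzhou, Zhejiang 310053, China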
Abstract
Singing voice conversion transforms the voice of a source singer into that of a target singer while retaining the original content and melody. As the technology has developed, a variety of network architectures and models have been proposed, and singing voice conversion algorithms have diversified. However, existing approaches still suffer from problems such as poor converted-audio quality, high distortion rates, and limited vocal range. This paper proposes UVC (Ultra Singing Voice Conversion), a model with multi-decoupled feature constraints based on high-fidelity flow. The model is built on VITS. By combining the ContentVec encoder with the NSF-HIFI-GAN vocoder, it improves the model's input and output, greatly enhancing the quality and fluency of the converted audio while remaining highly robust.
Keywords
singing voice conversion / VITS / ContentVec encoder / NSF-HIFI-GAN vocoder
Classification
Information Technology and Security Science
Cite this article
ZHOU Keru, JIN Wei. High-performance Singing Voice Conversion Model Based on VITS[J]. Modern Information Technology, 2025, 9(12): 129-133, 140, 6.