Journal of Guangxi Normal University (Natural Science Edition), 2024, Vol. 42, Issue 4: 11-21, 11. DOI: 10.16088/j.issn.1001-6600.2023111303
面向域外说话人适应场景的多层级解耦个性化语音合成
Multi-level Disentangled Personalized Speech Synthesis for Out-of-Domain Speakers Adaptation Scenarios
Abstract
Personalized speech synthesis aims to generate speech with a specific speaker's characteristics. Traditional approaches often show noticeable timbre disparities when synthesizing speech for unseen speakers, because speaker-specific timbre features are difficult to disentangle. This paper proposes a multi-level disentangled personalized speech synthesis approach designed for out-of-domain speakers. By fusing features at different granularities, the proposed method improves synthesis for unseen speakers under zero-resource conditions. Fast Fourier convolution is used to extract global speaker features, which strengthens the model's generalization to unseen speakers and achieves sentence-level speaker disentanglement. In addition, a speech recognition model is used to decouple speaker features at the phoneme level, and an attention mechanism captures phoneme-level timbre features, achieving phoneme-level speaker disentanglement. Experimental results on the publicly available AISHELL3 dataset show that the proposed approach achieves a cosine similarity of 0.697 between speaker feature vectors in cross-speaker adaptation, a 6.25% improvement over the baseline model. This result shows the method's capability to model timbre features of unseen speakers in cross-speaker adaptation scenarios.
Key words
speech synthesis / zero-shot / speaker representation / out-of-domain speaker / feature disentanglement
Classification
Information Technology and Security Science
Cite this article
GAO Shengxiang, YANG Yuanzhang, WANG Linqin, MO Shangbin, YU Zhengtao, DONG Ling. Multi-level Disentangled Personalized Speech Synthesis for Out-of-Domain Speakers Adaptation Scenarios [J]. Journal of Guangxi Normal University (Natural Science Edition), 2024, 42(4): 11-21, 11.
Funding
National Natural Science Foundation of China (62376111, U23A20388, 61972186, U21B2027)
Yunnan High-Tech Industry Development Project (201606)
Yunnan Fundamental Research Program (202001AS070014)
Yunnan Science and Technology Talent and Platform Program (202105AC160018)
Open Project of the Yunnan Key Laboratory of Media Convergence (220225702)
Yunnan Key Research and Development Program (202303AP140008, 202103AA080015)
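
Note on the evaluation metric
The abstract reports speaker similarity as the cosine similarity between speaker feature vectors. The following is a minimal sketch of that metric only, not the authors' evaluation code. The function name `cosine_similarity` and the 4-dimensional example vectors are hypothetical. In practice the vectors would come from a speaker encoder, which the abstract does not specify.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical speaker embeddings, for illustration only (assumption:
# real embeddings would be produced by a speaker encoder and have
# many more dimensions).
reference_embedding = [1.0, 0.0, 1.0, 0.0]
synthesized_embedding = [1.0, 1.0, 0.0, 0.0]

score = cosine_similarity(reference_embedding, synthesized_embedding)
print(f"speaker similarity: {score:.3f}")
```

Scores closer to 1 mean the synthesized speech's speaker vector points in nearly the same direction as the reference speaker's vector; the 0.697 figure in the abstract is a score of this kind.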