CODS: An Audio-Text Aligned Dataset for Cantonese Opera Vocal Synthesis

李粤¹, 黄奕翰¹, 彭郑威², 谢吉轩¹, 杜宇烨¹

Journal of South China University of Technology (Natural Science Edition), 2025, Vol. 53, Issue 9: 1-10. DOI: 10.12141/j.issn.1000-565X.250134

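Author information

  • 1. School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, Guangdong, China
  • 2. School of Computer Science, Sun Yat-sen University, Guangzhou 510006, Guangdong, China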

Abstract

Chinese opera is a traditional Chinese art with a distinctive musical expressiveness. Cantonese opera, one of the main Chinese opera genres and an important carrier of Lingnan culture, is inscribed on the World Intangible Cultural Heritage List. In recent years, generative artificial intelligence has shown strong capabilities in content creation. Singing synthesis technology, for example, can produce natural-sounding singing from a specified music score, which opens a new path for the digital protection and innovation of Cantonese opera. However, collecting and organizing opera data is hampered by poor audio quality and complex dialect annotation, and high-quality opera datasets are therefore extremely scarce. To address this, this paper applies singing synthesis technology from pop music to Cantonese opera vocal synthesis and proposes the first Cantonese opera vocal synthesis dataset with phoneme-level annotation and audio-text alignment. First, the CODS dataset was constructed through a systematic process. It is derived from 29 original works by four famous performers, with a total length of 3.81 hours, and provides important support for the research and digitization of Cantonese opera. Second, using this dataset, experiments were conducted with a deep learning-based method for Cantonese opera voice synthesis, achieving controllable generation in terms of lyrics, timbre, and melody. Finally, a comprehensive evaluation framework for Cantonese opera synthesis was established. Both objective and subjective evaluations reached a satisfactory level within the domain, further validating the usability of the proposed dataset. The CODS dataset fills a gap in artificial intelligence research on Cantonese opera vocal synthesis and supports the inheritance and innovation of this traditional art.

Keywords

Cantonese opera / generative artificial intelligence / dataset / voice synthesis

Classification

Information Technology and Security Science

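Cite this article

李粤, 黄奕翰, 彭郑威, 谢吉轩, 杜宇烨. CODS: An Audio-Text Aligned Dataset for Cantonese Opera Vocal Synthesis [J]. Journal of South China University of Technology (Natural Science Edition), 2025, 53(9): 1-10.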

Funding

Supported by the National Natural Science Foundation of China (62476096)

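Journal of South China University of Technology (Natural Science Edition): open access, Peking University Core Journal, ISSN 1000-565X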