Journal of South China University of Technology (Natural Science Edition), 2025, Vol. 53, Issue 9: 1-10, 10. DOI: 10.12141/j.issn.1000-565X.250134
CODS: An Audio-Text Aligned Dataset for Cantonese Opera Vocal Synthesis
Abstract
Chinese opera, one of China's traditional arts, has a distinctive musical expressiveness. Cantonese opera, a major Chinese opera genre and an important carrier of Lingnan culture, has been inscribed on the World Intangible Cultural Heritage List. In recent years, generative artificial intelligence has shown strong capabilities in content creation; singing synthesis, for example, can produce natural-sounding singing from a specified music score. This offers a new approach to the digital preservation and innovation of Cantonese opera. However, collecting and organizing opera data is hampered by poor audio quality and complex dialect annotation, leaving high-quality opera datasets in extremely short supply. To address this, this paper applies singing synthesis technology from pop music to Cantonese opera vocal synthesis and proposes the first Cantonese opera vocal synthesis dataset with phoneme-level annotation and audio-text alignment. First, the CODS dataset is constructed through a systematic process. It is derived from 29 original works by four famous performers, with a total length of 3.81 hours, and provides important support for the research and digitization of Cantonese opera. Using this dataset, the paper then conducts experiments with a deep learning-based method for Cantonese opera voice synthesis, achieving controllable generation in terms of lyrics, timbre, and melody. Finally, the paper establishes a comprehensive evaluation framework for Cantonese opera synthesis. Both objective and subjective evaluations reach a satisfactory level within the domain, further validating the usability of the proposed dataset. The CODS dataset fills a gap in artificial intelligence research on Cantonese opera vocal synthesis and promotes the inheritance and innovation of this traditional art.
Keywords
Cantonese opera / generative artificial intelligence / dataset / voice synthesis
Classification
Information Technology and Security Science
Cite this article
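李粤, 黄奕翰, 彭郑威, 谢吉轩, 杜宇烨. CODS: An Audio-Text Aligned Dataset for Cantonese Opera Vocal Synthesis [J]. Journal of South China University of Technology (Natural Science Edition), 2025, 53(9): 1-10, 10.
Funding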
Supported by the National Natural Science Foundation of China (62476096)