Computer Engineering and Applications, 2018, Vol. 54, Issue (13): 231-235, 5. DOI: 10.3778/j.issn.1002-8331.1702-0269
Design and research of Tibetan spoken speech corpus
Abstract
Based on a study of how traditional speech corpora are constructed, and taking into account both the requirements of natural spoken-speech recognition and the characteristics of natural spoken Tibetan, this paper designs a construction scheme and an annotation standard for a spoken-speech corpus suited to Tibetan speech recognition. Following this scheme, a 50-hour corpus of spoken Lhasa Tibetan is built and annotated on five layers: phonemes, half-syllables, syllables, Tibetan words, and sentences. Statistical analysis shows that the corpus preserves the natural properties of spoken language while offering balanced coverage of commonly used modeling units such as phonemes and half-syllables, so it can provide reliable data for speech recognition research on spoken Tibetan.
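The abstract makes two technical claims: utterances are annotated on five aligned tiers, and the corpus has "balanced coverage" of modeling units. A minimal sketch of how such claims could be checked is shown below. The record schema (`Utterance`), the function `unit_coverage`, the toy symbols, and the use of normalized entropy as the balance measure are all hypothetical choices for illustration; the paper's actual file format and statistics are not given in this abstract.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Utterance:
    """One sentence with its five annotation tiers (hypothetical schema)."""
    sentence: str
    words: list[str]
    syllables: list[str]
    half_syllables: list[str]
    phonemes: list[str]


def unit_coverage(utterances, tier, inventory):
    """Return (coverage, balance) for one modeling-unit tier.

    coverage: fraction of the (non-empty) unit inventory seen at least once.
    balance:  entropy of the observed in-inventory units, normalized by
              log(len(inventory)); 1.0 means a perfectly uniform spread.
    """
    # Count every unit on the requested tier across all utterances.
    counts = Counter(unit for utt in utterances for unit in getattr(utt, tier))
    seen = set(counts) & set(inventory)
    coverage = len(seen) / len(inventory)
    total = sum(counts[u] for u in seen)
    if total == 0 or len(inventory) < 2:
        return coverage, 0.0
    entropy = -sum((counts[u] / total) * math.log(counts[u] / total) for u in seen)
    return coverage, entropy / math.log(len(inventory))


# Toy data: these symbols are placeholders, not real Lhasa Tibetan units.
utts = [
    Utterance(
        sentence="s1",
        words=["w1"],
        syllables=["ka", "ma"],
        half_syllables=["k", "a", "m", "a"],
        phonemes=["k", "a", "m", "a"],
    )
]

print(unit_coverage(utts, "phonemes", ["a", "k", "m", "p"]))
print(unit_coverage(utts, "syllables", ["ka", "ma"]))
```

Normalized entropy is only one possible balance measure; a corpus designer might instead require every unit in the inventory to reach a minimum occurrence count.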
Keywords
speech corpus/spoken speech/speech recognition/annotation standard/Tibetan Lhasa dialect
Classification
Information Technology and Security Science
Cite this article
Huang Xiaohui, Li Jing, Ma Rui. Design and research of Tibetan spoken speech corpus[J]. Computer Engineering and Applications, 2018, 54(13): 231-235, 5.
Funding
National Key Research and Development Program of China (No. 2016YFB0201402).