Design and Research of a Tibetan Spoken Speech Corpus

Computer Engineering and Applications, 2018, Vol. 54, Issue 13: 231-235, 5. DOI: 10.3778/j.issn.1002-8331.1702-0269

Huang Xiaohui¹, Li Jing², Ma Rui¹

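Author information

1. School of Computer Science and Technology, University of Science and Technology of China, Hefei 230026
2. Department of Engineering, PLA University of Foreign Languages, Luoyang, Henan 471003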

Abstract

Building on an analysis of how traditional speech corpora are constructed, and taking into account both the requirements of natural spoken speech recognition and the characteristics of natural spoken Tibetan, this paper designs a construction scheme and annotation standard for a spoken-language corpus suited to Tibetan speech recognition. A 50-hour spoken corpus of Lhasa Tibetan is also built, annotated at five layers: phonemes, semi-syllables, syllables, Tibetan words, and sentences. Statistics show that the corpus retains the natural properties of spoken language while giving balanced coverage of commonly used modeling units such as phonemes and semi-syllables, so it can provide reliable data support for speech recognition technology based on spoken Tibetan data.

Key words

speech corpus / spoken speech / speech recognition / annotation standard / Lhasa Tibetan

Classification

Information Technology and Security Science

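Cite this article

Huang Xiaohui, Li Jing, Ma Rui. Design and research of Tibetan spoken speech corpus[J]. Computer Engineering and Applications, 2018, 54(13): 231-235, 5.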

Funding

National Key Research and Development Program (No. 2016YFB0201402).

Computer Engineering and Applications

OA / Peking University Core / CSCD / CSTPCD

ISSN: 1002-8331
