China Scientific Data (Chinese and English Online Edition), 2026, Vol. 11, Issue 1: 31-42, 12. DOI: 10.11922/11-6035.csd.2025.0122.zh
XBMU-bo-Lhasa31: A speech recognition dataset for the Lhasa dialect of Tibetan
Abstract
Tibetan speech recognition has important application value in fields such as Tibetan language education and news dissemination. The Lhasa dialect of Tibetan is widely used in Lhasa City and its surrounding regions. However, due to geographical and other constraints, currently available Tibetan speech data resources remain limited, and high-quality annotated data are particularly scarce. For this reason, this study constructs a professionally designed and standardized speech recognition dataset for the Lhasa dialect of Tibetan. The dataset was recorded in real-world environments using self-developed recording software and was collected from 51 speakers. It contains 24,289 speech samples with a total duration of 31.61 hours and an average duration of 4.68 seconds per sample. The text content was selected primarily from news-related texts to ensure linguistic standardization and domain representativeness. To guarantee data quality, we implemented a strict quality control process: first, the original texts were segmented into sentences and manually verified; second, after recording was completed, Voice Activity Detection (VAD) was used to filter the recordings and retain high-quality speech samples; finally, unpronounced symbols in the text were normalized to improve speech recognition accuracy. This dataset provides an important foundational resource for Tibetan speech recognition and is expected to facilitate the development of Tibetan speech recognition technology.
Keywords
Speech recognition / Lhasa dialect of Tibetan / multi-speaker / speech corpus
Cite this article
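马立克, 李冠宇, 谢晨宇, 孙倩, 郭玉豪. XBMU-bo-Lhasa31: A speech recognition dataset for the Lhasa dialect of Tibetan [J]. China Scientific Data (Chinese and English Online Edition), 2026, 11(1): 31-42, 12.
Funding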
National Natural Science Foundation of China (61633013)
Gansu Province Science and Technology Major Special Program in 2024 (24ZDFA004)