Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition), 2012, Vol. 24, Issue 2: 127-132 (6 pages). DOI: 10.3979/j.issn.1673-825X.2012.02.001
Statistical thresholding for robust ASR
Abstract
Speech recognition systems have been deployed in real-world applications for several decades, yet their recognition performance remains unsatisfactory under various noise conditions, particularly at low signal-to-noise ratios (SNRs). In this paper, we propose a statistical thresholding method for the mean and variance normalization technique that further reduces the mismatch between training and testing environments, making an automatic speech recognition (ASR) system more robust to environmental changes. Mel-frequency cepstrum coefficient (MFCC) features are extracted as acoustic features and then normalized with the mean and variance normalization method to obtain cepstral mean and variance normalization (CMVN) features. The proposed statistical thresholding method is then applied. The viability of the proposed approach was verified in experiments with different types of background noise at different SNR levels. In an isolated word recognition task, the experimental results show that the proposed approach reduces the error rate by over 40% in some cases compared with the baseline MFCC front-end, and that under low SNR conditions it also outperforms other robust features such as cepstral mean subtraction (CMS) and CMVN.
Keywords
robustness/feature extraction/mean subtraction/mean and variance normalization (MVN)/Mel-frequency cepstrum coefficient (MFCC)/statistical thresholding/speech recognition
Classification
Information technology and security science
Cite this article
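李银国, 蒲甫安, 郑方. Statistical thresholding for robust ASR [J]. Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition), 2012, 24(2): 127-132 (6 pages).
Funding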
The National High Technology Research and Development Program of China Project (2009ZX01038-002-002-2)
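
The normalization step named in the abstract, which turns MFCC features into CMVN features, can be illustrated with a minimal sketch. It assumes per-utterance statistics, a feature matrix of shape (frames, coefficients), and synthetic random data standing in for real MFCCs. The function names `cms` and `cmvn` and the `eps` parameter are illustrative choices, not taken from the paper. The proposed statistical thresholding step is not sketched, because the abstract does not specify it.

```python
import numpy as np


def cms(features):
    """Cepstral mean subtraction: remove the per-coefficient mean over frames."""
    features = np.asarray(features, dtype=float)
    return features - features.mean(axis=0)


def cmvn(features, eps=1e-8):
    """Cepstral mean and variance normalization over one utterance.

    features: array of shape (num_frames, num_coeffs).
    eps: small constant (an assumption here) that guards against division
         by zero when a coefficient is constant across frames.
    """
    features = np.asarray(features, dtype=float)
    mean = features.mean(axis=0)   # per-coefficient mean over frames
    std = features.std(axis=0)     # per-coefficient standard deviation
    return (features - mean) / (std + eps)


# Synthetic stand-in for a 200-frame, 13-coefficient MFCC matrix.
rng = np.random.default_rng(0)
mfcc = rng.normal(loc=3.0, scale=2.0, size=(200, 13))

mean_subtracted = cms(mfcc)
normalized = cmvn(mfcc)
```

After normalization, each coefficient has approximately zero mean and unit variance across the utterance, which is what reduces the mismatch between features computed in different acoustic environments.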