Acta Automatica Sinica, 2017, Vol. 43, Issue 7: 1208-1219, 12. DOI: 10.16383/j.aas.2017.c150654
基于DNN的低资源语音识别特征提取技术
Deep Neural Network Based Feature Extraction for Low-resource Speech Recognition
Abstract
To alleviate the performance degradation that deep neural network (DNN) based features suffer when transcribed training data is insufficient, two DNN-based feature extraction approaches for low-resource speech recognition are proposed. First, high-resource corpora are used to help train a bottleneck DNN with a shared-hidden-layer network structure. Dropout, maxout, and rectified linear units are exploited to improve training and reduce the number of network parameters, so that the overfitting problem caused by the irregular distributions of multi-stream training samples can be solved and multilingual training time can be reduced. Second, a convex-non-negative matrix factorization (CNMF) based approach to extracting low-dimensional high-level features is proposed. The weight matrix of a hidden layer is factorized to obtain a basis matrix, which serves as the weight matrix of a newly formed feature layer, from which a new type of feature is extracted. Experiments on 1 hour of Vystadial 2013 Czech low-resource training data show that, with the help of 26.7 hours of English training data, the recognition system obtains a 7.0% relative word error rate reduction over the baseline system when dropout and rectified linear units are applied. When dropout and maxout are applied, it obtains a 12.6% relative word error rate reduction while reducing the number of network parameters by 62.7% and training time by 25% relative to the other proposed systems. Matrix factorization based features perform better than bottleneck features (BNF) in both low-resource monolingual and multilingual training situations. They also achieve better word accuracies than state-of-the-art DNN hidden Markov model hybrid systems, by 0.8% to 3.4%.
Key words
Low-resource speech recognition / deep neural network (DNN) / bottleneck features (BNF) / convex-nonnegative matrix factorization (CNMF)
Cite this article
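Qin Chuxiong, Zhang Lianhai. Deep Neural Network Based Feature Extraction for Low-resource Speech Recognition [J]. Acta Automatica Sinica, 2017, 43(7): 1208-1219, 12.
Funding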
Supported by National Natural Science Foundation of China (61673395, 61302107, 61403415)