
Deep Neural Network Based Feature Extraction for Low-resource Speech Recognition

Qin Chuxiong¹, Zhang Lianhai¹

1. Institute of Information System Engineering, Information Engineering University, Zhengzhou 450001

Acta Automatica Sinica, 2017, Vol. 43, Issue 7: 1208-1219 (12 pages). DOI: 10.16383/j.aas.2017.c150654


Abstract

To alleviate the performance degradation that deep neural network (DNN) based features suffer when transcribed training data are insufficient, two DNN-based feature extraction approaches for low-resource speech recognition are proposed. First, several high-resource corpora are used to help train a bottleneck DNN with a shared-hidden-layer structure. Dropout, maxout and rectified linear units (ReLU) are employed to improve training and reduce the number of network parameters, which mitigates the overfitting caused by the irregular distribution of multi-stream training samples and shortens multilingual training time. Second, a low-dimensional high-level feature extraction approach based on convex non-negative matrix factorization (CNMF) is proposed: the weight matrix of a hidden layer is factorized, the resulting basis matrix is used as the weight matrix of a newly formed feature layer, and a new type of feature is extracted from that layer. Experiments on 1 hour of Vystadial 2013 Czech low-resource training data, aided by 26.7 hours of English training data, show that the recognition system achieves a 7.0% relative word error rate (WER) reduction over the baseline when dropout and ReLU are applied, and a 12.6% relative WER reduction when dropout and maxout are applied, while using 62.7% fewer network parameters and 25% less training time than the other proposed systems. The matrix-factorization-based features outperform bottleneck features (BNF) in both low-resource monolingual and multilingual training settings, and they also achieve word accuracies 0.8% to 3.4% higher than state-of-the-art DNN-hidden Markov model (DNN-HMM) hybrid systems.
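The CNMF step described above can be illustrated with a minimal NumPy sketch. It assumes the convex NMF formulation and multiplicative updates of Ding, Li and Jordan, which approximate a mixed-sign matrix X by X W Gᵀ with W, G ≥ 0 and take F = X W as the basis. It further assumes, hypothetically, that X is an (h × m) hidden-layer weight matrix whose h rows index the hidden units feeding the new feature layer, so that features are computed as activations @ F. The function name `convex_nmf`, the random initialization, the iteration count and all matrix sizes are illustrative choices, not the authors' code or settings.

```python
import numpy as np


def _pos(a):
    """Positive part (|A| + A) / 2, elementwise."""
    return (np.abs(a) + a) / 2.0


def _neg(a):
    """Negative part (|A| - A) / 2, elementwise, so that A = _pos(A) - _neg(A)."""
    return (np.abs(a) - a) / 2.0


def convex_nmf(x, k, n_iter=200, eps=1e-9, seed=0):
    """Convex NMF (Ding, Li & Jordan): approximate X by X @ W @ G.T with W, G >= 0.

    x: (p, n) real matrix whose entries may be negative (e.g. DNN weights).
    Returns (F, W, G), where F = X @ W is the (p, k) basis matrix whose
    columns are non-negative combinations of the columns of X.
    """
    rng = np.random.default_rng(seed)
    n = x.shape[1]
    # Random positive initialization (a simplification; Ding et al. suggest k-means).
    w = rng.random((n, k)) + 0.1
    g = rng.random((n, k)) + 0.1
    y = x.T @ x                      # Gram matrix Y = X^T X, shape (n, n)
    yp, yn = _pos(y), _neg(y)
    for _ in range(n_iter):
        # Multiplicative update for G with W fixed.
        g *= np.sqrt((yp @ w + g @ (w.T @ yn @ w))
                     / (yn @ w + g @ (w.T @ yp @ w) + eps))
        # Multiplicative update for W with the new G fixed.
        gtg = g.T @ g
        w *= np.sqrt((yp @ g + yn @ w @ gtg)
                     / (yn @ g + yp @ w @ gtg + eps))
    return x @ w, w, g


# Hypothetical usage: a random matrix stands in for a trained (h x m) weight
# matrix whose h = 64 rows index the hidden units feeding the new feature layer.
rng = np.random.default_rng(1)
hidden_weights = rng.standard_normal((64, 128))
basis, w, g = convex_nmf(hidden_weights, k=20)

# The basis becomes the weight matrix of a new k-dimensional feature layer.
activations = rng.random((5, 64))    # 5 frames of hidden-layer outputs (hypothetical)
features = activations @ basis       # low-dimensional features, one row per frame
print(features.shape)                # prints (5, 20)
```

Because F = X W only combines columns of X with non-negative weights, the new layer stays within the span of the original weights while compressing the output to k dimensions. The abstract does not state which orientation of the weight matrix, which initialization or how many iterations the authors used, so those details here are assumptions.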


Key words

Low-resource speech recognition/deep neural network (DNN)/bottleneck features (BNF)/convex-nonnegative matrix factorization (CNMF)

Cite this article

Qin Chuxiong, Zhang Lianhai. Deep neural network based feature extraction for low-resource speech recognition[J]. Acta Automatica Sinica, 2017, 43(7): 1208-1219.

Funding

Supported by National Natural Science Foundation of China (61673395, 61302107, 61403415)

Acta Automatica Sinica

Open access; indexed in Peking University Core Journals, CSCD and CSTPCD

ISSN: 0254-4156
