Computer & Digital Engineering (计算机与数字工程), 2019, Vol. 47, Issue 12: 3099-3106, 8. DOI: 10.3969/j.issn.1672-9722.2019.12.031
End-to-End Speech Recognition Based on Recurrent Neural Network
Abstract
This paper presents a speech recognition system that transcribes audio data directly to text. A recurrent neural network (RNN) architecture based on deep bidirectional long short-term memory (LSTM) is combined with the connectionist temporal classification (CTC) objective function. The objective function is modified so that the network is trained to minimize the expected value of an arbitrary transcription loss function. This allows the word error rate to be optimized directly, even in the absence of a dictionary or language model. On the Wall Street Journal corpus, the system achieves a word error rate (WER) of 27.3% with no linguistic information, 21.9% with only a dictionary of allowed words, and 8.2% with a trigram language model. Combining the proposed method with a baseline system further reduces the error rate to 6.7%.
Key words
Recurrent neural network (RNN)/speech recognition/long short-term memory (LSTM)/connectionist temporal classification (CTC)/word error rate (WER)
Classification
Information Technology and Security Science
Cite this article
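Wang Zilong (王子龙), Li Junfeng (李俊峰), Zhang Shaowei (张劭韡), Wang Hongyan (王宏岩), Wang Sijie (王思杰). End-to-End Speech Recognition Based on Recurrent Neural Network [J]. Computer & Digital Engineering, 2019, 47(12): 3099-3106, 8.
Funding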
Supported by the National Natural Science Foundation of China (Grant No. 51776082).
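
The abstract reports its results as word error rate (WER). As a minimal sketch of how that metric is conventionally computed (word-level edit distance divided by the number of reference words), the following may help in reading the figures. The function name `word_error_rate` and the whitespace tokenization are illustrative assumptions, not the authors' evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; tokenization is a plain
    whitespace split, which is an assumption.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One of six reference words is missing from the hypothesis.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions are counted against the reference length, the value can exceed 1.0 when the hypothesis is much longer than the reference.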