Chinese Journal of Electron Devices, 2025, Vol. 48, Issue (6): 1260-1267, 8. DOI: 10.3969/j.issn.1005-9490.2025.06.010
Research on Streaming Speech Recognition Technology Based on Two-Pass Approach
Abstract
Recently, end-to-end models based on the RNN-Transducer (RNN-T) have shown superior performance on streaming speech recognition tasks. Although RNN-T is inherently capable of streaming, its recognition quality still lags behind that of advanced non-streaming models. In addition, RNN-T tends to delay emitting its predictions, which incurs higher partial latency. To better balance character error rate (CER) and latency, a two-pass model combining RNN-T with an attention-based encoder-decoder is proposed. Specifically, in the first pass the RNN-T encoder is replaced with Transformer layers that use blockwise parallelization to capture global context across chunks and to reduce decoding cost, lowering latency. The second pass adopts an improved Transformer rescorer that processes the entire streaming hypothesis in parallel, making more efficient use of computational resources. Experiments on Aishell-1 show that, at an acceptable latency, the proposed two-pass model reduces CER by approximately 40% compared with RNN-T. The proposed model thus effectively balances recognition accuracy and latency for streaming ASR.
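The abstract combines two ideas: a blockwise (chunk) attention mask that lets the first-pass Transformer encoder stream while still seeing context from earlier chunks, and a second pass that rescores first-pass hypotheses. The Python sketch below illustrates both under stated assumptions; it is not the authors' implementation. It assumes full left context across chunks with no look-ahead beyond the current chunk, assumes the first pass supplies an n-best list of (tokens, log score) pairs, and assumes the two passes' log scores are combined by linear interpolation with a hypothetical `first_pass_weight`. The function names and the `second_pass_score` callable (a stand-in for the attention-based decoder) are illustrative only. In a real system the rescorer scores each complete hypothesis in a single teacher-forced decoder pass rather than token by token, which is where the second pass gets its parallelism; that step is abstracted away here.

```python
from typing import Callable, List, Sequence, Tuple

Hypothesis = Tuple[str, ...]


def chunk_attention_mask(num_frames: int, chunk_size: int) -> List[List[bool]]:
    """Blockwise self-attention mask for a streaming encoder (sketch).

    mask[i][j] is True when frame i may attend to frame j. Each frame sees
    every frame in its own chunk plus all earlier chunks (assumed full left
    context) but nothing from future chunks, so a chunk can be encoded as
    soon as all of its frames have arrived.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    mask = []
    for i in range(num_frames):
        # One past the last frame of the chunk containing frame i.
        chunk_end = min((i // chunk_size + 1) * chunk_size, num_frames)
        mask.append([j < chunk_end for j in range(num_frames)])
    return mask


def two_pass_rescore(
    nbest: Sequence[Tuple[Hypothesis, float]],
    second_pass_score: Callable[[Hypothesis], float],
    first_pass_weight: float = 0.3,  # hypothetical interpolation weight
) -> Tuple[Hypothesis, float]:
    """Return the best first-pass hypothesis after second-pass rescoring.

    nbest holds (tokens, first_pass_log_score) pairs, e.g. from RNN-T beam
    search. second_pass_score stands in for an attention decoder scoring a
    complete hypothesis. Scores are linearly interpolated (an assumption).
    """
    if not nbest:
        raise ValueError("nbest must not be empty")
    scored = [
        (tokens,
         first_pass_weight * first
         + (1.0 - first_pass_weight) * second_pass_score(tokens))
        for tokens, first in nbest
    ]
    # Highest combined log score wins.
    return max(scored, key=lambda pair: pair[1])
```

With five frames and `chunk_size=2`, frames 0 and 1 see only frames 0 to 1, frames 2 and 3 see frames 0 to 3, and frame 4 sees all five. If the mask is passed to PyTorch's `nn.MultiheadAttention` as a boolean `attn_mask`, it must be inverted, because there `True` marks positions that are *not* allowed to attend. For rescoring, given `nbest = [(("ni", "hao"), -1.0), (("ni", "hou"), -0.8)]` and second-pass scores of -0.5 and -2.0 respectively, the first pass alone would prefer `("ni", "hou")`, but after interpolation `("ni", "hao")` wins with a combined score of about -0.65.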
Key words
streaming speech recognition/block mechanism/end-to-end/Transformer/Conformer
Classification
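Information Technology and Security Science
Cite this article
GAO Lu, WANG Yahao, ZHANG Fei, REN Xiaoying, HAO Bin, HAN Yaxu. Research on Streaming Speech Recognition Technology Based on Two-Pass Approach[J]. Chinese Journal of Electron Devices, 2025, 48(6): 1260-1267, 8.
Funding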
National Natural Science Foundation of China (62161041)
Natural Science Foundation of Inner Mongolia (2022SHZR0375)
Inner Mongolia Autonomous Region Key R&D and Achievement Transformation Program (2025SYFHH0223)