Application Research of Computers, 2024, Vol. 41, Issue 7: 2018-2024, 7. DOI: 10.19734/j.issn.1001-3695.2023.11.0563
End-to-end method based on Conformer for speech recognition
Abstract
The acoustic input network of Conformer-encoder-based models extracts insufficient information from FBank speech features and lacks channel feature information. To address these problems, this paper proposes an end-to-end speech recognition method based on RepVGG-SE-Conformer. First, the model uses the multi-branch structure of RepVGG to strengthen speech information extraction, and uses structural re-parameterization to fuse the branches into a single branch, which reduces computational complexity and speeds up inference. Then, a channel attention mechanism based on the squeeze-and-excitation network compensates for the missing channel feature information and improves recognition accuracy. Finally, experiments on the public Aishell-1 dataset show that the proposed method reduces the character error rate by 10.67% compared with Conformer, which verifies the method's advantage. In addition, the proposed RepVGG-SE acoustic input network generalizes well in end-to-end settings and can effectively improve the overall performance of speech recognition models based on Transformer variants.
Keywords
speech recognition / Conformer / RepVGG / squeeze-and-excitation network
Classification
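Information technology and security science
Cite this article
Hu Conggang (胡从刚), Shen Yixiang (申艺翔), Sun Yongqi (孙永奇), Zhao Sicong (赵思聪). End-to-end method based on Conformer for speech recognition [J]. Application Research of Computers, 2024, 41(7): 2018-2024, 7.
Funding
Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2021ZD0113002)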
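
The abstract mentions that structural re-parameterization fuses RepVGG's multi-branch structure into a single branch for inference. The sketch below illustrates why such a fusion can be exact. It is not the paper's implementation: it assumes a simplified 1-D, stride-1 setting with a 3-tap branch, a 1-tap branch, and an identity branch, it omits batch normalization and biases, and the names `conv1d` and `fuse_branches` and all weight values are hypothetical.

```python
def conv1d(x, w, pad):
    """Cross-correlation. x: [C_in][T], w: [C_out][C_in][K], zero padding, stride 1."""
    c_in, t = len(x), len(x[0])
    k = len(w[0][0])
    xp = [[0.0] * pad + list(ch) + [0.0] * pad for ch in x]
    t_out = t + 2 * pad - k + 1
    return [
        [
            sum(w[o][i][j] * xp[i][n + j] for i in range(c_in) for j in range(k))
            for n in range(t_out)
        ]
        for o in range(len(w))
    ]


def fuse_branches(w3, w1):
    """Fold a 3-tap branch, a 1-tap branch and an identity branch into one 3-tap kernel.

    Requires C_out == C_in so that the identity branch is well defined.
    """
    c_out, c_in = len(w3), len(w3[0])
    fused = [[list(w3[o][i]) for i in range(c_in)] for o in range(c_out)]
    for o in range(c_out):
        for i in range(c_in):
            fused[o][i][1] += w1[o][i][0]  # a 1-tap kernel acts on the centre tap
            if o == i:
                fused[o][i][1] += 1.0      # identity: centre tap of 1 on the diagonal
    return fused


# Toy input with 2 channels and 4 time steps (illustrative values only).
x = [[1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 0.0, 2.0]]
w3 = [[[0.1, 0.2, 0.3], [0.0, -0.1, 0.2]], [[0.3, 0.0, -0.2], [0.1, 0.1, 0.1]]]
w1 = [[[0.5], [-0.5]], [[0.25], [1.0]]]

# Training-time form: three parallel branches whose outputs are summed.
y3 = conv1d(x, w3, pad=1)
y1 = conv1d(x, w1, pad=0)
multi = [[a + b + c for a, b, c in zip(r3, r1, rx)] for r3, r1, rx in zip(y3, y1, x)]

# Inference-time form: one convolution with the fused kernel.
single = conv1d(x, fuse_branches(w3, w1), pad=1)

max_diff = max(abs(a - b) for ra, rb in zip(multi, single) for a, b in zip(ra, rb))
print(max_diff < 1e-9)
```

Because every branch is linear, the summed output equals a single convolution with the summed kernels, so the fused form needs one convolution instead of three. In the original RepVGG design each branch's batch normalization is also folded into its convolution before the kernels are added; that step is left out here for brevity.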