End-to-end method based on Conformer for speech recognition

Hu Conggang, Shen Yixiang, Sun Yongqi, Zhao Sicong

Application Research of Computers, 2024, Vol. 41, Issue 7: 2018-2024 (7 pages).

DOI: 10.19734/j.issn.1001-3695.2023.11.0563

End-to-end method based on Conformer for speech recognition

Hu Conggang¹, Shen Yixiang¹, Sun Yongqi¹, Zhao Sicong²

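Author information

1. Key Laboratory of Big Data & Artificial Intelligence in Transportation, Ministry of Education, Beijing 100044; School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044
2. Beijing Aerospace Chenxin Technology Co., Ltd., Beijing 102308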

Abstract

Acoustic input networks for Conformer-based encoders extract insufficient information from FBank speech features and lose channel feature information. To address these problems, this paper proposes an end-to-end speech recognition method based on RepVGG-SE-Conformer. First, the model uses the multi-branch structure of RepVGG to strengthen speech information extraction, and applies structural re-parameterization to fuse the branches into a single branch, which reduces computational complexity and speeds up inference. Second, a channel attention mechanism based on the squeeze-and-excitation network compensates for the missing channel feature information and improves recognition accuracy. Finally, experiments on the public Aishell-1 dataset show that the proposed method reduces the character error rate by 10.67% compared with Conformer, verifying the improvement offered by the method. In addition, the proposed RepVGG-SE acoustic input network generalizes well in end-to-end settings and can effectively improve the overall performance of speech recognition models based on Transformer variants.
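The two mechanisms named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single-channel 1-D signal instead of the 2-D multi-channel convolutions used in RepVGG, it omits the batch-normalization folding that real re-parameterization also performs, and all function names and weights (`conv1d_same`, `multi_branch`, `fuse_branches`, `se_scale`, `w_down`, `w_up`) are hypothetical. The first part shows that a 3-tap branch, a 1-tap branch, and an identity branch can be fused into one 3-tap kernel with identical output; the second part shows the squeeze, excitation, and scale steps of channel attention.

```python
import math

def conv1d_same(x, kernel):
    """Correlate a 1-D signal with an odd-length kernel, zero-padded to keep length."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(k * padded[i + j] for j, k in enumerate(kernel))
            for i in range(len(x))]

def multi_branch(x, k3, k1):
    """Training-time form: 3-tap branch + 1-tap branch + identity branch, summed."""
    y3 = conv1d_same(x, k3)
    y1 = conv1d_same(x, [k1])
    return [a + b + c for a, b, c in zip(y3, y1, x)]

def fuse_branches(k3, k1):
    """Inference-time form: fold the 1-tap and identity branches into the centre tap.

    Assumption: no batch normalization, so the fusion is a plain kernel sum.
    """
    return [k3[0], k3[1] + k1 + 1.0, k3[2]]

def se_scale(channels, w_down, w_up):
    """Squeeze-and-excitation over a list of channels (each a list of floats)."""
    # Squeeze: one global average per channel.
    z = [sum(c) / len(c) for c in channels]
    # Excitation: fully connected -> ReLU -> fully connected -> sigmoid.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w_down]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w_up]
    # Scale: reweight every channel by its gate in (0, 1).
    return [[g * v for v in c] for g, c in zip(gates, channels)]

# Illustrative values only.
x = [1.0, 2.0, -1.0, 0.5]
k3, k1 = [0.2, -0.5, 0.3], 0.7
trained = multi_branch(x, k3, k1)
fused = conv1d_same(x, fuse_branches(k3, k1))
scaled = se_scale([[1.0, 3.0], [2.0, 2.0]], [[0.5, 0.5]], [[1.0], [-1.0]])
```

Because convolution is linear, the fused single-branch kernel reproduces the multi-branch output exactly while performing one convolution instead of three operations, which is the source of the inference speed-up the abstract describes.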

Key words

speech recognition/Conformer/RepVGG/squeeze-and-excitation network

Classification

Information technology and security science

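Cite this article

Hu Conggang, Shen Yixiang, Sun Yongqi, Zhao Sicong. End-to-end method based on Conformer for speech recognition[J]. Application Research of Computers, 2024, 41(7): 2018-2024, 7.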

Funding

Science and Technology Innovation 2030: "New Generation Artificial Intelligence" Major Funded Project (2021ZD0113002)

Application Research of Computers

OA | Peking University Core Journal | CSTPCD

ISSN: 1001-3695
