
Chinese Speech Recognition Using Conformer Fused with Max Pooling

Hu Conggang¹, Yang Lipeng², Sun Yongqi¹, Chen Hualong², Han Keke²

Computer Engineering, 2026, Vol. 52, Issue 1, pp. 105-115 (11 pages). DOI: 10.19678/j.issn.1000-3428.0070055

Author information

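  • 1. State Key Laboratory of Advanced Rail Autonomous Operation, Beijing Jiaotong University, Beijing 100044; School of Computer Science and Technology, Beijing Jiaotong University, Beijing 100044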
  • 2. China Academy of Railway Sciences Corporation Limited, Beijing 100081

Abstract

Speech recognition technology uses advanced algorithms and signal processing to enable machines to understand human speech, making communication between humans and machines more convenient. Most existing studies on end-to-end speech recognition focus on optimizing the Conformer model; however, the Conformer encoder does not sufficiently extract fine-grained local speech features. To address this issue, this study proposes a Chinese speech recognition method based on Max Pooling (MP). First, the output of the Gated Linear Unit (GLU) in the convolution module of the encoder is max-pooled along the time dimension to extract fine-grained local features that capture the characteristics of multiple adjacent speech frames. Second, these pooled features are fused with the coarse-grained local features extracted by Depthwise Convolution (DWC) through element-wise summation, which enriches the local speech feature information and improves the recognition accuracy of the Conformer model. Experimental results on the public Chinese dataset Aishell-1 show that the improved model reduces the Character Error Rate (CER) of the baseline model from 5.58% to 5.32% with greedy search decoding and from 5.06% to 4.92% with attention rescoring decoding.
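The fusion step described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the pooling window (3), the stride-1 "same" padding (assumed so that the pooled sequence keeps the time length T and can be summed with the DWC output), and the DWC kernel weights are all hypothetical, and the LayerNorm, pointwise convolutions, BatchNorm and Swish of the standard Conformer convolution module are omitted. Features are stored frame by frame, `x[t][c]`.

```python
import math
from typing import List

# Features are stored frame by frame: x[t][c] = value of channel c at frame t.
Matrix = List[List[float]]


def glu(x: Matrix) -> Matrix:
    """Gated linear unit over channels: split 2C inputs into (a, b), return a * sigmoid(b)."""
    out = []
    for frame in x:
        c = len(frame) // 2
        a, b = frame[:c], frame[c:]
        out.append([ai / (1.0 + math.exp(-bi)) for ai, bi in zip(a, b)])
    return out


def depthwise_conv_time(x: Matrix, kernels: Matrix) -> Matrix:
    """Depthwise 1-D convolution along time: channel c is filtered only by kernels[c].
    Computed as cross-correlation (as deep-learning frameworks do) with zero
    'same' padding, so the output keeps T frames. Kernel length must be odd."""
    T, C, K = len(x), len(x[0]), len(kernels[0])
    assert K % 2 == 1 and len(kernels) == C
    pad = K // 2
    out = [[0.0] * C for _ in range(T)]
    for t in range(T):
        for c in range(C):
            out[t][c] = sum(kernels[c][k] * x[t + k - pad][c]
                            for k in range(K) if 0 <= t + k - pad < T)
    return out


def max_pool_time(x: Matrix, window: int = 3) -> Matrix:
    """Max pooling along time with stride 1 and 'same' padding (padded positions are
    skipped, equivalent to -inf padding), so the output keeps T frames. Window must be odd."""
    assert window % 2 == 1
    T, C = len(x), len(x[0])
    half = window // 2
    return [[max(x[s][c] for s in range(max(0, t - half), min(T, t + half + 1)))
             for c in range(C)]
            for t in range(T)]


def fused_local_features(x: Matrix, dw_kernels: Matrix, pool_window: int = 3) -> Matrix:
    """Coarse-grained (DWC) + fine-grained (max-pooled) local features, summed element-wise.
    Both branches read the same GLU output (hypothetical helper, not the paper's code)."""
    g = glu(x)
    coarse = depthwise_conv_time(g, dw_kernels)
    fine = max_pool_time(g, pool_window)
    return [[cv + fv for cv, fv in zip(c_row, f_row)] for c_row, f_row in zip(coarse, fine)]


if __name__ == "__main__":
    # 4 frames x 4 input channels (2C = 4, so C = 2 after the GLU); gates are 0 -> sigmoid = 0.5
    x = [[1.0, -2.0, 0.0, 0.0],
         [3.0, 0.5, 0.0, 0.0],
         [-1.0, 2.0, 0.0, 0.0],
         [0.0, 1.0, 0.0, 0.0]]
    kernels = [[0.25, 0.5, 0.25], [0.0, 1.0, 0.0]]  # hypothetical DWC weights, K = 3
    for frame in fused_local_features(x, kernels, pool_window=3):
        print(frame)
```

In this sketch max pooling is parameter-free and both branches keep the `[T][C]` shape, so the element-wise sum adds no parameters beyond the DWC kernels. The actual pooling window and kernel sizes used in the paper are not stated in this abstract.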

Keywords

speech recognition/fine-grained local feature/Conformer model/Max Pooling (MP)/Depthwise Convolution (DWC)

Classification

Information Technology and Security Science

Cite this article

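Hu Conggang, Yang Lipeng, Sun Yongqi, Chen Hualong, Han Keke. Chinese Speech Recognition Using Conformer Fused with Max Pooling[J]. Computer Engineering, 2026, 52(1): 105-115.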

Funding

Fundamental Research Funds for the Central Universities (2024JBGP008)

National Science and Technology Major Project for New Generation Artificial Intelligence (2021ZD0113002)

Computer Engineering

ISSN 1000-3428
