End-to-End Mandarin Speech Recognition for Air Traffic Control Utilizing EfficientNetV2-RetNet

Liang Haijun (梁海军), Chang Hanwen (常瀚文), He Yimin (何一民), Zhao Zhiwei (赵志伟), Kong Jianguo (孔建国)

Telecommunication Engineering (电讯技术), 2025, Vol. 65, Issue 2: 254-260, 7. DOI: 10.20079/j.issn.1001-893x.240414002

Author information

  • 1. College of Air Traffic Management, Civil Aviation Flight University of China, Guanghan, Sichuan 618307, China

Abstract

The use of automatic speech recognition (ASR) technology in air traffic control (ATC) has the potential to improve communication efficiency, reduce human error, enhance safety, and drive innovation in air traffic management (ATM) systems. However, because ATC communications are sensitive, large amounts of labeled ATC speech data are difficult to obtain, which is a major obstacle to building highly accurate ASR systems. This paper develops a novel end-to-end ASR framework for ATC, EfficientNetV2-RetNet-CTC, that combines the retentive network (RetNet) with transfer learning. The multi-layer convolutional architecture of EfficientNetV2 extracts rich feature representations from speech signals. RetNet uses a multi-scale retention mechanism to capture global temporal dynamics in sequence data, which lets it model long-range dependencies efficiently. Connectionist temporal classification (CTC) removes the need for forced alignment between speech frames and labels and handles label sequences of variable length. In addition, transfer learning improves performance on the target task by reusing knowledge acquired on a source task; this helps offset the limited data resources available in the civil aviation domain and strengthens the model's ability to generalize. Experimental results show that the proposed model outperforms the baseline models. When pre-trained on the Aishell corpus, it achieves minimum character error rates (CERs) of 7.6% and 8.7%; on the ATC corpus, these are further reduced to 5.6% and 6.8%.
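The abstract credits RetNet's multi-scale retention with capturing long-range dependencies efficiently. The sketch below illustrates that operator in plain Python, following our reading of the original RetNet formulation rather than this paper's code. It is deliberately simplified: q, k and v are passed in directly (no learned projections), and the positional rotation, group normalization and gating of full RetNet are omitted. The function names, the toy sizes, and the per-head decay schedule gamma_h = 1 - 2^(-5-h) are assumptions for illustration. The point it demonstrates is that the parallel form (used for training) and the recurrent form (constant-size state per step, suited to streaming speech) give the same output.

```python
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def retention_parallel(q, k, v, gamma):
    """Parallel form: out[n] = sum_{m<=n} gamma**(n-m) * (q[n].k[m]) * v[m]."""
    d_v = len(v[0])
    out = []
    for n in range(len(q)):
        row = [0.0] * d_v
        for m in range(n + 1):  # causal: only current and earlier frames
            w = gamma ** (n - m) * dot(q[n], k[m])
            row = [r + w * x for r, x in zip(row, v[m])]
        out.append(row)
    return out


def retention_recurrent(q, k, v, gamma):
    """Recurrent form: S_n = gamma * S_{n-1} + outer(k_n, v_n); out[n] = q_n S_n."""
    d_k, d_v = len(k[0]), len(v[0])
    state = [[0.0] * d_v for _ in range(d_k)]  # fixed-size state, independent of length
    out = []
    for qn, kn, vn in zip(q, k, v):
        state = [[gamma * state[i][j] + kn[i] * vn[j] for j in range(d_v)]
                 for i in range(d_k)]
        out.append([sum(qn[i] * state[i][j] for i in range(d_k)) for j in range(d_v)])
    return out


def multi_scale_retention(q, k, v, gammas, form=retention_parallel):
    """Split feature columns into len(gammas) heads; each head decays at its own rate."""
    h = len(gammas)
    dk, dv = len(q[0]) // h, len(v[0]) // h

    def cols(mat, i, d):
        return [row[i * d:(i + 1) * d] for row in mat]

    heads = [form(cols(q, i, dk), cols(k, i, dk), cols(v, i, dv), g)
             for i, g in enumerate(gammas)]
    # Concatenate the per-head outputs frame by frame.
    return [sum((head[t] for head in heads), []) for t in range(len(q))]


random.seed(0)
T, D, H = 6, 8, 4  # frames, feature size, heads (illustrative values only)


def rand_matrix():
    return [[random.gauss(0.0, 1.0) for _ in range(D)] for _ in range(T)]


q, k, v = rand_matrix(), rand_matrix(), rand_matrix()
gammas = [1 - 2 ** (-5 - i) for i in range(H)]  # assumed decay schedule, one per head
y_par = multi_scale_retention(q, k, v, gammas, retention_parallel)
y_rec = multi_scale_retention(q, k, v, gammas, retention_recurrent)
max_diff = max(abs(a - b) for ra, rb in zip(y_par, y_rec) for a, b in zip(ra, rb))
print(f"max |parallel - recurrent| = {max_diff:.2e}")
```

Heads with gamma close to 1 keep a long memory while heads with smaller gamma focus on recent frames, which is one way to read "multi-scale".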
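The claim that CTC "removes the need for forced alignment" follows from how its objective is defined: the probability of a label sequence is summed over every frame-level path that collapses to it, so no alignment has to be supplied. The sketch below shows the standard CTC collapse rule, best-path (greedy) decoding, and the forward algorithm for that sum. It is a generic illustration, not this paper's implementation; blank index 0, the toy probabilities and the function names are assumptions, and it works in probability space for readability where practical systems use log space.

```python
BLANK = 0  # assumption: index 0 is the CTC blank symbol


def collapse(path, blank=BLANK):
    """CTC many-to-one map: merge consecutive repeats, then drop blanks."""
    out, prev = [], None
    for c in path:
        if c != prev and c != blank:
            out.append(c)
        prev = c
    return out


def greedy_decode(probs, blank=BLANK):
    """Best-path decoding: take the argmax class per frame, then collapse."""
    best = [max(range(len(p)), key=p.__getitem__) for p in probs]
    return collapse(best, blank)


def ctc_forward(probs, labels, blank=BLANK):
    """P(labels | probs), summed over all frame alignments (forward algorithm).

    probs[t][c] is the softmax output for class c at frame t; labels has no blanks.
    """
    ext = [blank]
    for c in labels:
        ext += [c, blank]  # blank-interleaved target: _ l1 _ l2 _ ...
    S = len(ext)
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]
    for t in range(1, len(probs)):
        prev, alpha = alpha, [0.0] * S
        for s in range(S):
            a = prev[s] + (prev[s - 1] if s >= 1 else 0.0)
            # Skipping over a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += prev[s - 2]
            alpha[s] = a * probs[t][ext[s]]
    # Valid paths end on the last label or on the trailing blank.
    return alpha[-1] + (alpha[-2] if S > 1 else 0.0)


frames = [[0.4, 0.6], [0.3, 0.7]]  # toy 2-frame output over classes {blank, 1}
print(greedy_decode(frames))       # [1]
print(ctc_forward(frames, [1]))    # about 0.88: paths (_,1), (1,_), (1,1)
```

Because repeated labels must be separated by a blank, a target such as [1, 1] gets zero probability from only two frames; this is why CTC needs at least as many frames as labels plus repeats.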
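The results above are reported as character error rates. CER is conventionally computed as the character-level Levenshtein distance (substitutions + deletions + insertions) divided by the number of reference characters. A minimal sketch, assuming corpus-level aggregation and no text normalization (the paper's exact scoring setup is not given here); the transcripts are hypothetical ATC-style strings, not data from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, 1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, 1):
            cur[j] = min(prev[j] + 1,             # deletion of r
                         cur[j - 1] + 1,          # insertion of h
                         prev[j - 1] + (r != h))  # substitution, or a match at cost 0
        prev = cur
    return prev[-1]


def cer(refs, hyps):
    """Corpus-level CER: total character edits / total reference characters."""
    edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return edits / sum(len(r) for r in refs)  # assumes at least one non-empty reference


ref = "国航幺两三上升到九千二"  # hypothetical reference transcript (11 characters)
hyp = "国航幺两三上升九千三"    # one deletion (到) and one substitution (二 to 三)
print(f"CER = {cer([ref], [hyp]):.3f}")  # CER = 0.182
```

Because Mandarin is scored per character, no word segmentation is needed, which is one reason CER rather than word error rate is the usual metric for Mandarin ASR.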

Key words

air traffic control/automatic speech recognition/end-to-end deep learning/transfer learning

Classification

Aerospace

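Cite this article

LIANG Haijun, CHANG Hanwen, HE Yimin, ZHAO Zhiwei, KONG Jianguo. End-to-End Mandarin Speech Recognition for Air Traffic Control Utilizing EfficientNetV2-RetNet[J]. Telecommunication Engineering, 2025, 65(2): 254-260, 7.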

Funding

National Key Research and Development Program of China (2021YFF0603904)

Fundamental Research Funds for the Central Universities (PHD2023-035)

Fundamental Research Funds for the Central Universities (24CAFUC10195)

Telecommunication Engineering (电讯技术)

Open access; Peking University Core Journal (北大核心)

ISSN 1001-893X
