电讯技术 (Telecommunication Engineering), 2025, Vol. 65, Issue 2: 254-260 (7 pages). DOI: 10.20079/j.issn.1001-893x.240414002
基于EfficientNetV2-RetNet的端到端中文管制语音识别
End-to-End Mandarin Speech Recognition for Air Traffic Control Utilizing EfficientNetV2-RetNet
Abstract
Automatic speech recognition (ASR) technology shows potential in the air traffic control (ATC) field for enhancing communication efficiency, minimizing human error, improving safety, and promoting innovation in air traffic management (ATM) systems. However, acquiring a substantial amount of labeled ATC speech data is difficult because ATC communications are sensitive, which is a significant obstacle to developing highly accurate ASR systems. In this paper, a novel end-to-end ASR framework, EfficientNetV2-RetNet-CTC, is developed for ATC systems using a retentive network (RetNet) and transfer learning. The multi-layer convolutional architecture of EfficientNetV2 extracts intricate feature representations from speech signals. RetNet uses a multi-scale retention mechanism to capture global temporal dynamics in sequence data, which improves its handling of long-distance dependencies. Connectionist temporal classification (CTC) removes the need for forced alignment of labels and can handle labels of variable length. Moreover, transfer learning improves performance on the target task by exploiting knowledge acquired from the source task. This approach helps overcome the limited availability of data in the civil aviation domain and improves the model's capacity for generalization. Experimental results indicate that the developed model surpasses alternative baselines. It achieves minimum character error rates of 7.6% and 8.7% when pre-trained on the Aishell corpus, which are further reduced to 5.6% and 6.8% on the ATC corpus.
Keywords
air traffic control / automatic speech recognition / end-to-end deep learning / transfer learning
Classification
Aerospace
Cite this article
梁海军, 常瀚文, 何一民, 赵志伟, 孔建国. 基于EfficientNetV2-RetNet的端到端中文管制语音识别 [J]. 电讯技术, 2025, 65(2): 254-260.
Funding
National Key Research and Development Program of China (2021YFF0603904)
Fundamental Research Funds for the Central Universities (PHD2023-035)
Fundamental Research Funds for the Central Universities (24CAFUC10195)
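Illustrative note on the abstract's terms. The abstract says that CTC removes the need for forced alignment and reports results as character error rate (CER). The sketch below is not the authors' code. It shows the textbook definitions only: greedy CTC decoding (merge repeated frame labels, then drop blanks) and CER as character-level edit distance divided by the reference length. The blank index `0`, the function names, and the sample transcripts are assumptions made for illustration.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank symbol (hypothetical choice)


def ctc_greedy_collapse(frame_ids: Sequence[int], blank: int = BLANK) -> List[int]:
    """Collapse per-frame best labels: merge consecutive repeats, then drop blanks."""
    out: List[int] = []
    prev = None
    for idx in frame_ids:
        # Emit a label only when it differs from the previous frame and is not blank.
        if idx != prev and idx != blank:
            out.append(idx)
        prev = idx
    return out


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance (substitutions, insertions, deletions all cost 1)."""
    prev_row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        row = [i]
        for j, h in enumerate(hyp, start=1):
            row.append(min(
                prev_row[j] + 1,             # deletion from the reference
                row[j - 1] + 1,              # insertion into the reference
                prev_row[j - 1] + (r != h),  # substitution or match
            ))
        prev_row = row
    return prev_row[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Frame labels with blanks (0) and repeats; the blank between the two 1s
    # keeps them as separate output symbols.
    print(ctc_greedy_collapse([0, 1, 1, 0, 1, 2, 2, 0]))
    # Made-up reference and hypothesis transcripts, one character deleted.
    print(cer("国航一三五六", "国航一三六"))
```

Because a blank separates repeated labels, the output sequence can be shorter than the frame sequence without any frame-level alignment, which is why variable-length labels are not a problem for CTC.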