Modern Electronics Technique (现代电子技术), 2025, Vol. 48, Issue 24: 61-66 (6 pages). DOI: 10.16652/j.issn.1004-373x.2025.24.010
基于动态时间规整与Transformer的连续语音识别与发音校正算法
Continuous speech recognition and pronunciation correction algorithm based on DTW and Transformer
Abstract
To address the limitations of traditional dynamic time warping (DTW) algorithms in large-scale speech processing, such as low efficiency and insufficient robustness in speaker-independent recognition, as well as the poor accuracy of Transformer models in short-term speech alignment, a continuous speech recognition and pronunciation correction algorithm based on DTW-Transformer fusion is proposed. The algorithm uses DTW to achieve precise temporal alignment of short-term speech frames, uses the multi-head attention mechanism of the Transformer to capture global dependencies in long-term speech sequences, and thereby builds a two-layer "local alignment-global modeling" processing architecture. Experimental results on the public speech dataset TIMIT and a proprietary pronunciation-learning speech dataset show that, in continuous speech recognition, the word error rate (WER) of the proposed algorithm is 18.9% lower than that of the traditional DTW algorithm and 5.7% lower than that of a standalone Transformer model. The phoneme error detection rate for pronunciation correction can reach 95.3%, and the real-time response delay is kept within 280 ms, which can meet the application requirements of scenarios such as language education and intelligent evaluation.
Key words
continuous speech recognition / pronunciation correction / dynamic time warping / Transformer / temporal alignment / attention mechanism
Classification
Information technology and security science
Cite this article
Pan Guimei (潘桂妹). Continuous speech recognition and pronunciation correction algorithm based on DTW and Transformer [J]. Modern Electronics Technique (现代电子技术), 2025, 48(24): 61-66.
Funding
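Guangdong Provincial Department of Education project (粤教高函[2023]4号-1097)
China Association for Non-Government Education 2025 annual planning project, Youth Project (CANQN250851)
Zhanjiang Philosophy and Social Sciences 2025 annual planning project (ZJ25YB47)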
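Illustrative note on the "local alignment" step named in the abstract: the sketch below is a textbook DTW, not the authors' implementation. The abstract does not specify the frame features, the local distance, or the step pattern, so the Euclidean frame distance, the three-way step pattern, and the names `dtw_align`, `ref`, and `hyp` are all assumptions made for illustration.

```python
import math


def dtw_align(ref, hyp):
    """Align two sequences of feature vectors with classic DTW.

    Assumptions (not from the paper): Euclidean distance between frames
    and the standard (diagonal, vertical, horizontal) step pattern.
    Returns (total_cost, path), where path lists (ref_index, hyp_index).
    """
    n, m = len(ref), len(hyp)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")

    # cost[i][j]: minimal accumulated distance aligning ref[:i] with hyp[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(ref[i - 1], hyp[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j - 1],  # advance both sequences
                cost[i - 1][j],      # advance ref only
                cost[i][j - 1],      # advance hyp only
            )

    # Backtrack from the end, always moving to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


# Toy 1-D "frames": the learner holds the middle sound one frame longer.
reference = [(0.0,), (1.0,), (2.0,)]
learner = [(0.0,), (1.0,), (1.0,), (2.0,)]
total, alignment = dtw_align(reference, learner)
print(total, alignment)  # 0.0 [(0, 0), (1, 1), (1, 2), (2, 3)]
```

The quadratic cost table is what makes plain DTW slow on long utterances, which is consistent with the efficiency limitation the abstract mentions. How the paper restricts DTW to short-term frames and hands the result to the Transformer is not described in the abstract, so it is not sketched here.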