Application Research of Computers, 2024, Vol. 41, Issue 4: 1112-1116, 5. DOI: 10.19734/j.issn.1001-3695.2023.09.0374
基于双分支注意力U-Net的语音增强方法
Speech enhancement method based on two-branch attention and U-Net
Abstract
Speech enhancement networks have difficulty extracting global speech-related features and are ineffective at capturing the local contextual information of speech. To address this problem, this paper proposes a time-domain speech enhancement method based on two-branch attention and U-Net. The method uses a U-Net encoder-decoder structure and takes as input the high-dimensional time-domain features obtained by applying one-dimensional convolution to single-channel noisy speech. First, a Conformer-based residual convolution is designed, which uses residual connections to enhance the noise reduction ability of the network. Second, a two-branch attention structure is designed, which uses global and local attention to obtain richer contextual information from the noisy speech while effectively representing long-sequence features and extracting more diverse feature information. Finally, a weighted loss function that combines time-domain and frequency-domain losses is constructed to train the network and improve speech enhancement performance. Several metrics are used to evaluate the quality and intelligibility of the enhanced speech. On the public Voice Bank+DEMAND dataset, the enhanced speech achieves a perceptual evaluation of speech quality (PESQ) of 3.11, a short-time objective intelligibility (STOI) of 95%, a composite measure for predicting signal rating (CSIG) of 4.44, a composite measure for predicting background noise (CBAK) of 3.60, and a composite measure for predicting overall processed speech quality (COVL) of 3.81. The PESQ is 7.6% higher than that of SE-Conformer and 5.1% higher than that of TSTNN. Experimental results show that the proposed method achieves better results on the speech denoising metrics and meets the requirements of speech enhancement tasks.
Keywords
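speech enhancement/two-branch attention/time domain/single channel
Classification
Information Technology and Security Science
Cite this article
曹洁, 王宸章, 梁浩鹏, 王乔, 李晓旭. Speech enhancement method based on two-branch attention and U-Net [J]. Application Research of Computers, 2024, 41(4): 1112-1116, 5.
Funding
Project funded by the Key Research and Development Program of Gansu Province (22YF7GA130)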
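Illustrative sketch
The abstract mentions a weighted loss that combines a time-domain loss and a frequency-domain loss but does not give its form. The sketch below shows one common way such a loss could be assembled, purely as an illustration. The choice of mean squared error on the waveform, mean absolute error on STFT magnitudes, the weight `alpha`, the frame settings, and the function names `stft_magnitude` and `weighted_loss` are all assumptions and are not taken from the paper.

```python
import numpy as np


def stft_magnitude(x, frame_len=512, hop=128):
    """Magnitude spectrogram from Hann-windowed frames (assumed settings).

    Requires len(x) >= frame_len; trailing samples that do not fill a
    whole frame are ignored.
    """
    n_frames = 1 + (len(x) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))


def weighted_loss(enhanced, clean, alpha=0.5):
    """Hypothetical weighted loss: alpha * time term + (1 - alpha) * frequency term."""
    # Time-domain term: mean squared error between waveforms (an assumption).
    time_loss = np.mean((enhanced - clean) ** 2)
    # Frequency-domain term: mean absolute error between STFT magnitudes (an assumption).
    freq_loss = np.mean(np.abs(stft_magnitude(enhanced) - stft_magnitude(clean)))
    return alpha * time_loss + (1.0 - alpha) * freq_loss


# Toy signals: a 440 Hz tone sampled at 16 kHz, and a noisy copy of it.
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
noisy = clean + 0.1 * rng.standard_normal(16000)

loss_identical = weighted_loss(clean, clean)  # identical signals give zero loss
loss_noisy = weighted_loss(noisy, clean)      # noise makes both terms positive
```

Setting `alpha=1.0` reduces the sketch to the time-domain term alone and `alpha=0.0` to the frequency-domain term alone, which makes it easy to check how each term contributes.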