
Speech enhancement method based on two-branch attention and U-Net


Application Research of Computers, 2024, Vol. 41, Issue 4: 1112-1116, 5. DOI: 10.19734/j.issn.1001-3695.2023.09.0374

Cao Jie¹, Wang Chenzhang², Liang Haopeng², Wang Qiao², Li Xiaoxu²

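Author information

1. School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050; School of Information Engineering, Lanzhou City University, Lanzhou 730050
2. School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050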

Abstract

Speech enhancement networks have difficulty extracting global speech-related features and capture the local contextual information of speech poorly. To address this, the paper proposes a time-domain speech enhancement method based on two-branch attention and U-Net. The method uses a U-Net encoder-decoder structure and takes as input the high-dimensional time-domain features obtained by applying one-dimensional convolution to single-channel noisy speech. First, a Conformer-based residual convolution is designed, which uses residual connections to strengthen the network's noise reduction ability. Second, a two-branch attention structure is designed, which uses global and local attention to obtain richer contextual information from the noisy speech while effectively representing long-sequence features and extracting more diverse feature information. Finally, a weighted loss function combining time-domain and frequency-domain losses is constructed to train the network and improve enhancement performance. The quality and intelligibility of the enhanced speech are evaluated with several metrics. On the public Voice Bank+DEMAND dataset, the method reaches a perceptual evaluation of speech quality (PESQ) score of 3.11, a short-time objective intelligibility (STOI) of 95%, a composite measure for predicting signal rating (CSIG) of 4.44, a composite measure for predicting background noise (CBAK) of 3.60, and a composite measure for predicting overall processed speech quality (COVL) of 3.81. PESQ improves by 7.6% over SE-Conformer and by 5.1% over TSTNN. The experimental results show that the proposed method achieves better results on various speech denoising metrics and meets the requirements of speech enhancement tasks.
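The abstract mentions a weighted loss that combines a time-domain term and a frequency-domain term but gives neither the individual terms nor the weights. The sketch below shows one common way such a loss can be assembled, purely as an illustration: the L1 waveform error, the L1 error on STFT magnitudes, the weight `alpha`, the tiny frame sizes, and all function names are assumptions, not the paper's formulation. It uses a naive DFT from the standard library so that it runs without dependencies.

```python
import cmath
import math


def stft_magnitudes(signal, frame_len=8, hop=4):
    """Magnitude spectra of Hann-windowed frames via a naive DFT (illustrative only)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        # Keep the non-redundant bins 0..frame_len/2 of a real-valued frame.
        spectrum = [
            abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n in range(frame_len)))
            for k in range(frame_len // 2 + 1)
        ]
        frames.append(spectrum)
    return frames


def time_loss(enhanced, clean):
    """Mean absolute error between waveforms (assumed time-domain term)."""
    return sum(abs(e - c) for e, c in zip(enhanced, clean)) / len(clean)


def freq_loss(enhanced, clean, frame_len=8, hop=4):
    """Mean absolute error between STFT magnitudes (assumed frequency-domain term)."""
    mag_e = stft_magnitudes(enhanced, frame_len, hop)
    mag_c = stft_magnitudes(clean, frame_len, hop)
    diffs = [abs(a - b) for fe, fc in zip(mag_e, mag_c) for a, b in zip(fe, fc)]
    return sum(diffs) / len(diffs)


def weighted_loss(enhanced, clean, alpha=0.5):
    """alpha * time-domain loss + (1 - alpha) * frequency-domain loss (alpha is hypothetical)."""
    return alpha * time_loss(enhanced, clean) + (1 - alpha) * freq_loss(enhanced, clean)


# Toy signals: a short sine as "clean" speech and a perturbed copy as "enhanced" output.
clean = [math.sin(2 * math.pi * 3 * n / 32) for n in range(32)]
noisy = [c + 0.1 * (-1) ** n for n, c in enumerate(clean)]
print(weighted_loss(clean, clean))   # identical signals give zero loss
print(weighted_loss(noisy, clean))   # a perturbed signal gives a positive loss
```

In a real training setup the same structure would typically be written with a differentiable STFT in a deep learning framework, and the weight between the two terms would be tuned on validation data.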

Key words

speech enhancement/two-branch attention/time domain/single channel

Classification

Information technology and security science

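Cite this article

Cao Jie, Wang Chenzhang, Liang Haopeng, Wang Qiao, Li Xiaoxu. Speech enhancement method based on two-branch attention and U-Net [J]. Application Research of Computers, 2024, 41(4): 1112-1116, 5.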

Funding

Gansu Province Key Research and Development Program (22YF7GA130)

Application Research of Computers

OA / PKU Core Journal / CSTPCD

ISSN: 1001-3695
