An attention-based time-domain speech separation method in noisy environments

余传旗 (YU Chuanqi)¹, 王婷婷 (WANG Tingting)¹, 郭海燕 (GUO Haiyan)¹, 杨震 (YANG Zhen)²

Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition), 2024, Vol. 44, Issue 6: 44-52, 9. DOI: 10.14132/j.cnki.1673-5439.2024.06.005

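Author information

1. School of Communications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210003
2. School of Communications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210003; National and Local Joint Engineering Research Center of Communication and Network Technology, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210003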

Abstract

Deep learning-based time-domain single-channel speech separation models have achieved significant success in noise-free scenarios. In noisy environments, however, they tend to mistakenly encode noise features as source speech features, which degrades the accuracy of mask estimation and leads to suboptimal separation performance. To address this problem, we propose a time-domain speech separation model based on attention mechanisms that mitigates the negative impact of noise on separation performance. First, because the channels of the features output by the temporal encoder differ in importance, we embed an efficient channel attention (ECA) module in the encoder to weight the channel-wise features. Second, we adopt a graph attention network (GAT) to compute attention coefficients between adjacent frames and use them to aggregate the encoded features of neighboring frames, thereby reducing the influence of noise on mask estimation. Experimental results on the WHAM!, Libri2Mix-Noisy, and Libri3Mix-Noisy datasets demonstrate that the proposed GAT-ECA-based DPRNN (GACA-DPRNN) outperforms the DPRNN baseline in terms of scale-invariant signal-to-noise ratio improvement (SI-SNRi) and signal-to-distortion ratio improvement (SDRi).
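The abstract does not give the layer configuration of the ECA module, so the following is only a minimal sketch of a generic efficient channel attention step in the style of ECA-Net: global average pooling over time, a 1-D convolution across neighboring channels, and a sigmoid gate that rescales each channel. It assumes the encoder output is a C × T feature map stored as a list of C rows of T frames; the function names, the kernel values, and the `gamma=2, b=1` kernel-size rule are assumptions for illustration, not the authors' settings.

```python
import math


def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive kernel size used in ECA-Net: (log2(C) + b) / gamma, rounded up to an odd integer."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 == 1 else t + 1


def eca(features, kernel):
    """Efficient channel attention on a C x T feature map (hypothetical helper).

    features: list of C channels, each a list of T frame values.
    kernel:   weights of a 1-D convolution (odd length k, zero padding, no bias)
              that slides across the channel axis; learned in a real model.
    Returns (reweighted features, per-channel gate values).
    """
    num_channels = len(features)
    pad = len(kernel) // 2
    # Squeeze: global average pooling over time gives one descriptor per channel.
    pooled = [sum(row) / len(row) for row in features]
    # Local cross-channel interaction: each channel only sees its k nearest neighbors.
    scores = []
    for c in range(num_channels):
        s = 0.0
        for j, w in enumerate(kernel):
            idx = c + j - pad
            if 0 <= idx < num_channels:
                s += w * pooled[idx]
        scores.append(s)
    # Sigmoid gate in (0, 1) per channel; no dimensionality reduction is involved.
    gates = [1.0 / (1.0 + math.exp(-s)) for s in scores]
    # Excite: scale every frame of a channel by that channel's gate.
    reweighted = [[g * x for x in row] for g, row in zip(gates, features)]
    return reweighted, gates


# Toy 3-channel, 2-frame map with an identity kernel (hypothetical values).
out, gates = eca([[1.0, 3.0], [0.0, 0.0], [-2.0, -2.0]], [0.0, 1.0, 0.0])
```

With the identity kernel `[0.0, 1.0, 0.0]` each gate is just the sigmoid of that channel's own time average, which makes the arithmetic easy to check; in a trained model the kernel mixes each channel with its neighbors, which is what lets ECA emphasize informative channels without a fully connected bottleneck.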
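The abstract only states that a GAT computes attention coefficients between adjacent frames and aggregates the encoded features of neighboring frames. A minimal single-head sketch of that idea is shown here, assuming each frame is a graph node linked to itself and to the frames within `radius` steps, and using the standard GAT scoring `LeakyReLU(a^T [W h_i || W h_j])` followed by a softmax over each node's neighborhood. The projection `w`, the attention vector `a`, `radius`, and the helpers `matvec` and `leaky_relu` are hypothetical; the paper's graph construction, number of heads, and output activation may differ.

```python
import math


def matvec(w, x):
    """Multiply matrix w (list of rows) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def gat_adjacent_frames(frames, w, a, radius=1):
    """Single-head graph attention over a chain graph of frames.

    frames: T feature vectors (length F_in), one node per frame.
    w:      F_out x F_in shared projection matrix.
    a:      attention vector of length 2 * F_out.
    Frame i attends to frames i - radius .. i + radius, itself included.
    Returns (aggregated features, per-frame {neighbor index: coefficient} dicts).
    """
    z = [matvec(w, h) for h in frames]  # z_i = W h_i
    f_out = len(z[0])
    a_self, a_nbr = a[:f_out], a[f_out:]  # a^T [z_i || z_j] = a_self . z_i + a_nbr . z_j
    outputs, coefficients = [], []
    for i in range(len(z)):
        nbrs = [j for j in range(i - radius, i + radius + 1) if 0 <= j < len(z)]
        self_term = sum(p * q for p, q in zip(a_self, z[i]))
        e = [leaky_relu(self_term + sum(p * q for p, q in zip(a_nbr, z[j]))) for j in nbrs]
        # Softmax over the neighborhood (max subtracted for numerical stability).
        m = max(e)
        exp_e = [math.exp(v - m) for v in e]
        total = sum(exp_e)
        alpha = [v / total for v in exp_e]
        # Attention-weighted sum of the neighbors' projected features
        # (the output nonlinearity of a full GAT layer is omitted here).
        outputs.append([sum(al * z[j][d] for al, j in zip(alpha, nbrs)) for d in range(f_out)])
        coefficients.append(dict(zip(nbrs, alpha)))
    return outputs, coefficients
```

For example, with `w` set to the identity and `a` all zeros, every neighbor gets the same coefficient, so each frame's output is the plain average of itself and its adjacent frames; learned values of `a` let the layer down-weight neighbors whose features look noise-dominated.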
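For reference, SI-SNRi is commonly computed as the SI-SNR of the separated estimate minus the SI-SNR of the unprocessed mixture, both measured against the clean source. The sketch below uses the usual zero-mean, projection-based definition of SI-SNR; the `eps` stabilizer and the function names are assumptions, and the authors' evaluation scripts may differ in detail (for example, in how estimates are matched to sources by permutation).

```python
import math


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length signals."""
    n = len(reference)
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    # Remove DC offsets so the measure ignores constant shifts.
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]
    # Project the estimate onto the reference: s_target = (<est, ref> / ||ref||^2) ref.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref) + eps
    target = [dot / ref_energy * r for r in ref]
    # Everything orthogonal to the reference counts as error.
    error = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    error_energy = sum(v * v for v in error)
    return 10.0 * math.log10((target_energy + eps) / (error_energy + eps))


def si_snr_improvement(estimate, mixture, reference):
    """SI-SNRi: gain of the separated estimate over the input mixture."""
    return si_snr(estimate, reference) - si_snr(mixture, reference)
```

Because the estimate is projected onto the reference before the error is measured, rescaling the estimate leaves the score unchanged, which is why SI-SNR is preferred over plain SNR for mask-based separators whose output gain is arbitrary.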

Keywords

speech separation; channel attention; graph neural network; graph attention network (GAT)

Classification

Information Technology and Security Science

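Cite this article

YU Chuanqi, WANG Tingting, GUO Haiyan, YANG Zhen. An attention-based time-domain speech separation method in noisy environments[J]. Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition), 2024, 44(6): 44-52, 9.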

Funding

Supported by the National Natural Science Foundation of China (62071242)

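Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition)

Indexing: OA, Peking University Core Journals, CSTPCD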

ISSN：1673-5439
