Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition), 2024, Vol. 44, Issue (6): 44-52, 9. DOI: 10.14132/j.cnki.1673-5439.2024.06.005
An attention-based time-domain speech separation method in noisy environments
Abstract
Deep learning-based time-domain single-channel speech separation models have achieved significant success in noise-free scenarios. In noisy environments, however, they tend to mistakenly encode noise features as source speech features, which degrades the accuracy of mask estimation and results in suboptimal separation performance. To address this problem, we propose a time-domain speech separation model based on attention mechanisms that mitigates the negative impact of noise on separation performance. First, because the channels of the temporal encoder's output features differ in importance, we embed an efficient channel attention (ECA) module within the encoder to weight the channel-wise features. Second, we adopt a graph attention network (GAT) to compute attention coefficients between adjacent frames and aggregate the encoded features of neighboring frames, thereby reducing the influence of noise on mask estimation. Experimental results on the WHAM!, Libri2Mix-Noisy, and Libri3Mix-Noisy datasets demonstrate that the proposed GAT-ECA-based DPRNN (GACA-DPRNN) outperforms the DPRNN baseline in terms of scale-invariant signal-to-noise ratio improvement (SI-SNRi) and signal-to-distortion ratio improvement (SDRi).
Keywords
speech separation / channel attention / graph neural network / graph attention network (GAT)
Classification
Information technology and security science
Cite this article
YU Chuanqi, WANG Tingting, GUO Haiyan, YANG Zhen. An attention-based time-domain speech separation method in noisy environments [J]. Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition), 2024, 44(6): 44-52, 9.
Funding
Supported by the National Natural Science Foundation of China (62071242)
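Note on the SI-SNRi metric
The abstract reports results as SI-SNRi without defining it. The sketch below illustrates the commonly used definition of scale-invariant SNR and its improvement over the unprocessed mixture. It is not taken from the paper: the function names `si_snr` and `si_snr_improvement`, the `eps` stabilizer, and the pure-Python list representation of signals are all assumptions made for illustration.

```python
import math


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length 1-D signals.

    Assumed (commonly used) definition: both signals are made zero-mean,
    the estimate is projected onto the reference, and the energy of that
    projection is compared with the energy of the residual.
    """
    if len(estimate) != len(reference):
        raise ValueError("signals must have the same length")
    n = len(reference)

    # Remove the mean of each signal.
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]

    # Project the estimate onto the reference to get the target component.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref) + eps
    scale = dot / ref_energy
    target = [scale * r for r in ref]

    # Whatever is left after removing the target counts as noise.
    noise = [e - t for e, t in zip(est, target)]

    target_energy = sum(t * t for t in target)
    noise_energy = sum(x * x for x in noise)
    return 10.0 * math.log10((target_energy + eps) / (noise_energy + eps))


def si_snr_improvement(estimate, mixture, reference):
    """SI-SNRi: SI-SNR of the separated estimate minus that of the raw mixture."""
    return si_snr(estimate, reference) - si_snr(mixture, reference)


if __name__ == "__main__":
    # Hypothetical toy signals: a clean source plus an interfering sinusoid.
    source = [math.sin(0.1 * t) for t in range(200)]
    interference = [math.sin(0.37 * t + 1.0) for t in range(200)]
    mixture = [s + i for s, i in zip(source, interference)]
    estimate = [s + 0.1 * i for s, i in zip(source, interference)]
    print(f"SI-SNRi: {si_snr_improvement(estimate, mixture, source):.2f} dB")
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate leaves the score essentially unchanged, which is why this metric is commonly used for time-domain separation models whose output gain is arbitrary.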