计算机与现代化, Issue (9): 20-24, 5. DOI: 10.3969/j.issn.1006-2475.2024.09.004
基于时频自注意力残差时序卷积网络的语音增强
Speech Enhancement Based on Time-frequency Self-attention Residual Temporal Convolutional Networks
Abstract
The main purpose of speech enhancement (SE) is to remove irrelevant signals such as noise. It is the front-end processing stage of many speech processing tasks and plays an important role in fields such as video conferencing and live broadcasting. However, most studies on SE focus mainly on modeling the long-term contextual dependencies of speech frames, without considering the energy distribution characteristics in the time-frequency domain. This paper proposes a self-attention module based on the time-frequency domain, which makes it possible to explicitly introduce prior knowledge about the distribution characteristics of speech into the modeling process. Combining this module with a residual temporal convolutional network yields a residual temporal convolutional network model based on time-frequency domain self-attention. To verify the validity of the model, experiments are conducted with two training targets commonly used in SE, IRM and PSM. The experimental results show that the model significantly improves performance on four objective evaluation metrics in SE and is consistently better than the other baseline models.
Keywords
speech enhancement/time-frequency/self-attention mechanism/temporal convolutional network
Classification
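Information Technology and Security Science
Cite this article
候聪颖, 杨文清, 王召, 程聪. Speech Enhancement Based on Time-frequency Self-attention Residual Temporal Convolutional Networks [J]. 计算机与现代化, 2024, (9): 20-24, 5.
Funding
国电南瑞南京控制系统有限公司 project (524609230006)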