Speech Enhancement Based on Time-frequency Self-attention Residual Temporal Convolutional Networks
Speech enhancement (SE) aims to remove noise and other irrelevant signals from speech. It serves as the front end of many speech processing tasks and plays an important role in applications such as video conferencing and live video streaming. However, most current SE research focuses on modeling long-term contextual dependencies across speech frames and does not consider how speech energy is distributed in the time-frequency domain. This paper proposes a time-frequency self-attention module that explicitly introduces prior knowledge of the speech energy distribution into the modeling process. Combining this module with a residual temporal convolutional network yields a residual temporal convolutional network model based on time-frequency self-attention. To verify the effectiveness of the model, experiments are conducted with two training targets commonly used in SE, the ideal ratio mask (IRM) and the phase-sensitive mask (PSM). The results show that the model significantly improves four objective evaluation metrics commonly used in SE and clearly outperforms the other baseline models.
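The two training targets named in the abstract, IRM (ideal ratio mask) and PSM (phase-sensitive mask), can be illustrated with their commonly used textbook definitions. The sketch below is only an illustration under stated assumptions: the abstract does not give the paper's exact formulas, so the square-root form of the IRM, the clipping of the PSM to [0, 1], the function names `irm` and `psm`, and the toy bin values are all assumptions made here. Each function works on a single complex STFT bin; in practice the same expression would be applied element-wise over a whole spectrogram.

```python
import math


def irm(speech_bin: complex, noise_bin: complex, eps: float = 1e-12) -> float:
    """Ideal ratio mask for one time-frequency bin (assumed square-root form)."""
    s_pow = abs(speech_bin) ** 2
    n_pow = abs(noise_bin) ** 2
    # eps avoids division by zero when both powers are zero.
    return math.sqrt(s_pow / (s_pow + n_pow + eps))


def psm(speech_bin: complex, noisy_bin: complex, eps: float = 1e-12) -> float:
    """Phase-sensitive mask for one bin: |S|/|Y| * cos(theta_S - theta_Y).

    The result is clipped to [0, 1], which is a common but assumed choice.
    """
    # Re(S * conj(Y)) equals |S||Y|cos(theta_S - theta_Y),
    # so dividing by |Y|^2 gives the unclipped mask.
    raw = (speech_bin * noisy_bin.conjugate()).real / (abs(noisy_bin) ** 2 + eps)
    return min(max(raw, 0.0), 1.0)


# Toy bins (made-up values): clean speech S, noise N, noisy mixture Y = S + N.
S = 3 + 0j
N = 0 + 4j
Y = S + N

print(irm(S, N))  # about 0.6, since sqrt(9 / 25) = 0.6
print(psm(S, Y))  # about 0.36, since Re(S * conj(Y)) / |Y|^2 = 9 / 25
```

Unlike the IRM, the PSM depends on the phase difference between the clean and noisy bins, so it shrinks toward zero when the mixture phase is dominated by noise.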
Hou Congying (候聪颖); Yang Wenqing (杨文清); Wang Zhao (王召); Cheng Cong (程聪)
NARI Technology Co., Ltd., Nanjing, Jiangsu 211000, China
Computer and Automation
speech enhancement; time-frequency domain; self-attention mechanism; temporal convolutional network
Computer and Modernization, 2024 (009)
pp. 20-24 (5 pages)
Project of NARI Nanjing Control System Co., Ltd. (524609230006)