Time-frequency neural network based on amplitude mask for speech bandwidth extension (OA; Peking University Core; CSTPCD)
To improve the performance of deep-learning-based speech bandwidth extension, a time-frequency neural network model combined with an amplitude mask is proposed. The model both exploits the phase information of speech and uses the amplitude mask to refine the predicted speech amplitude. For the time-domain part of the model, a long short-term memory (LSTM) neural network incorporating an attention mechanism is designed. This network supports parallel computation; when predicting high-frequency speech, it makes full use of the relationships between nearby preceding and following speech frames and does not learn relationships between distant frames, which reduces the model's computational cost. Subjective and objective experiments show that the method outperforms traditional methods and deep-neural-network-based speech bandwidth extension methods on measures such as signal-to-noise ratio and intelligibility.
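The two ideas in the abstract, attending only to nearby frames and scaling a magnitude spectrum with a mask before reattaching phase, can be illustrated with a minimal NumPy sketch. This is not the authors' model: the abstract describes an LSTM combined with attention and a mask predicted by the network, whereas the sketch below uses plain scaled dot-product self-attention with a band limit and takes the mask as a given array. The function names `local_attention` and `apply_amplitude_mask` and the `window` parameter are hypothetical, introduced only for illustration.

```python
import numpy as np


def local_attention(frames, window=2):
    """Self-attention over frames, limited to neighbors within `window` steps.

    Illustrative assumption: plain scaled dot-product attention,
    not the LSTM-based network described in the abstract.
    """
    n, d = frames.shape
    scores = frames @ frames.T / np.sqrt(d)            # (n, n) frame similarities
    idx = np.arange(n)
    far = np.abs(idx[:, None] - idx[None, :]) > window
    scores = np.where(far, -np.inf, scores)            # ignore distant frame pairs
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)                           # masked entries become 0
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ frames, weights


def apply_amplitude_mask(magnitude, mask, phase):
    """Scale a magnitude spectrogram by a mask and reattach the phase.

    Illustrative assumption: `mask` is supplied by the caller; in the
    paper it would come from the network.
    """
    return (magnitude * mask) * np.exp(1j * phase)
```

Because every frame pair farther apart than `window` receives zero weight, only a band of the attention matrix matters, which is the kind of saving in computation the abstract attributes to discarding distant-frame relationships. This dense sketch still builds the full matrix for clarity.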
Xu Chundong (许春冬); Tan Guowu (谭国武); Ying Dongwen (应冬文)
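School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000; School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000 || School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049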
Electronic Information Engineering
speech bandwidth extension; time-frequency neural network; long short-term memory network; amplitude mask; attention mechanism
Journal of Huazhong University of Science and Technology (Natural Science Edition), 2024 (006)
pp. 179-184 (6 pages)
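Supported by the National Natural Science Foundation of China (11864016, 11704164) and the Key R&D Program, General Project, of the Jiangxi Provincial Department of Science and Technology (20202BBEL53006).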