Multi-source Separation Method Based on Improved Transformer Model

Zeng Yuan (曾援), Li Jian (李剑), Ma Mingxing (马明星), Pang Runjia (庞润嘉), He Bin (贺斌)

Computer Technology and Development (计算机技术与发展), 2024, Vol. 34, Issue 5: 60-65, 6. DOI: 10.20165/j.cnki.ISSN1673-629X.2024.0041

Multi-source Separation Method Based on Improved Transformer Model

Zeng Yuan¹, Li Jian¹, Ma Mingxing¹, Pang Runjia¹, He Bin¹

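Author information

  • 1. School of Information and Communication Engineering, North University of China, Taiyuan, Shanxi 030051, China; State Key Laboratory of Dynamic Measurement Technology, North University of China, Taiyuan, Shanxi 030051, China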

Abstract

Current mainstream speech separation models are based on complex recurrent networks or Transformer networks. The high complexity of Transformer networks makes training difficult, and the high sampling rate of audio means that sample-level inputs are long, so the extracted features are incomplete. Because long speech feature sequences cannot be modeled directly, features are lost. To address this, we propose an improved Transformer-based network model. First, a new downsampling block is added to the encoder of the existing Transformer model to compute high-level features at different time scales and reduce the complexity of the feature space. Second, feature fusion between the upsampling layer and the downsampling layer of the encoder is added to the decoder of the Transformer model to avoid feature loss and improve the model's separation capability. Finally, an improved sliding window attention mechanism is introduced in the separation layer of the model. The sliding window uses a cyclic shift, so that each new feature window contains part of the old feature window together with feature edge information. This completes the information interaction between feature windows, yields the feature coding and feature position coding, and improves the correlation coefficient between features. Experiments show that the method separates better than previous methods, reaching an SI-SNR of 13.5 dB and an SDR of 14.1 dB.
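
The cyclic shift described in the abstract can be illustrated with a minimal sketch. The abstract does not give the window size, the shift amount, or the attention computation itself, so everything below is an assumption made for illustration: the function names `partition_windows`, `cyclic_shift`, and `shifted_windows` are hypothetical, integers stand in for feature frames, and a window of 4 with a shift of 2 is an arbitrary choice. The sketch only shows how shifting before re-partitioning lets a new window hold frames from two neighbouring old windows, including frames that wrap around from the sequence edge.

```python
from typing import List, Sequence


def partition_windows(frames: Sequence[int], window: int) -> List[List[int]]:
    """Split a sequence into consecutive, non-overlapping windows."""
    if window <= 0 or len(frames) % window != 0:
        raise ValueError("sequence length must be a positive multiple of window")
    return [list(frames[i:i + window]) for i in range(0, len(frames), window)]


def cyclic_shift(frames: Sequence[int], shift: int) -> List[int]:
    """Rotate the sequence left by `shift`; frames pushed off the front wrap to the end."""
    n = len(frames)
    if n == 0:
        return []
    shift %= n
    return list(frames[shift:]) + list(frames[:shift])


def shifted_windows(frames: Sequence[int], window: int, shift: int) -> List[List[int]]:
    """Cyclically shift the sequence, then partition it into windows again."""
    return partition_windows(cyclic_shift(frames, shift), window)


# Hypothetical example: 8 feature frames, window size 4, shift 2 (assumed values).
frames = list(range(8))
regular = partition_windows(frames, 4)   # windows before the shift
shifted = shifted_windows(frames, 4, 2)  # windows after the cyclic shift
print(regular)
print(shifted)
```

In this toy setting the first shifted window mixes frames from both original windows, and the second one joins the end of the sequence with its beginning. Attention computed inside such windows would therefore see neighbours that the unshifted partition kept apart, which is the information interaction the abstract refers to.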

Key words

upsampling and downsampling layers / Transformer / feature coding / sliding window attention mechanism / deep learning

Classification

Information Technology and Security Science

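Cite this article

Zeng Yuan, Li Jian, Ma Mingxing, Pang Runjia, He Bin. Multi-source Separation Method Based on Improved Transformer Model[J]. Computer Technology and Development, 2024, 34(5): 60-65, 6.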

Funding

National Natural Science Foundation of China, Young Scientists Fund (61901419)

Computer Technology and Development (OA, CSTPCD), ISSN 1673-629X
