Lightweight Multichannel Speech Enhancement Based on Reparameterized Convolution

GAO Jiupeng¹, SUN Tianchi¹, CHEN Kai¹, LU Jing¹

Journal of Signal Processing (信号处理), 2025, Vol. 41, Issue 12: 1967-1979 (13 pages). DOI: 10.12466/xhcl.2025.12.009

Author information

  • 1. Department of Acoustic Science and Engineering, School of Physics, Nanjing University, Nanjing, Jiangsu 210093, China

Abstract

Multichannel speech enhancement exploits the spatial information captured by microphone arrays to extract high-quality target speech from noisy mixtures, making it a critical preprocessing stage for automatic speech recognition, teleconferencing, and assistive hearing. Although deep neural approaches currently dominate, ranging from hybrids that couple learning with classical spatial filtering to fully neural beamforming, deploying them on edge devices remains difficult. Models must simultaneously satisfy strict real-time causality, tight compute and memory budgets, and high accuracy at low signal-to-noise ratios (SNRs) and under nonstationary, spatially complex noise. Existing lightweight solutions often fall short of this triad, and methods that stay below a few hundred million multiply-accumulate operations per second (MMACs/s) while remaining competitive at low SNR are rare.

To address these limitations, we propose the multi-branch causal network (MBCNet), a deployment-oriented, lightweight multichannel architecture built around convolutional reparameterization. MBCNet jointly encodes auditory features, complex spectral representations, and spatial cues. Its backbone comprises three parts: (i) a parallel feature encoder that aligns and fuses the three feature streams; (ii) a deep extractor with a symmetric encoder-decoder and multilevel frequency downsampling-upsampling blocks that expand the effective frequency receptive field; and (iii) a mask estimation head that predicts multichannel complex filters for reconstructing the enhanced signal. Self-attention modules are inserted at selected stages to capture long-range dependencies without violating causality.

The first key contribution is the reparameterizable multibranch convolution (RepMBConv). During training, RepMBConv uses five coordinated branches (temporal, spectral, joint time-frequency, refinement, and identity) to enrich feature diversity and learn complementary inductive biases. At inference, the branches are analytically fused into a single convolutional kernel by exploiting the linearity of convolution, so they add no computational overhead. Branch-importance analysis further reveals a hierarchical learning behavior: shallow stages emphasize local refinement, whereas deeper stages prioritize temporal and spectral abstractions. We exploit this property after convergence to add, prune, and fine-tune branches, reallocating capacity to critical channels and scales and yielding measurable gains without increasing complexity.

The second contribution is a frequency downsampling-upsampling module that replaces the conventional pairing of convolution and transposed convolution. Downsampling is realized by frequency-index splitting, channel stacking, and convolution; upsampling reverses this process through channel separation, frequency-index recombination, and convolution. This design doubles the frequency receptive field without increasing computational cost, improves broadband noise suppression, and avoids the artifacts associated with deconvolution, all while preserving streaming causality.

Ablation studies confirm that RepMBConv outperforms standard and dilated convolutions at matched complexity, and show that removing the spatial or complex-domain features degrades performance. In comparative experiments, MBCNet achieves better or comparable denoising performance with fewer parameters and lower computational cost than the compared methods, validating its effectiveness and its potential for deployment on edge devices.
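The MMACs/s budget mentioned in the abstract can be estimated layer by layer. For a streaming convolution evaluated once per frame, MACs per frame = C_out × C_in × K_T × K_F × F_out, and multiplying by the frame rate gives MACs per second. The sketch below applies this to a hypothetical layer with 16 channels, 128 frequency bins, and 100 frames/s (10 ms hop); these values are assumptions for illustration, not MBCNet's configuration, and conv_macs_per_second is a hypothetical helper. It also illustrates why branch fusion matters: training-time branches of assumed sizes 3×1, 1×3, 3×3, and 1×1 cost 16 taps per channel pair, while the fused 3×3 kernel costs 9.

```python
def conv_macs_per_second(c_in, c_out, kt, kf, n_freq, frames_per_s):
    """MACs/s of a streaming 2-D conv that outputs n_freq bins per frame."""
    macs_per_frame = c_out * c_in * kt * kf * n_freq
    return macs_per_frame * frames_per_s

C, F, FPS = 16, 128, 100  # hypothetical layer: 16 channels, 128 bins, 10 ms hop

# Inference: one fused 3x3 kernel.
fused = conv_macs_per_second(C, C, 3, 3, F, FPS)

# Training: temporal, spectral, joint, and refinement branches run separately;
# the identity branch only adds the input and needs no multiplications.
branch_shapes = [(3, 1), (1, 3), (3, 3), (1, 1)]
training = sum(conv_macs_per_second(C, C, kt, kf, F, FPS) for kt, kf in branch_shapes)

print(f"fused:       {fused / 1e6:.2f} MMACs/s")     # fused:       29.49 MMACs/s
print(f"multibranch: {training / 1e6:.2f} MMACs/s")  # multibranch: 52.43 MMACs/s
```

The same formula, summed over all layers, gives a model-level figure to compare against a budget of a few hundred MMACs/s.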
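The mask estimation head predicts multichannel complex filters. One common way to apply such filters, assumed here rather than taken from the paper, is filter-and-sum in the STFT domain, Ŝ(t,f) = Σ_m W_m*(t,f) Y_m(t,f), with one complex weight per microphone, frame, and bin; the paper's filters may also span neighboring frames or bins. apply_complex_filters is a hypothetical name, and the random arrays stand in for real STFTs and network outputs.

```python
import numpy as np

def apply_complex_filters(Y, W):
    """Filter-and-sum: Y (noisy STFTs) and W (filters) are complex (M, T, F); returns (T, F)."""
    assert Y.shape == W.shape
    return np.sum(np.conj(W) * Y, axis=0)

rng = np.random.default_rng(0)
M, T, F = 4, 6, 257
Y = rng.standard_normal((M, T, F)) + 1j * rng.standard_normal((M, T, F))  # stand-in noisy STFTs
W = rng.standard_normal((M, T, F)) + 1j * rng.standard_normal((M, T, F))  # stand-in predicted filters
S_hat = apply_complex_filters(Y, W)
print(S_hat.shape)  # (6, 257)

# Sanity check: a filter that selects microphone 0 returns its spectrum unchanged.
W_ref = np.zeros((M, T, F), dtype=complex)
W_ref[0] = 1.0
print(np.allclose(apply_complex_filters(Y, W_ref), Y[0]))  # True
```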
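Self-attention remains causal if each frame may attend only to itself and earlier frames. The following minimal single-head sketch assumes scaled dot-product attention over the time axis with an upper-triangular mask; the paper's attention variant, head count, and placement are not specified here, and causal_self_attention and the random projection matrices are hypothetical.

```python
import numpy as np

def causal_self_attention(x, wq, wk, wv):
    """Single-head attention over frames x (T, D); frame t sees frames 0..t only."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])                     # (T, T)
    n_t = x.shape[0]
    future = np.triu(np.ones((n_t, n_t), dtype=bool), k=1)      # True where the key is in the future
    scores = np.where(future, -np.inf, scores)
    scores -= scores.max(axis=-1, keepdims=True)                # numerical stability
    weights = np.exp(scores)                                    # masked keys get weight 0
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
T, D = 8, 16
wq, wk, wv = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))
x = rng.standard_normal((T, D))
out = causal_self_attention(x, wq, wk, wv)

# Changing frames 5 onward must not change the outputs for frames 0..4.
x_future = x.copy()
x_future[5:] += 10.0
out_future = causal_self_attention(x_future, wq, wk, wv)
print(np.allclose(out[:5], out_future[:5]))  # True
```

In a streaming deployment, the same mask lets keys and values for past frames be cached so that each new frame costs one query against the cache.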
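The inference-time fusion in RepMBConv relies on the linearity of convolution: summing the outputs of several convolutions equals one convolution with the sum of their zero-padded kernels. The NumPy sketch below checks this identity for five branches shaped like those named in the abstract. The kernel sizes (3×1 temporal, 1×3 spectral, 3×3 joint, 1×1 refinement), causal padding along time, and the omission of batch normalization and nonlinearities are assumptions for illustration, not the paper's configuration; causal_conv2d, embed, and fuse_branches are hypothetical names.

```python
import numpy as np

def causal_conv2d(x, w):
    """Convolve x (C_in, T, F) with w (C_out, C_in, KT, KF).

    Causal along time (current and past frames only), 'same' along frequency.
    """
    c_out, c_in, kt, kf = w.shape
    assert kf % 2 == 1, "odd frequency kernel assumed for symmetric padding"
    _, n_t, n_f = x.shape
    pad_f = (kf - 1) // 2
    # Pad kt-1 frames on the past side only, so output frame t never sees frame t+1.
    xp = np.pad(x, ((0, 0), (kt - 1, 0), (pad_f, pad_f)))
    y = np.zeros((c_out, n_t, n_f))
    for a in range(kt):
        for b in range(kf):
            patch = xp[:, a:a + n_t, b:b + n_f]  # input aligned with kernel tap (a, b)
            y += np.einsum("oi,itf->otf", w[:, :, a, b], patch)
    return y

def embed(w_small, kt, kf):
    """Zero-pad a small kernel to (kt, kf) so it produces the same causal output."""
    c_out, c_in, st, sf = w_small.shape
    w = np.zeros((c_out, c_in, kt, kf))
    t0 = kt - st          # align with the most recent frames (causal)
    f0 = (kf - sf) // 2   # center along frequency
    w[:, :, t0:t0 + st, f0:f0 + sf] = w_small
    return w

def identity_kernel(c, kt, kf):
    """Kernel that returns its input unchanged (requires C_in == C_out)."""
    w = np.zeros((c, c, kt, kf))
    for ch in range(c):
        w[ch, ch, kt - 1, (kf - 1) // 2] = 1.0
    return w

def multibranch_forward(x, branches):
    """Training-time view: sum of all convolutional branches plus the identity branch."""
    return sum(causal_conv2d(x, w) for w in branches) + x

def fuse_branches(branches, c, kt=3, kf=3):
    """Inference-time view: one kernel equal to the sum of all embedded branches."""
    return sum(embed(w, kt, kf) for w in branches) + identity_kernel(c, kt, kf)

rng = np.random.default_rng(0)
C, T, F = 4, 10, 16
branches = [
    rng.standard_normal((C, C, 3, 1)),  # temporal
    rng.standard_normal((C, C, 1, 3)),  # spectral
    rng.standard_normal((C, C, 3, 3)),  # joint time-frequency
    rng.standard_normal((C, C, 1, 1)),  # refinement (assumed 1x1)
]
x = rng.standard_normal((C, T, F))
y_train = multibranch_forward(x, branches)
y_infer = causal_conv2d(x, fuse_branches(branches, C))
print(np.allclose(y_train, y_infer))  # True
```

Small kernels are placed at the latest time taps and the central frequency taps so that each one sees exactly the same inputs as it did with its own padding; with a centered (non-causal) time axis, the placement along time would be centered instead.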
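The frequency downsampling step described in the abstract replaces a convolution/transposed-convolution pair with a lossless rearrangement: bins are split by index (even and odd for a factor of 2) and stacked along the channel axis, halving the frequency axis, and upsampling reverses the process. Because adjacent downsampled positions are two original bins apart, a kernel of width K on the downsampled axis spans 2K original bins, which is one way to read the receptive-field doubling. The sketch below assumes a factor r = 2, a bin count divisible by r (for example after dropping or padding one bin of a 257-bin STFT), and a 1×1 channel mix in place of the paper's convolution; freq_split, freq_merge, and mix_channels are hypothetical names.

```python
import numpy as np

def freq_split(x, r=2):
    """(C, T, F) -> (r*C, T, F//r): bins with index k mod r go to channel block k."""
    c, t, f = x.shape
    assert f % r == 0, "number of bins must be divisible by r"
    return np.concatenate([x[:, :, k::r] for k in range(r)], axis=0)

def freq_merge(y, r=2):
    """Inverse of freq_split: (r*C, T, F') -> (C, T, r*F')."""
    rc, t, fr = y.shape
    c = rc // r
    x = np.empty((c, t, fr * r), dtype=y.dtype)
    for k in range(r):
        x[:, :, k::r] = y[k * c:(k + 1) * c]  # channel block k returns to bins k, k+r, ...
    return x

def mix_channels(z, w):
    """Hypothetical 1x1 convolution; w has shape (C_out, C_in)."""
    return np.einsum("oi,itf->otf", w, z)

rng = np.random.default_rng(0)
C, T, F = 8, 5, 128
x = rng.standard_normal((C, T, F))

down = mix_channels(freq_split(x), rng.standard_normal((2 * C, 2 * C)))
up = freq_merge(mix_channels(down, rng.standard_normal((2 * C, 2 * C))))
print(down.shape, up.shape)  # (16, 5, 64) (8, 5, 128)
print(np.array_equal(freq_merge(freq_split(x)), x))  # True
```

Unlike a transposed convolution, the split and merge steps only move values, so they introduce no overlapping-kernel checkerboard pattern; any learned processing happens in the channel-mixing convolutions. Each step operates within a single frame, so causality along time is unaffected.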

Key words

multichannel speech enhancement/lightweight design/reparameterizable convolution

Classification

Information Technology and Security Science

Cite this article

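GAO Jiupeng, SUN Tianchi, CHEN Kai, LU Jing. Lightweight multichannel speech enhancement based on reparameterized convolution[J]. Journal of Signal Processing, 2025, 41(12): 1967-1979.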

Funding

General Program of the National Natural Science Foundation of China (12274221)

Journal of Signal Processing

Open access; Peking University core journal

ISSN 1003-0530