工矿自动化, 2026, Vol. 52, Issue 2: 163-168, 176 (7 pages). DOI: 10.13272/j.issn.1671-251x.2026010005
基于掩码特征交叉预解网络的综采工作面语音分离方法
Mask feature cross pre-decoding network-based speech separation method for fully mechanized mining face
Abstract
The complex non-stationary mechanical noise in fully mechanized mining faces severely interferes with underground dispatch communication. Existing speech separation methods based on the Time-Domain Audio Separation Network (TasNet) architecture (encoder, mask network, decoder) tend to generate target speech masks that retain residual noise and interfering speech components. In addition, noise suppression may damage target speech features, reducing separation accuracy. To address these problems, a speech separation method for fully mechanized mining faces based on a mask feature cross pre-decoding network was proposed. The mask feature cross pre-decoding network was inserted after the mask network of TasNet and mainly consisted of a mask feature extraction module and a feature cross pre-decoding module. The mask feature extraction module learned noise-related features in the different target speech masks through concatenation operations and a convolutional gating module, generated noise-related complementary weights, and applied these weights to the target speech masks as complementary weighting to filter out noise. The feature cross pre-decoding module performed cross-complementary fusion of features from the different target speech masks, mined correlation information among the masks, and then used a convolutional gating module and a residual enhancement module to purify and compensate the masks, preventing weak speech from being masked and protecting target speech that might be damaged during noise suppression. Experimental results showed that, compared with mainstream TasNet-based speech separation methods, namely the Convolutional Time-Domain Audio Separation Network (Conv-TasNet), Dual-Path Recurrent Neural Network (DPRNN), Dual-Path Transformer Network (DPTNet), and Globally Attentive Locally Recurrent Network (GALR), the proposed method improved the Scale-Invariant Signal-to-Noise Ratio Improvement (SI-SNRi) by 3.52, 1.74, 1.40, and 2.09 dB and the Signal-to-Distortion Ratio Improvement (SDRi) by 3.21, 1.45, 1.14, and 1.80 dB, respectively, while using fewer parameters. The proposed method can be deployed on embedded chips with built-in Neural Network Processing Units (NPUs). The module is compact and has a low computational cost, meeting the engineering requirements for miniaturization and low power consumption of underground voice terminals.
Keywords
speech separation / fully mechanized mining face / mask feature cross pre-decoding network / mask feature extraction / noise suppression / dispatch communication
Classification
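Mining and metallurgy
Cite this article
王科平, 姚凯濠, 杨艺, 钱伟, 王田. Mask feature cross pre-decoding network-based speech separation method for fully mechanized mining face [J]. 工矿自动化, 2026, 52(2): 163-168, 176 (7 pages).
Funding
National Natural Science Foundation of China (92467108).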
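Note on the reported metric: the abstract states its gains as SI-SNRi, the improvement in scale-invariant signal-to-noise ratio of the separated speech over the unprocessed mixture. The NumPy sketch below follows the commonly used definition of that metric. It is an illustration only and not the authors' evaluation code; the function names `si_snr` and `si_snr_improvement`, the `eps` stabilizer, and the synthetic signals are assumptions introduced here.

```python
import numpy as np


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between a 1-D estimate and a 1-D target."""
    # Remove the mean so a DC offset does not affect the score.
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target to get the "clean" component.
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    s_target = scale * target
    # Whatever is left after the projection counts as noise/distortion.
    e_noise = estimate - s_target
    ratio = (np.dot(s_target, s_target) + eps) / (np.dot(e_noise, e_noise) + eps)
    return 10.0 * np.log10(ratio)


def si_snr_improvement(estimate, mixture, target):
    """SI-SNRi: SI-SNR of the separated output minus SI-SNR of the raw mixture."""
    return si_snr(estimate, target) - si_snr(mixture, target)


if __name__ == "__main__":
    # Synthetic stand-ins (assumption): white noise instead of real speech.
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(16000)
    noise = rng.standard_normal(16000)
    mixture = clean + noise            # input to a separator
    separated = clean + 0.1 * noise    # hypothetical separator output
    print(f"SI-SNRi: {si_snr_improvement(separated, mixture, clean):.2f} dB")
```

Because the estimate is projected onto the target before the ratio is taken, rescaling the separated signal leaves the score unchanged, which is why this metric is preferred over plain SNR for time-domain separators whose output gain is arbitrary.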