
Mask feature cross pre-decoding network-based speech separation method for fully mechanized mining face


Journal of Mine Automation (工矿自动化), 2026, Vol. 52, Issue 2: 163-168, 176, 7.

DOI: 10.13272/j.issn.1671-251x.2026010005



WANG Keping¹, YAO Kaihao¹, YANG Yi¹, QIAN Wei¹, WANG Tian²

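Author information

1. School of Electrical Engineering and Automation, Henan Polytechnic University, Jiaozuo 454003, Henan, China; Henan Key Laboratory of Intelligent Detection and Control of Coal Mine Equipment, Henan Polytechnic University, Jiaozuo 454003, Henan, China
2. School of Artificial Intelligence, Beihang University, Beijing 100191, China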


Abstract

The complex, non-stationary mechanical noise in fully mechanized mining faces severely interferes with underground dispatch communication. Existing speech separation methods built on the Time-Domain Audio Separation Network (TasNet) architecture (encoder, mask network, decoder) tend to generate target speech masks that retain residual noise and interfering speech components. In addition, noise suppression may damage target speech features, which lowers separation accuracy. To address these problems, a speech separation method for fully mechanized mining faces based on a mask feature cross pre-decoding network was proposed. The mask feature cross pre-decoding network was inserted after the mask network of TasNet and consisted mainly of a mask feature extraction module and a feature cross pre-decoding module. The mask feature extraction module learned noise-related features in the different target speech masks through concatenation operations and a convolutional gating module, generated noise-related complementary weights, and used these weights to apply complementary weighting to the target speech masks, thereby filtering out noise. The feature cross pre-decoding module performed cross-complementary fusion of the features of the different target speech masks to mine the correlation information among them, and then used a convolutional gating module and a residual enhancement module to purify and compensate the masks; this prevented weak speech from being masked and protected target speech that may have been damaged during noise suppression. Experimental results showed that, compared with mainstream TasNet-based speech separation methods, namely the Convolutional Time-Domain Audio Separation Network (Conv-TasNet), Dual-Path Recurrent Neural Network (DPRNN), Dual-Path Transformer Network (DPTNet), and Globally Attentive Locally Recurrent Network (GALR), the proposed method improved the Scale-Invariant Signal-to-Noise Ratio improvement (SI-SNRi) by 3.52, 1.74, 1.40, and 2.09 dB and the Signal-to-Distortion Ratio improvement (SDRi) by 3.21, 1.45, 1.14, and 1.80 dB, respectively, while using fewer parameters. The proposed method can be deployed on embedded chips with built-in Neural Network Processing Units (NPUs); the added network is compact and computationally inexpensive, meeting the engineering requirements for miniaturization and low power consumption of underground voice terminals.


Key words

speech separation/fully mechanized mining face/mask feature cross pre-decoding network/mask feature extraction/noise suppression/dispatch communication

Classification

Mining and metallurgy

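Cite this article

WANG Keping, YAO Kaihao, YANG Yi, QIAN Wei, WANG Tian. Mask feature cross pre-decoding network-based speech separation method for fully mechanized mining face[J]. Journal of Mine Automation, 2026, 52(2): 163-168, 176, 7.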

Funding

National Natural Science Foundation of China (92467108)

Journal of Mine Automation (工矿自动化)

ISSN: 1671-251X
