信号处理 (Journal of Signal Processing), 2025, Vol. 41, Issue 9: 1513-1524, 12. DOI: 10.12466/xhcl.2025.09.005
跨模态双向注意力的视听双主导语音增强方法
Audio-Visual Dual-Dominant Speech Enhancement Method with Cross-Modal Bidirectional Attention
Abstract
To address the issue of audio modality dominance and underutilization of video modality assistance in audio-visual multimodal speech enhancement, this paper proposes an audio-visual dual-dominant-branch cooperative enhancement encoder-decoder architecture. At the encoding stage, the video-dominant branch employs random-dimensional audio masking to simulate audio feature deficiencies under low signal-to-noise ratio (SNR) conditions, using video features to guide the prediction and reconstruction of the missing audio features, thereby strengthening the auxiliary role of the video modality. The intermediate layer adopts a cross-modal bidirectional cross-attention mechanism to model dynamic complementary relationships between the audio and visual modalities. The decoding layer integrates the dual-branch features through learnable dynamic weighting factors to achieve efficient cross-modal fusion. Experimental validation on the GRID dataset demonstrates that the proposed method significantly improves speech enhancement performance in low-SNR scenarios, achieving improvements of 0.123 to 0.156 in Perceptual Evaluation of Speech Quality (PESQ) and 1.78% to 2.21% in Short-Time Objective Intelligibility (STOI), outperforming mainstream models in objective evaluations. Ablation studies further confirm the effectiveness of the bidirectional attention architecture and the video-guided masking mechanism, demonstrating that this approach breaks away from the traditional single-modality-dominant interaction paradigm. This enables collaborative cross-modal feature enhancement and robust representation learning.
Keywords
audio-visual speech enhancement / feature fusion / mask prediction / cross attention
Classification
Information Technology and Security Science
Cite this article
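Guo Feiyang, Zhang Tianqi, Shen Xiwen, Gao Yifei. Audio-Visual Dual-Dominant Speech Enhancement Method with Cross-Modal Bidirectional Attention [J]. 信号处理 (Journal of Signal Processing), 2025, 41(9): 1513-1524, 12.
Funding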
The National Natural Science Foundation of China (61671095, 61371164, 61071196)
Natural Science Foundation of Chongqing (cstc2021jcyj-msxmX0836)