
Audio-visual speech enhancement with multi-level feature deep fusion under low signal-to-noise ratio

Zhang Tianqi, Shen Xiwen, Tang Juan, Tan Shuang

Journal on Communications, 2025, Vol. 46, Issue 5: 133-144 (12 pages). DOI: 10.11959/j.issn.1000-436x.2025075

Audio-visual speech enhancement with multi-level feature deep fusion under low signal-to-noise ratio

Zhang Tianqi¹, Shen Xiwen¹, Tang Juan¹, Tan Shuang¹

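Author information

  • 1. School of Communications and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China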

Abstract

To address the limitations in feature extraction and cross-modal fusion in audio-visual speech enhancement, a multistage deep fusion method was proposed for low signal-to-noise ratio (SNR) conditions. The method consisted of an audio-visual encoding network, a fusion network, and an auditory decoding network. A multi-branch collaborative unit (MCU) was introduced in the auditory encoder, along with an audio-visual attention fusion module (AVAFM) between each visual and auditory layer. A fusion weighting block (FWB) was also designed to optimize and dynamically weight features at each stage. Experiments on the TMSV and LGRID datasets showed that the proposed method significantly improved PESQ and STOI scores under various low-SNR conditions. Compared to audio-only enhancement, average gains of 38.95% in PESQ and 33.92% in STOI were achieved at -5 dB, -2 dB, and 1 dB. These results demonstrate the method's strong denoising ability and the effectiveness of visual information.

Key words

audio-visual speech enhancement / low signal-to-noise ratio / multi-level feature fusion / fusion weighting / audio-visual attention

Classification

Computer science and automation

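Cite this article

Zhang Tianqi, Shen Xiwen, Tang Juan, Tan Shuang. Audio-visual speech enhancement with multi-level feature deep fusion under low signal-to-noise ratio[J]. Journal on Communications, 2025, 46(5): 133-144, 12.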

Funding

The Natural Science Foundation of Chongqing (No. cstc2021jcyj-msxmX0836)

Journal on Communications

Open access; Peking University Core Journal

ISSN 1000-436X
