Audio-visual segmentation network with multi-dimensional cross-attention fusion

李凡凡 张垣垣 章永龙 朱俊武

计算机应用研究 (Application Research of Computers), 2025, Vol. 42, Issue 6: 1656-1661 (6 pages). DOI: 10.19734/j.issn.1001-3695.2024.08.0369

Author information

  • 1. College of Information Engineering, Yangzhou University, Yangzhou, Jiangsu 225100

Abstract

Audio-visual segmentation (AVS) aims to locate and accurately segment sounding objects in images using both visual and auditory information. Most existing research focuses on methods for audio-visual information fusion, while fine-grained audio-visual analysis remains underexplored, particularly the alignment of continuous audio features with spatial pixel-level information. This paper therefore proposes an audio-visual segmentation attention fusion (AVSAF) method based on contrastive learning. First, the method uses a multi-head cross-attention mechanism and memory tokens to construct an audio-visual token fusion module that reduces the loss of multi-modal information. Second, it introduces contrastive learning to minimize the discrepancy between audio and visual features, enhancing their alignment. A dual-layer decoder then predicts and segments the target's position. Finally, extensive experiments were carried out on the S4 and MS3 subsets of the AVSBench-Object dataset. The J value increases by 3.04 and 4.71 percentage points, respectively, and the F value by 2.4 and 3.5 percentage points, respectively, demonstrating the effectiveness of the proposed method for audio-visual segmentation.
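The abstract reports results as J and F values without defining them. As a minimal sketch, assuming J denotes the Jaccard index (intersection over union) and F denotes a weighted F-score between a predicted binary mask and the ground-truth mask, the two metrics might be computed as follows. The function names, the nested-list mask representation, and the default `beta_sq = 0.3` are assumptions for illustration, not details taken from the paper.

```python
from typing import Sequence, Tuple

# Binary mask as nested sequences: 1 = sounding object, 0 = background (assumed format).
Mask = Sequence[Sequence[int]]


def _counts(pred: Mask, target: Mask) -> Tuple[int, int, int]:
    """Return (true positives, false positives, false negatives) over all pixels."""
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
    return tp, fp, fn


def jaccard_index(pred: Mask, target: Mask) -> float:
    """J: intersection over union of the predicted and ground-truth masks."""
    tp, fp, fn = _counts(pred, target)
    union = tp + fp + fn
    # Convention chosen here: two empty masks count as perfect agreement.
    return tp / union if union else 1.0


def f_score(pred: Mask, target: Mask, beta_sq: float = 0.3) -> float:
    """F: weighted harmonic mean of precision and recall.

    beta_sq = 0.3 is an assumed default, not a value stated in the abstract.
    """
    tp, fp, fn = _counts(pred, target)
    if tp == 0:
        return 0.0  # no overlap, so precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
print(jaccard_index(pred, target))  # 2 shared pixels out of 4 in the union
print(f_score(pred, target))
```

Both functions return values in [0, 1]. The gains quoted in the abstract are in percentage points, which suggests the scores are reported after scaling by 100.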

Key words

audio-visual segmentation/multi-modal/contrastive learning/attention mechanism

Classification

Information Technology and Security Science

Cite this article

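李凡凡, 张垣垣, 章永龙, 朱俊武. Audio-visual segmentation network with multi-dimensional cross-attention fusion[J]. 计算机应用研究 (Application Research of Computers), 2025, 42(6): 1656-1661.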

Funding

National Natural Science Foundation of China (61872313)

计算机应用研究 (Application Research of Computers)

Open access; Peking University core journal (北大核心)

ISSN 1001-3695
