Journal of Signal Processing, 2026, Vol. 42, Issue (2): 235-248, 14. DOI: 10.12466/xhcl.2026.02.010

Attention-Driven RGB-D Salient Object Detection Based on Diffusion Model

Li Gongyang¹, Shi Shixiang², Li Hongyun³

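Author information

1. Laboratory of Environmental Cognition and Intelligent Systems, Quanzhou Vocational and Technical University, Quanzhou, Fujian 362000; School of Communication and Information Engineering, Shanghai University, Shanghai 200444
2. School of Communication and Information Engineering, Shanghai University, Shanghai 200444
3. Joint Innovation Industry College, Quanzhou Vocational and Technical University, Quanzhou, Fujian 362000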

Abstract

Salient object detection is an important research direction in computer vision that aims to extract the regions that attract the most human visual attention from complex backgrounds. Traditional RGB salient object detection methods rely only on the color information of an image and struggle with the diversity and interference found in complex scenes. RGB-D salient object detection therefore supplements the RGB image with depth information, which allows the spatial structure of the scene to be perceived better and further improves detection performance. However, most existing RGB-D salient object detection methods are built on convolutional neural networks or vision Transformers and rely mainly on discriminative learning, that is, they make predictions by hard classification of pixel-level saliency probabilities. Such models are often overconfident, which limits their detection performance in complex scenes. To address these problems, this paper proposes an attention-driven RGB-D salient object detection method based on the diffusion model. Through the progressive noise addition and stepwise denoising processes of the diffusion model, the prediction is refined in a generative manner, which reduces the risk of incorrect estimates caused by model overconfidence and improves detection performance in complex scenes. First, the Pyramid Vision Transformer is adopted to extract four levels of features from the RGB image and the depth map. Then, the proposed dual-stream attention fusion module fuses the features of the two modalities at each corresponding feature level. Subsequently, the fused features from the four levels are integrated by the progressive fusion module. Finally, the integrated features are injected into the denoising network as conditional information to constrain the output of the diffusion model and generate the predicted saliency map. Experimental results show that the proposed method outperforms existing mainstream methods on multiple metrics on seven public benchmark datasets, namely DUT, LFSD, NJU2K, NLPR, SIP, SSD, and STERE, which demonstrates its effectiveness.
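The progressive noise addition and stepwise denoising mentioned in the abstract can be illustrated with the standard DDPM formulation. The sketch below is not the paper's implementation: the linear beta schedule, the 1000-step horizon, the 8x8 toy mask, and the function names (`linear_alpha_bar`, `add_noise`, `estimate_x0`) are all assumptions made for illustration, and the true noise stands in for the output of the conditional denoising network, whose design the abstract does not specify.

```python
import numpy as np

def linear_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def add_noise(x0, t, alpha_bar, rng):
    """Forward process: sample x_t from N(sqrt(a_t) * x0, (1 - a_t) * I)."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps

def estimate_x0(x_t, t, eps_hat, alpha_bar):
    """Invert the forward equation given a noise estimate eps_hat."""
    return (x_t - np.sqrt(1.0 - alpha_bar[t]) * eps_hat) / np.sqrt(alpha_bar[t])

rng = np.random.default_rng(0)

# Toy ground-truth saliency mask, rescaled from {0, 1} to {-1, 1}.
mask = np.zeros((8, 8))
mask[2:6, 2:6] = 1.0
x0 = 2.0 * mask - 1.0

alpha_bar = linear_alpha_bar()
x_t, eps = add_noise(x0, 500, alpha_bar, rng)

# In a trained model, eps_hat would come from a denoising network that sees
# x_t, t, and the fused RGB-D features; here the true noise is used instead.
x0_hat = estimate_x0(x_t, 500, eps, alpha_bar)
```

With a perfect noise estimate the clean mask is recovered exactly; a learned network only approximates the noise, which is why sampling proceeds over many small denoising steps rather than a single inversion.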

Key words

RGB-D image/salient object detection/diffusion model/attention mechanism

Classification

Information Technology and Security Science

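Cite this article

Li Gongyang, Shi Shixiang, Li Hongyun. Attention-Driven RGB-D Salient Object Detection Based on Diffusion Model[J]. Journal of Signal Processing, 2026, 42(2): 235-248, 14.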

Funding

The National Natural Science Foundation of China (62401350)

Shanghai Sailing Program (24YF2713000)

Opening Foundation of Quanzhou Vocational and Technical University in 2024 (LERIS24-02)

Quanzhou City Science & Technology Program of China (2025QZC02R)

Journal of Signal Processing

ISSN: 1003-0530
