Computer and Modernization (计算机与现代化), Issue (9): 121-126, 6. DOI: 10.3969/j.issn.1006-2475.2024.09.020
Multi-scale Depth Fusion Monocular Depth Estimation Based on Transposed Attention
Abstract
Monocular depth estimation is a fundamental task in computer vision that aims to predict a depth map from a single image, recovering depth information for each pixel position. This paper proposes a novel network architecture for monocular depth estimation to further improve prediction accuracy. Transposed attention introduces a self-attention mechanism that lets the network focus on specific regions of the image while reducing parameter and computation requirements. By incorporating information across different channels, it effectively captures fine-grained regions and edge details. The paper presents an improved version of transposed attention that retains semantic information with fewer parameters. Multi-scale depth fusion exploits the fact that distinct channels extract features at different depths: it computes the average depth for each channel, enhancing the model's depth perception capability. Furthermore, it models long-range dependencies over vertical distances, effectively separating edges between objects and mitigating the loss of fine-grained information. Finally, the effectiveness of the proposed modules is validated through experiments on the NYU Depth V2 and KITTI datasets, demonstrating exceptional performance.
Keywords
deep learning / monocular depth estimation / transposed attention / multi-scale depth fusion / channel average depth
Classification
Computer and Automation
Cite this article
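Cheng Yazi (程亚子), Lei Liang (雷亮), Chen Han (陈瀚), Zhao Yiran (赵毅然). Multi-scale Depth Fusion Monocular Depth Estimation Based on Transposed Attention [J]. Computer and Modernization, 2024, (9): 121-126, 6.
Funding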
National Natural Science Foundation of China (62006046)