Continuous Frame Depth Estimation Based on Multi-scale Feature Mixed Attention Mechanism

Zheng Yuhang (郑宇航)¹, Cao Chuqing (曹雏清)²

Journal of Chongqing Technology and Business University (Natural Science Edition), 2024, Vol. 41, Issue (4): 104-111, 8.

DOI: 10.16055/j.issn.1672-058X.2024.0004.013

Author information

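1. School of Computer and Information, Anhui Polytechnic University, Wuhu, Anhui 241000
2. School of Computer and Information, Anhui Polytechnic University, Wuhu, Anhui 241000; Yangtze River Delta HIT Robot Technology Research Institute, Wuhu, Anhui 241000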

Abstract

Objective: In monocular visual SLAM, the distance between a photographed object and the camera is obtained by estimating depth. Because unsupervised monocular depth estimation algorithms suffer from insufficient accuracy and large errors, a continuous frame depth estimation network based on a hybrid attention mechanism with multi-scale feature fusion was proposed.

Methods: Depth and the 6-degree-of-freedom (6-DoF) pose were obtained by two encoder-decoder structures, one for depth estimation and one for pose estimation. The estimated depth and pose were used to reconstruct the image; the reconstruction loss was computed against the original image, and the network output the depth information. The encoder-decoder for depth estimation formed a U-shaped network. The pose estimation network used the same encoder as the depth estimation network and output the pose through its own pose decoder. In the encoder, a ResNet combined with the CBAM (Convolutional Block Attention Module) hybrid attention mechanism extracted feature maps at four scales. To enhance contour details in the estimated depth, the features at each scale were assigned learnable weight coefficients to extract local and global features, which were then fused with the original features.

Results: Error and accuracy were evaluated on the KITTI dataset, followed by additional testing. Compared with the classic monocular method Monodepth2, the network reduced the relative error, root mean square error (RMSE), and log RMSE by 0.034, 0.129, and 0.002, respectively, and tests on the authors' own images demonstrated the network's generalizability.

Conclusion: Extracting multi-scale features with a ResNet combined with a hybrid attention mechanism, and then fusing these multi-scale features, improves depth estimation and the contour details of the estimated depth.
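The Results report error reductions relative to Monodepth2. As a minimal sketch of how such KITTI-style metrics are commonly computed, the function below assumes the standard definitions used in Monodepth2-style evaluation: absolute relative error (Abs Rel, assumed here to be what the abstract calls "relative error"), RMSE, log RMSE, and the δ < 1.25 accuracy. The name `depth_metrics` and its `median_scaling` flag are hypothetical, and this is not the authors' evaluation code. The full KITTI protocol also crops the image and caps depth (commonly at 80 m), which is omitted here.

```python
import math
from statistics import median


def depth_metrics(gt, pred, median_scaling=True):
    """Hypothetical helper: Eigen-style depth error and accuracy metrics.

    gt, pred: equal-length sequences of positive depths (e.g. metres)
    at valid pixels only. Assumes standard definitions, not the paper's code.
    """
    if not gt or len(gt) != len(pred):
        raise ValueError("gt and pred must be non-empty and equal length")
    if min(gt) <= 0 or min(pred) <= 0:
        raise ValueError("depths must be positive (log RMSE needs log)")
    if median_scaling:
        # Monocular predictions are scale-ambiguous, so rescale them by the
        # ratio of medians before comparing (as monocular evaluations often do).
        ratio = median(gt) / median(pred)
        pred = [p * ratio for p in pred]
    n = len(gt)
    pairs = list(zip(gt, pred))
    # Abs Rel: mean of |gt - pred| / gt
    abs_rel = sum(abs(g - p) / g for g, p in pairs) / n
    # RMSE: square root of the mean squared depth error
    rmse = math.sqrt(sum((g - p) ** 2 for g, p in pairs) / n)
    # log RMSE: RMSE computed on natural-log depths
    rmse_log = math.sqrt(sum((math.log(g) - math.log(p)) ** 2 for g, p in pairs) / n)
    # Accuracy: fraction of pixels with max(gt/pred, pred/gt) < 1.25
    a1 = sum(max(g / p, p / g) < 1.25 for g, p in pairs) / n
    return {"abs_rel": abs_rel, "rmse": rmse, "rmse_log": rmse_log, "a1": a1}


if __name__ == "__main__":
    # Prediction with the right shape but half the true scale:
    # median scaling removes the scale error, so every error is 0.0 and a1 is 1.0.
    print(depth_metrics([10.0, 20.0, 30.0], [5.0, 10.0, 15.0]))
```

Median scaling is included because a self-supervised monocular network recovers depth only up to an unknown scale; without it, a correctly shaped but globally mis-scaled prediction would be penalized on every metric.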

Keywords

monocular vision/continuous frame depth estimation/hybrid attention mechanism/multiscale feature fusion

Classification

Information Technology and Security Science

Cite this article

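Zheng Yuhang, Cao Chuqing. Continuous frame depth estimation based on multi-scale feature mixed attention mechanism[J]. Journal of Chongqing Technology and Business University (Natural Science Edition), 2024, 41(4): 104-111, 8.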

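Funding

General Program of the National Natural Science Foundation of China (62073101)

Support Program for Outstanding Young Talents in Universities (019YQQ023)

Key Scientific Research Project of the Anhui Provincial Department of Education (KJ2020A0364)

National Key R&D Program of China, "Intelligent Robots" Key Special Project (2018YFB1308900)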

Journal of Chongqing Technology and Business University (Natural Science Edition)

ISSN: 1672-058X