Journal of Data Acquisition and Processing (数据采集与处理), 2025, Vol. 40, Issue 5: 1322-1332 (11 pages). DOI: 10.16337/j.1004-9037.2025.05.017
MonoDI: Monocular 3D Object Detection Based on Fusing Depth Instances
Abstract
Monocular 3D object detection aims to locate the 3D bounding boxes of objects from a single 2D input image, an extremely challenging task in the absence of image depth information. To address the poor detection performance caused by the lack of depth information during inference on 2D images, as well as background noise interference in depth maps, this paper proposes MonoDI, a monocular 3D object detection method that integrates depth instances. The key idea is to take the depth information generated by an effective depth estimation network, combine it with instance segmentation masks to obtain depth instances, and then integrate the depth instances with 2D image information to aid in regressing 3D object information. To make better use of the depth instance information, this paper designs an iterative depth-aware attention fusion module (iDAAFM), which integrates depth instance features with 2D image features to obtain a feature representation with clear object boundaries and depth information. A residual convolutional structure then replaces the general single convolutional structure during training and inference to ensure the stability and efficiency of the network when processing the fused information. Further, a 3D bounding box uncertainty auxiliary task is designed to assist the main task in learning to generate bounding boxes during training, improving the accuracy of monocular 3D object detection. Finally, the effectiveness of the method is validated on the KITTI dataset. Experimental results show that the proposed method improves 3D object detection accuracy for the vehicle class at the moderate difficulty level by 4.41 percentage points compared with the baseline, and outperforms comparative methods such as MonoCon and MonoLSS. It also achieves superior results on the KITTI-nuScenes cross-dataset evaluation.
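As a rough illustration of what "combining depth information with instance segmentation masks to obtain depth instances" could look like, the sketch below masks a dense depth map with a per-pixel instance label map. This is not the authors' implementation. The function name `depth_instances`, the label convention (0 for background), and the choice to zero out pixels outside each mask are all assumptions made for illustration. Plain Python lists stand in for the image tensors a real pipeline would use.

```python
from typing import Dict, List

DepthMap = List[List[float]]  # per-pixel depth, e.g. in metres
LabelMap = List[List[int]]    # per-pixel instance id; 0 = background (assumed)


def depth_instances(depth: DepthMap, labels: LabelMap) -> Dict[int, DepthMap]:
    """Split a dense depth map into one masked depth map per instance.

    Assumption: a "depth instance" is the depth map with every pixel
    outside the instance mask set to 0.0, which discards background depth.
    """
    same_shape = len(depth) == len(labels) and all(
        len(d_row) == len(l_row) for d_row, l_row in zip(depth, labels)
    )
    if not same_shape:
        raise ValueError("depth map and label map must have the same shape")

    # Collect the foreground instance ids present in the label map.
    instance_ids = sorted({i for row in labels for i in row if i != 0})

    return {
        inst: [
            [d if lab == inst else 0.0 for d, lab in zip(d_row, l_row)]
            for d_row, l_row in zip(depth, labels)
        ]
        for inst in instance_ids
    }


# Toy 3x4 example: two objects in front of a distant background.
depth = [
    [30.0, 30.0, 12.5, 12.0],
    [30.0, 8.0, 12.5, 12.0],
    [8.5, 8.0, 5.0, 5.0],
]
labels = [
    [0, 0, 1, 1],
    [0, 2, 1, 1],
    [2, 2, 0, 0],
]
instances = depth_instances(depth, labels)
```

Under these assumptions, each entry of `instances` keeps depth values only where the corresponding object lies, which is one simple way to suppress the background noise in the depth map that the abstract mentions.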
Keywords
monocular 3D object detection / instance segmentation / feature fusion / residual convolution / auxiliary learning
Classification
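Information technology and security science
Cite this article
ZHAO Ke, DONG Haoran, YE Ning. MonoDI: Monocular 3D object detection based on fusing depth instances [J]. Journal of Data Acquisition and Processing, 2025, 40(5): 1322-1332.
Funding
National Key Research and Development Program of China (2016YFD600101).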