Self-supervised Monocular Depth Estimation Based on Semantic Assistance and Depth Temporal Consistency Constraints
Open Access; Peking University Core Journal (北大核心); CSTPCD
Self-supervised monocular depth estimation methods trained on image sequences have received considerable attention in recent years because they replace depth labels with the photometric consistency loss between adjacent frames as the supervisory signal for network training. The photometric consistency constraint relies on the static-world assumption, which moving objects in monocular image sequences violate; this degrades both the accuracy of camera pose estimation and the accuracy of the photometric loss during self-supervised training. Detecting and removing moving-object regions yields a camera pose decoupled from object motion and eliminates the influence of those regions on the photometric loss. To this end, this paper proposes a self-supervised monocular depth estimation network based on semantic assistance and depth temporal consistency constraints. First, an offline instance segmentation network detects objects of dynamic categories that may violate the static-world assumption, and the corresponding regions are removed from the input to the pose network, yielding a camera pose decoupled from object motion. Second, the motion state of the dynamic-category objects is detected using semantic consistency and photometric consistency constraints, so that the photometric loss in moving regions does not affect the iterative update of the network parameters. Finally, a depth temporal consistency constraint is imposed on non-moving regions, explicitly aligning the estimated depth of the current frame with the depth projected from adjacent frames to further refine the depth predictions. Experiments on the KITTI, DDAD, and KITTI Odometry datasets show that the proposed method outperforms previous self-supervised monocular depth estimation methods.
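The two masked losses described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the plain L1 photometric error (practical systems usually combine it with SSIM), and the normalized depth difference |a - b| / (a + b) are all assumptions chosen for illustration. The warped image and projected depth are assumed to have been computed already from the predicted depth and camera pose.

```python
import numpy as np

def masked_photometric_loss(target, warped, static_mask):
    """Mean absolute photometric error over pixels flagged as static.

    target, warped: float arrays of shape (H, W, 3)
    static_mask:    bool array of shape (H, W); False marks moving regions
    """
    err = np.abs(target - warped).mean(axis=-1)   # per-pixel L1 error
    n = static_mask.sum()
    return float((err * static_mask).sum() / max(n, 1))

def depth_consistency_loss(depth_cur, depth_proj, static_mask):
    """Normalized depth difference (assumed form), averaged over static pixels.

    depth_cur:  depth estimated for the current frame, shape (H, W), positive
    depth_proj: depth projected from an adjacent frame, shape (H, W), positive
    """
    diff = np.abs(depth_cur - depth_proj) / (depth_cur + depth_proj)
    n = static_mask.sum()
    return float((diff * static_mask).sum() / max(n, 1))

# Toy example: a 2x2 "moving object" breaks photometric consistency.
rng = np.random.default_rng(0)
target = rng.random((4, 6, 3))
warped = target.copy()
warped[1:3, 2:4] += 0.5                  # mismatch caused by object motion
mask = np.ones((4, 6), dtype=bool)
mask[1:3, 2:4] = False                   # region flagged as moving

loss_masked = masked_photometric_loss(target, warped, mask)
loss_unmasked = masked_photometric_loss(target, warped, np.ones_like(mask))
print(loss_masked, loss_unmasked)        # masking removes the moving region's contribution
```

In this toy case the masked loss vanishes because the static pixels match exactly, while the unmasked loss is inflated by the moving region, which is the effect the masking step is meant to suppress.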
Ling Chuanwu (凌传武); Chen Hua (陈华); Xu Dayong (徐大勇); Zhang Xiaogang (张小刚)
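College of Electrical and Information Engineering, Hunan University, Changsha 410082, Hunan; College of Information Science and Engineering, Hunan University, Changsha 410082, Hunan; Zhengzhou Tobacco Research Institute of CNTC, Zhengzhou 450000, Henan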
Computer Science and Automation
monocular depth estimation; self-supervised learning; moving object; temporal consistency
Journal of Hunan University (Natural Sciences), 2024, No. 8
1-12 / 12
National Natural Science Foundation of China (62171184, 62273139, 62106072); Joint Funds of the National Natural Science Foundation of China (U23A20385); National Defense Pre-research Foundation (JCY2021206B015)