Target Pose Detection Based on Bidirectional Fusion of Texture and Depth Information
To address the problem of obtaining accurate object pose information with a depth camera in unstructured scenes under limited hardware resources, a target pose detection method based on bidirectional fusion of texture and depth information is proposed. In the learning stage, the two networks adopt the full flow bidirectional fusion (FFB6D) module: the texture branch introduces the lightweight Ghost module to cut the network's computational cost and adds the convolutional block attention module (CBAM) to enhance useful features, while the depth branch expands local features and fuses them across multiple levels to obtain more comprehensive representations. In the output stage, to improve efficiency, the instance semantic segmentation result is used to filter out background points before 3D keypoint detection, and the pose is finally recovered with a least-squares fitting algorithm. Validation on the LINEMOD, Occlusion LINEMOD, and YCB-Video public datasets gives accuracies of 99.8%, 66.3%, and 94%, respectively, while the number of parameters is reduced by 31%, showing that the improved pose estimation method reduces the parameter count while maintaining accuracy.
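For readers unfamiliar with the attention mechanism named in the abstract, the following is a minimal sketch of a standard CBAM block (channel attention followed by spatial attention) in PyTorch. The class name, reduction ratio, and kernel size are illustrative assumptions; how the paper actually integrates CBAM into the FFB6D texture branch is not reproduced here.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional block attention: channel attention followed by spatial attention."""

    def __init__(self, channels: int, reduction: int = 16, spatial_kernel: int = 7):
        super().__init__()
        # Shared MLP applied to globally average- and max-pooled channel descriptors.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1, bias=False),
        )
        # Single conv that turns channel-wise mean/max maps into a spatial weight map.
        self.spatial = nn.Conv2d(2, 1, kernel_size=spatial_kernel,
                                 padding=spatial_kernel // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Channel attention: weight each channel by pooled global context.
        avg_c = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))
        max_c = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
        x = x * torch.sigmoid(avg_c + max_c)
        # Spatial attention: weight each location by cross-channel statistics.
        avg_s = torch.mean(x, dim=1, keepdim=True)
        max_s = torch.amax(x, dim=1, keepdim=True)
        return x * torch.sigmoid(self.spatial(torch.cat([avg_s, max_s], dim=1)))
```

The final step of the pipeline, recovering the pose from detected 3D keypoints by least-squares fitting, can be illustrated with the classic SVD-based rigid alignment (Kabsch) below. The function name and the use of NumPy are assumptions for illustration, not the paper's code.

```python
import numpy as np

def fit_pose_least_squares(model_kps: np.ndarray, pred_kps: np.ndarray):
    """Least-squares rigid fit between corresponding 3D keypoints.

    model_kps : (N, 3) keypoints defined in the object (model) frame.
    pred_kps  : (N, 3) the same keypoints as detected in the camera frame.
    Returns R (3x3) and t (3,) such that pred_kps ~= model_kps @ R.T + t.
    """
    # Center both point sets on their centroids.
    mu_m = model_kps.mean(axis=0)
    mu_p = pred_kps.mean(axis=0)
    Xm = model_kps - mu_m
    Xp = pred_kps - mu_p

    # Cross-covariance and SVD give the optimal rotation.
    H = Xm.T @ Xp
    U, _, Vt = np.linalg.svd(H)
    # Reflection correction keeps the result a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    # Translation aligns the rotated model centroid with the detected centroid.
    t = mu_p - R @ mu_m
    return R, t
```

As a quick check, feeding the function a model keypoint set transformed by a known rotation and translation should return that same pose up to numerical precision.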
张亚炜 (Zhang Yawei); 付东翔 (Fu Dongxiang)
School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
Computer and Automation
bidirectional fusion; Ghost; attention mechanism; deep learning; pose estimation
《数据采集与处理》 (Journal of Data Acquisition and Processing), 2024, No. 5, pp. 1214-1227 (14 pages)
Supported by the National Natural Science Foundation of China (61703277).