
3D Hand Pose Estimation with Multimodal Feature Fusion Guided by Depth Geometric Features

GUAN Xin¹, LIU Chenxi¹, LI Qiang¹

Journal of South China University of Technology (Natural Science Edition), 2025, Vol. 53, Issue 11: 37-51, 15. DOI: 10.12141/j.issn.1000-565X.250072

Author Information

  • 1. School of Microelectronics, Tianjin University, Tianjin 300072, China


Abstract

Owing to the inherent instability of data acquisition quality, relying on either RGB or depth images alone in 3D hand pose estimation frequently results in the loss of critical features. In contrast, multimodal approaches that integrate the complementary semantic and structural strengths of both modalities exhibit significantly enhanced robustness. However, existing multimodal 3D hand pose estimation methods struggle to fuse RGB and depth information effectively, primarily because of feature redundancy, modality misalignment, and the loss of local features. These limitations significantly degrade the accuracy and stability of keypoint localization. To address these challenges, this paper proposes a depth feature-guided multimodal keypoint feature enhancement and fusion method. First, depth structural features are leveraged to capture hand contour and geometric information, providing an initial estimate of keypoint positions. Subsequently, RGB information is employed to locally enhance the depth features, compensating for the structural features that the depth modality loses to voids and occlusions. Furthermore, a framework that integrates the localized depth-based 3D structural features of keypoints is proposed to refine the initial RGB features, enhancing the spatial structure understanding of the hand in the RGB modality. To optimize the fusion process, a global cross-modal attention mechanism is introduced for interactive learning, ensuring global alignment between the locally enhanced depth and RGB features while dynamically strengthening the complementarity between the modalities. Compared with existing mainstream deep learning methods, the proposed approach achieves the lowest errors of 7.52, 1.80 and 7.40 mm on the DexYCB, HO-3D and InterHand2.6M datasets, respectively.
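To make the fusion stage concrete, the following PyTorch sketch shows one way a global cross-modal attention block could align locally enhanced depth features with RGB features through bidirectional interactive attention, in the spirit of the abstract's description. The class name CrossModalAttentionFusion, the token layout, the feature dimension of 256, and the residual and projection wiring are illustrative assumptions, not the paper's actual architecture.

```python
# Minimal sketch of bidirectional cross-modal attention fusion.
# All names and hyperparameters here are assumptions for illustration;
# this is NOT the authors' implementation.
import torch
import torch.nn as nn


class CrossModalAttentionFusion(nn.Module):
    """Fuse depth and RGB feature tokens with bidirectional cross-attention."""

    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        # Depth queries attend to RGB keys/values, and vice versa,
        # giving each modality a global view of the other.
        self.depth_to_rgb = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.rgb_to_depth = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_d = nn.LayerNorm(dim)
        self.norm_r = nn.LayerNorm(dim)
        self.proj = nn.Linear(2 * dim, dim)  # merge the two enhanced streams

    def forward(self, feat_depth: torch.Tensor, feat_rgb: torch.Tensor) -> torch.Tensor:
        # feat_depth, feat_rgb: (B, N, C) token sequences, e.g. flattened
        # H*W feature-map locations from each modality's backbone.
        d_enh, _ = self.depth_to_rgb(query=feat_depth, key=feat_rgb, value=feat_rgb)
        r_enh, _ = self.rgb_to_depth(query=feat_rgb, key=feat_depth, value=feat_depth)
        d_enh = self.norm_d(feat_depth + d_enh)  # residual connection + norm
        r_enh = self.norm_r(feat_rgb + r_enh)
        return self.proj(torch.cat([d_enh, r_enh], dim=-1))


if __name__ == "__main__":
    fuse = CrossModalAttentionFusion(dim=256, num_heads=8)
    depth_tokens = torch.randn(2, 64, 256)  # e.g. an 8x8 depth feature map
    rgb_tokens = torch.randn(2, 64, 256)    # matching RGB feature map
    print(fuse(depth_tokens, rgb_tokens).shape)  # torch.Size([2, 64, 256])
```

Letting each modality query the other globally is one plausible reading of the "interactive learning" the abstract mentions; the paper's actual design may differ in how the enhanced streams are merged.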


Key words

multimodal feature fusion/hand pose estimation/geometric feature guidance/depth image/RGB image

Classification

Information Technology and Security Science

Cite This Article

GUAN Xin, LIU Chenxi, LI Qiang. 3D hand pose estimation with multimodal feature fusion guided by depth geometric features [J]. Journal of South China University of Technology (Natural Science Edition), 2025, 53(11): 37-51, 15.

Funding

Supported by the Natural Science Foundation of Tianjin, China (23JCZDJC00020)

Journal of South China University of Technology (Natural Science Edition) | OA | PKU Core Journal | ISSN 1000-565X
