计算机工程与科学 (Computer Engineering & Science), 2026, Vol. 48, Issue (3): 521-530, 10. DOI: 10.3969/j.issn.1007-130X.2026.03.014
一种融合语义图卷积与自注意力机制的三维人体姿态估计方法
A 3D human pose estimation method integrating semantic graph convolutional network and self-attention mechanism
Abstract
To address the difficulty of capturing the global features of human joint sequences and the resulting low estimation accuracy, a 3D human pose estimation method combining a semantic graph convolutional network and a self-attention mechanism is proposed. First, to improve feature extraction when mapping a 2D human pose sequence to a 3D human pose sequence, the self-attention mechanism is integrated into the semantic graph convolutional network, so that spatial features are extracted by fusing local and global features. Second, the channel-mixing module of the MLP-Mixer network is improved by introducing a semantic graph convolutional network and a U-shaped MLP structure for temporal feature extraction. Finally, 3D human pose estimation is performed based on the fused features from 2D human images and the extracted temporal features. Experiments on the Human3.6M dataset show that, among current mainstream 3D human pose estimation methods, the proposed method reduces the average error metrics MPJPE and PA-MPJPE by approximately 4.5 mm and 0.2 mm, respectively, relative to the second-best method. The experimental results validate the effectiveness of the proposed method.
Keywords
3D human pose estimation / semantic graph convolutional network / MLP-Mixer model / self-attention mechanism
Classification
Information technology and security science
Cite this article
童立靖, 英溢卓, 曹楠. 一种融合语义图卷积与自注意力机制的三维人体姿态估计方法 [J]. 计算机工程与科学, 2026, 48(3): 521-530, 10.
Funding
Training Program for Young Top-Notch Talents of Beijing Municipal Universities (CIT&TCD201904009)
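
Note on the metrics named in the abstract: MPJPE (mean per-joint position error) is commonly computed as the average Euclidean distance between predicted and ground-truth 3D joint positions, and PA-MPJPE is the same quantity measured after a rigid similarity alignment of the prediction to the ground truth. The following is a minimal sketch of MPJPE only, not the authors' evaluation code; the function name `mpjpe`, the array layout `(frames, joints, 3)`, and the 17-joint toy data are assumptions made for illustration.

```python
import numpy as np


def mpjpe(pred, target):
    """Mean per-joint position error, in the units of the inputs (e.g. mm).

    pred, target: arrays of shape (frames, joints, 3). This layout is an
    assumption for the sketch, not something the abstract specifies.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")
    # Euclidean distance per joint, then the mean over all joints and frames.
    return float(np.linalg.norm(pred - target, axis=-1).mean())


# Hypothetical toy data: 2 frames, 17 joints, every joint offset by (3, 4, 0) mm.
target = np.zeros((2, 17, 3))
pred = target + np.array([3.0, 4.0, 0.0])
print(mpjpe(pred, target))  # 5.0
```

Under this definition, the reported 4.5 mm MPJPE improvement would correspond to predicted joints lying on average 4.5 mm closer to the ground truth than those of the second-best method.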