Livestock 3D Posture Estimation Based on Binocular Stereo Matching with an Attention Mechanism

Transactions of the Chinese Society of Agricultural Engineering, 2025, Vol. 41, Issue 3: 163-170 (8 pages). DOI: 10.11975/j.issn.1002-6819.202404137

Livestock 3D posture estimation method based on the attention mechanism for binocular stereo matching

Xie Yuancheng¹, Chen Ziqiang¹, Li Tiantian¹, Yan Xinyue¹, Jiang Haiyan¹, Pan Zengxiang²

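Author information

1. College of Artificial Intelligence, Nanjing Agricultural University, Nanjing 210095
2. College of Animal Science and Technology, Nanjing Agricultural University, Nanjing 210095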

Abstract

Accurate and rapid estimation of spatial posture is crucial for monitoring the behavior of group-housed livestock. Compared with traditional 2D posture estimation, 3D posture estimation provides more precise spatial information, especially under occlusion. Current 3D posture estimation techniques are mainly applied to humans and autonomous driving, and they depend heavily on expensive measurement equipment and datasets, so applying them to animal behavior monitoring and management remains challenging. A low-cost and efficient method for measuring animal posture is therefore urgently needed. In this study, a general approach was proposed to estimate the 3D posture of livestock using binocular stereo matching. First, an improved deep-learning stereo matching model was used to obtain depth information. Then, a top-down 2D pose estimation model was used to extract target bounding boxes and detect keypoints within them. Finally, the keypoint locations were mapped back to image space and fused with the depth output of the stereo matching model to obtain the 3D posture. Because the accuracy of the 3D posture depends on precise depth information, and the main difficulties in stereo matching are thin structures and weakly textured regions, an ACLNet stereo matching model was built with an attention mechanism and ConvGRU-based iterative refinement. The relative depth layers of image textures were encoded to restrict the model's attention to regions near the true disparity, and high-precision depth was then recovered gradually in a residual manner. Ablation experiments were carried out on the Scene Flow dataset and generalization tests on the Middlebury dataset to validate the effectiveness of ACLNet. The results show that ACLNet achieved an end-point error (EPE) of 0.45 pixels on Scene Flow, 0.37 pixels lower than the baseline model without the attention and ConvGRU mechanisms, and it also generalized better to real-world datasets such as Middlebury. On the goat depth dataset, the EPE was 0.56 pixels. On the goat 3D posture test set, the improved model reached a mean per joint position error (MPJPE) of 45.7 mm, 21.1 mm lower than the baseline. The method generalized well and estimated livestock 3D posture accurately without additional training. The 3D posture estimation experiments, which used goats as test subjects, required only binocular images to obtain accurate 3D posture, validating the feasibility of high-precision livestock 3D posture estimation with a simple binocular system. The findings can provide a viable solution for accurate 3D posture estimation with low-cost stereo cameras.
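The final fusion step described in the abstract (mapping 2D keypoints back to image space and combining them with stereo depth), together with the two reported metrics, can be illustrated with a minimal sketch. It assumes a rectified pinhole stereo pair and uses the standard relation Z = f·B/d. The function names, the camera parameters (`f`, `cx`, `cy`, `baseline_mm`) and the nearest-pixel disparity lookup are hypothetical illustrations, not ACLNet or the authors' code. EPE is computed here as the mean absolute disparity error without a valid-pixel mask, and MPJPE as the mean Euclidean joint distance without root alignment.

```python
"""Sketch: lift 2D keypoints to 3D with a stereo disparity map, plus EPE and MPJPE.

Assumptions (hypothetical, not taken from the paper): a rectified pinhole stereo
pair, focal length f and principal point (cx, cy) in pixels, baseline in mm,
and nearest-pixel disparity lookup at each keypoint.
"""
import math
from typing import List, Sequence, Tuple

Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]
Grid = Sequence[Sequence[float]]


def keypoints_to_3d(keypoints: Sequence[Point2D], disparity: Grid,
                    f: float, cx: float, cy: float,
                    baseline_mm: float) -> List[Point3D]:
    """Back-project (u, v) keypoints to camera coordinates in mm via Z = f * B / d."""
    points: List[Point3D] = []
    for u, v in keypoints:
        d = disparity[int(round(v))][int(round(u))]  # row index = v, column = u
        if d <= 0:
            raise ValueError(f"non-positive disparity {d} at ({u}, {v})")
        z = f * baseline_mm / d   # depth from disparity
        x = (u - cx) * z / f      # pinhole back-projection
        y = (v - cy) * z / f
        points.append((x, y, z))
    return points


def end_point_error(pred: Grid, gt: Grid) -> float:
    """EPE for disparity: mean absolute error in pixels (no valid-pixel mask)."""
    diffs = [abs(p - g) for prow, grow in zip(pred, gt) for p, g in zip(prow, grow)]
    return sum(diffs) / len(diffs)


def mpjpe(pred: Sequence[Point3D], gt: Sequence[Point3D]) -> float:
    """MPJPE: mean Euclidean joint distance, same unit as input (no root alignment)."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


if __name__ == "__main__":
    disp = [[50.0] * 4 for _ in range(4)]  # toy constant-disparity map
    pts = keypoints_to_3d([(2, 2), (3, 2)], disp, f=1000.0, cx=2.0, cy=2.0,
                          baseline_mm=100.0)
    print(pts)  # [(0.0, 0.0, 2000.0), (2.0, 0.0, 2000.0)]
    print(mpjpe(pts, [(0.0, 0.0, 2010.0), (2.0, 0.0, 1990.0)]))  # 10.0
```

With a real binocular rig, the intrinsics and baseline would come from stereo calibration, and sub-pixel keypoints could use bilinear interpolation of the disparity map instead of the nearest-pixel lookup assumed here.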

Key words

livestock/attention mechanism/stereo matching/3D pose estimation/ConvGRU

Classification

Agricultural science and technology

Cite this article

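Xie Yuancheng, Chen Ziqiang, Li Tiantian, Yan Xinyue, Jiang Haiyan, Pan Zengxiang. Livestock 3D posture estimation method based on the attention mechanism for binocular stereo matching[J]. Transactions of the Chinese Society of Agricultural Engineering, 2025, 41(3): 163-170.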

Funding

National Natural Science Foundation of China, General Program (31872847)

Science and Technology Innovation 2030 Major Project (2023ZD0404701)

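Transactions of the Chinese Society of Agricultural Engineering

Open access (OA); Peking University Core Journal

ISSN: 1002-6819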
