Continuous casting slab model positioning and measurement based on binocular vision and Transformer

OA; PKU Core (北大核心); CSTPCD

To address the low efficiency and complex matching of traditional binocular vision detection algorithms, this paper proposes a method for positioning and measuring a continuous casting slab model based on binocular vision and Transformer. First, a calibrated parallel binocular camera captures left and right images of the continuous casting slab model, which are rectified and labeled to form the dataset. Then, with an improved Transunet* as the backbone, a neural network outputs the keypoint coordinates for the dataset. The network adopts a multi-scale U-shaped structure to offset the theoretical lower bound on Gaussian heatmap error caused by downsampling quantization. To remedy the tendency of convolutional neural networks to attend only to local features, a Transformer structure is added to strengthen information exchange within each channel, and an optimized loss function calculation is proposed to overcome the imbalance between positive and negative samples and to accelerate network convergence. Finally, the keypoint coordinates output by the network are reconstructed in three dimensions with binocular vision to complete the distance measurement. The results show that the proposed algorithm achieves higher keypoint detection accuracy than other neural network methods: compared with the second-best method, its root-mean-square error and normalized mean error are reduced by 17.24% and 18.58%, respectively. In three-dimensional ranging, its accuracy is markedly higher than that of traditional feature detection algorithms, meeting industrial requirements for high-precision measurement and positioning with little sensitivity to environmental conditions.
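The final step described in the abstract, recovering 3D positions from keypoints detected in the rectified left and right images and then measuring distance, can be illustrated with textbook parallel-stereo triangulation. The sketch below is not the authors' implementation. The function names and the camera parameters (`focal_px`, `baseline`, `cx`, `cy`) are hypothetical, and it assumes an ideal rectified rig in which matching keypoints share the same image row.

```python
import math


def triangulate_parallel(u_left, v_left, u_right, focal_px, baseline, cx, cy):
    """Recover a 3D point in the left-camera frame from one rectified stereo match.

    Assumes an ideal parallel (rectified) rig: both cameras share the focal
    length `focal_px` (pixels) and principal point (cx, cy), and are separated
    by `baseline` along the x axis. All parameter names are illustrative.
    """
    disparity = u_left - u_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the rig")
    z = focal_px * baseline / disparity   # depth from similar triangles
    x = (u_left - cx) * z / focal_px      # back-project the pixel column
    y = (v_left - cy) * z / focal_px      # back-project the pixel row
    return (x, y, z)


def distance_3d(p, q):
    """Euclidean distance between two reconstructed 3D points."""
    return math.dist(p, q)


# Two keypoints on the same object, each matched across the left/right images.
corner_a = triangulate_parallel(740.0, 360.0, 640.0, 1000.0, 0.1, 640.0, 360.0)
corner_b = triangulate_parallel(540.0, 360.0, 440.0, 1000.0, 0.1, 640.0, 360.0)
edge_length = distance_3d(corner_a, corner_b)
```

Because depth is inversely proportional to disparity, a small keypoint localization error in pixels translates into a larger depth error at long range, which is why keypoint accuracy matters for the ranging result.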

LI Tongpu (李同谱); XU Sixiang (许四祥); SHI Yuxiang (施宇翔); YANG Lifa (杨利法)

School of Mechanical Engineering, Anhui University of Technology, Ma'anshan 243032, Anhui, China

Computer Science and Automation

binocular vision; Transformer; landmark detection; attention mechanism

Journal of Central South University (Science and Technology) (《中南大学学报(自然科学版)》), 2024, No. 4

pp. 1312-1322 (11 pages)

Project (51374007) supported by the National Natural Science Foundation of China; Project (KJ2020A0259) supported by the Key Project of Natural Science Research of Anhui Educational Committee

10.11817/j.issn.1672-7207.2024.04.006
