Optics and Precision Engineering (光学精密工程), 2025, Vol. 33, Issue 16: 2502-2515, 14. DOI: 10.37188/OPE.20253316.2502
基于多尺度注意力机制TransUNet的双目视觉定位与测量方法
Binocular vision localization and measurement method based on TransUNet with multi-scale attention mechanism
Abstract
Traditional binocular-vision feature-detection algorithms suffer from low detection efficiency, and most network models pay insufficient attention to globally important features while carrying excessive parameter counts. To address these problems, a continuous-casting-billet localization and measurement method based on binocular vision and TransUNet with a multi-scale attention mechanism is proposed. First, left and right images of continuous-casting billets are collected with a calibrated parallel binocular camera to build a dataset. Next, with TransUNet as the backbone, an improved Transformer layer is introduced to extract global context information; a Global Spatial Group Attention (GSGA) module is appended after every decoder block to strengthen the focus on globally salient features through a grouped multi-scale attention mechanism; and a Convolutional Block Attention Module (CBAM) is inserted after each encoder-decoder skip connection and bilinear interpolation to improve key-point recognition by combining spatial and channel attention. Finally, 3D coordinate reconstruction and distance measurement are performed on the key-point coordinates output by the network, using binocular-vision principles. Experimental results show that, compared with the Transformer model, the root-mean-square error and normalized error are reduced by 33.8% and 36.83%, respectively; the number of parameters and the floating-point operations are reduced by 10.58% and 8.21%, respectively; and the single-batch inference time is shortened by 32.30%. In 3D ranging, the relative measurement error reaches 0.137%, which is significantly better than the traditional feature-detection algorithm and meets the requirements of binocular vision localization and measurement.
Keywords
binocular vision/TransUNet/keypoint detection/attention mechanism
Classification
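Information technology and security science
Cite this article
杨玉, 许四祥, 张梦权, 吴端正. Binocular vision localization and measurement method based on TransUNet with multi-scale attention mechanism [J]. Optics and Precision Engineering, 2025, 33(16): 2502-2515, 14.
Funding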
National Natural Science Foundation of China (No. 51374007)
Key Natural Science Research Project of Anhui Universities (No. KJ2020A0259)
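Illustrative note on the final step of the abstract: the record does not give the reconstruction formulas, so the following is only a minimal sketch of how 3D coordinates and a distance could be obtained from matched key points under the standard rectified parallel-stereo model. The function names, the camera parameters (`focal_px`, `baseline_mm`, `cx`, `cy`), and all numeric values are hypothetical and are not taken from the paper.

```python
import math


def triangulate(u_left, v_left, u_right, focal_px, baseline_mm, cx, cy):
    """Recover (X, Y, Z) in mm for one key point seen by a rectified
    parallel stereo pair, expressed in the left-camera frame.

    Assumes the ideal parallel model: both cameras share the same focal
    length (in pixels) and principal point, and rows are aligned.
    """
    disparity = u_left - u_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    z = focal_px * baseline_mm / disparity   # depth from similar triangles
    x = (u_left - cx) * z / focal_px         # back-project the left pixel
    y = (v_left - cy) * z / focal_px
    return (x, y, z)


def relative_error_percent(measured, reference):
    """Relative measurement error, in percent of the reference value."""
    return abs(measured - reference) / reference * 100.0


# Hypothetical calibration values and key-point pixel coordinates.
F, B, CX, CY = 1200.0, 60.0, 640.0, 360.0
p1 = triangulate(700.0, 360.0, 640.0, F, B, CX, CY)
p2 = triangulate(580.0, 360.0, 520.0, F, B, CX, CY)

# Distance between the two reconstructed key points, in mm.
length = math.dist(p1, p2)
```

In practice the key-point pixel coordinates would come from the network output for the left and right images, and the reference length for the relative error would come from an independent physical measurement.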