Computer Engineering and Applications, 2025, Vol. 61, Issue (18): 175-186, 12. DOI: 10.3778/j.issn.1002-8331.2406-0242
Vision-Tactile Fusion Method for Object Recognition Combining Spatio-Temporal Attention
Abstract
A vision-tactile fusion method for object recognition based on spatio-temporal attention is proposed to address the inadequate handling of spatio-temporal and cross-modal heterogeneous information in continuous visual and tactile frames. The method first uses Swin Transformer modules to extract features from the visual and tactile images, thereby reducing cross-modal heterogeneity. It then employs a spatio-temporal Transformer module based on an attention bottleneck mechanism to enable spatio-temporal and cross-modal interaction between the visual and tactile features. Next, a multi-head self-attention fusion module adaptively aggregates information from these features to improve object recognition performance. Finally, a fully connected layer produces the recognition results. On the Touch and Go dataset, the model achieves an accuracy of 98.38% and an F1 score of 96.83%, which are 0.90 and 0.63 percentage points higher, respectively, than the best comparison model. Ablation experiments further validate the effectiveness of each proposed module. The approach significantly improves the handling of spatio-temporal and cross-modal information, offering a robust solution for advanced object recognition in intelligent robotics.
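The cross-modal interaction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the layer details, so this assumes a common attention-bottleneck design in which each modality attends only to its own tokens plus a small set of shared bottleneck tokens, and the bottleneck is then averaged across modalities. For brevity it uses pure Python, a single attention head with identity query/key/value projections, and no temporal dimension; the Swin Transformer feature extraction, the multi-head fusion module, and the classifier are omitted. All function and variable names (`self_attention`, `bottleneck_fusion_layer`, `visual`, `tactile`, `bottleneck`) are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: identity Q/K/V projections (no learned weights),
    so each output token is a convex combination of the input tokens.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens))
                    for j in range(d)])
    return out


def bottleneck_fusion_layer(visual, tactile, bottleneck):
    """One fusion layer: modalities exchange information only via the bottleneck."""
    # Each modality attends over its own tokens plus the shared bottleneck tokens.
    vis_out = self_attention(visual + bottleneck)
    tac_out = self_attention(tactile + bottleneck)

    new_visual = vis_out[:len(visual)]
    new_tactile = tac_out[:len(tactile)]

    # Assumption: the two per-modality bottleneck updates are averaged.
    vis_b = vis_out[len(visual):]
    tac_b = tac_out[len(tactile):]
    new_bottleneck = [[(a + b) / 2 for a, b in zip(vb, tb)]
                      for vb, tb in zip(vis_b, tac_b)]
    return new_visual, new_tactile, new_bottleneck


# Toy 4-dimensional features: visual tokens occupy dims 0-1, tactile tokens dims 2-3.
visual = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
tactile = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
bottleneck = [[0.1, 0.1, 0.1, 0.1]]

new_visual, new_tactile, new_bottleneck = bottleneck_fusion_layer(
    visual, tactile, bottleneck)
```

After one layer the bottleneck token carries information from both modalities, while each modality's own tokens have seen the other modality only through the bottleneck. Stacking several such layers would let cross-modal information flow in both directions while keeping the attention cost lower than full attention over all tokens.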
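Keywords
multimodal fusion / object recognition / vision-tactile fusion / Transformer / self-attention / spatio-temporal information
Classification
Information Technology and Security Science
Cite this article
LIU Jia (刘佳), LI Wenlong (栗文龙), CHEN Dapeng (陈大鹏), ZHANG Song (张松), HUANG Xiaorong (黄孝荣). Vision-Tactile Fusion Method for Object Recognition Combining Spatio-Temporal Attention[J]. Computer Engineering and Applications, 2025, 61(18): 175-186, 12.
Funding
National Natural Science Foundation of China (62003169)
Jiangsu Provincial Key Project on Industry Foresight and Key Technologies (BE2020006-2)
Natural Science Foundation of Jiangsu Province, Youth Fund (BK20200823)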