Computer Engineering and Applications, 2024, Vol. 60, Issue 11: 84-94, 11. DOI: 10.3778/j.issn.1002-8331.2302-0035
Multi-Modal Object Tracking Algorithm Using Transformer
Trans-RGBT: RGBT Object Tracking with Transformer
Abstract
Most current object tracking methods fuse information from different modalities to make localization decisions, but they suffer from insufficient information extraction, simplistic fusion strategies, and an inability to track targets accurately in low-light scenes. To address this, a Transformer-based multi-modal object tracking algorithm (Trans-RGBT) is proposed. First, features of the visible and infrared images are extracted separately by a pseudo-Siamese network and fully fused at the feature level. Second, the target information from the first frame is modulated into the feature vector of the frame to be tracked, yielding a target-specific tracker. Then, a Transformer is used to encode and decode the target within the field of view. The spatial position prediction branch predicts the target's spatial position in the field of view, and interfering targets are filtered out using historical information to obtain the accurate position of the target. Finally, a bounding-box regression network predicts the target's bounding rectangle, achieving accurate target tracking. Extensive experiments are conducted on the latest large-scale dataset VTUAV and on RGBT234. Compared with Siamese-network-based (Siam-based) and filtering (filter-based) algorithms, Trans-RGBT achieves higher accuracy and better robustness, and runs at a real-time tracking speed of 22 frames per second.
Keywords
multi-modal fusion/visible images/infrared images/Transformer/object tracking
Classification
Information Technology and Security Science
Cite this article
LIU Wanjun, LIANG Linlin, QU Haicheng. Trans-RGBT: RGBT Object Tracking with Transformer[J]. Computer Engineering and Applications, 2024, 60(11): 84-94, 11.
Funding
Liaoning Technical University Discipline Innovation Team (LNTU20TD-23)
Basic Scientific Research Project of Higher Education Institutions of Liaoning Province (LJKMZ20220699)