Computer and Modernization (计算机与现代化), Issue (4): 1-5, 11, 6. DOI: 10.3969/j.issn.1006-2475.2025.04.001
Gaze Estimation Model Based on Hybrid Transformer (original title: 基于混合Transformer的视线估计模型)
Abstract
Combining a CNN with a Transformer lets a model exploit global feature information and become more aware of context, which can improve accuracy. This paper proposes RN-SA (ResNet-MHSA), a novel gaze estimation model based on a hybrid Transformer. In this model, some of the 3×3 spatial convolution layers in ResNet18 are replaced with a block composed of a 1×1 spatial convolution layer and a Multi-Head Self-Attention (MHSA) layer, and the DropBlock mechanism is added to the model structure to increase robustness. Experimental results show that RN-SA improves accuracy while reducing the number of parameters: compared with GazeTR-Hybrid, a strong existing model, RN-SA improves accuracy by 4.1% and 3.7% on the EyeDiap and Gaze360 datasets, respectively, while using 15.8% fewer parameters. Therefore, the combination of CNN and Transformer can be effectively applied to gaze estimation tasks.
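The replacement described in the abstract (a 3×3 convolution swapped for a 1×1 convolution plus an MHSA layer) can reduce the parameter count, and a rough count shows why. The following is a minimal sketch, not the authors' implementation. It assumes bias-free layers, an MHSA whose only learned weights are the query, key, and value projections (each a channels-to-channels map, with no output projection or positional encoding), and a 512-channel stage as found at the end of ResNet18. The abstract does not say which layers are replaced, and all function names here are hypothetical.

```python
def conv2d_params(in_ch: int, out_ch: int, kernel: int) -> int:
    """Weight count of a bias-free k x k 2D convolution."""
    return in_ch * out_ch * kernel * kernel


def mhsa_projection_params(channels: int) -> int:
    """Weight count of the q, k, v projections in the assumed MHSA layer.

    Assumption: each projection maps channels -> channels without bias.
    Heads split the channel dimension, so the number of heads does not
    change this count.
    """
    return 3 * channels * channels


def replacement_block_params(channels: int) -> int:
    """1x1 convolution followed by the assumed MHSA layer."""
    return conv2d_params(channels, channels, 1) + mhsa_projection_params(channels)


channels = 512  # assumption: width of the last ResNet18 stage
conv3x3 = conv2d_params(channels, channels, 3)
block = replacement_block_params(channels)
saving = 1 - block / conv3x3

print(f"3x3 convolution:      {conv3x3} weights")
print(f"1x1 conv + MHSA:      {block} weights")
print(f"per-layer reduction:  {saving:.1%}")
```

Under these assumptions the replacement block holds 4C² weights against 9C² for the 3×3 convolution. This is a per-layer figure only; the 15.8% whole-model reduction reported in the abstract depends on how many layers are replaced and on details the abstract does not give.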
Keywords
gaze estimation/self-attention/MHSA/Transformer
Classification
Information Technology and Security Science
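Cite this article
Cheng Zhang, Liu Dan, Wang Yanxia. Gaze Estimation Model Based on Hybrid Transformer [J]. Computer and Modernization, 2025, (4): 1-5, 11, 6.
Funding
Chongqing Science and Technology Commission Scientific Research Project (cstc2021jcyj-msxm2791)
Chongqing Municipal Education Commission Science and Technology Project (KJZD-K202200513)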