
Gaze Estimation Model Based on Hybrid Transformer

Abstract

This paper combines a CNN with a Transformer, exploiting the Transformer's ability to capture global feature information to strengthen the model's awareness of contextual information and thereby improve its accuracy. A novel gaze estimation model based on a hybrid Transformer, ResNet-MHSA (RN-SA), is proposed: some of the 3×3 spatial convolution layers in ResNet18 are replaced with blocks composed of a 1×1 spatial convolution layer and an MHSA (Multi-Head Self-Attention) layer, and a DropBlock mechanism is added to the model structure to increase robustness. Experimental results show that RN-SA improves accuracy while reducing the number of parameters: compared with GazeTR-Hybrid, a strong existing model, RN-SA uses 15.8% fewer parameters and improves accuracy by 4.1% and 3.7% on the EyeDiap and Gaze360 datasets, respectively. The combination of CNN and Transformer can therefore be applied effectively to gaze estimation tasks.
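The architectural change described above can be sketched in a few lines of PyTorch. The following is a minimal illustration under stated assumptions, not the authors' released code: the names (MHSABlock, DropBlock2d), the head count, the channel sizes, and the DropBlock hyperparameters are all assumptions made for the example.

```python
# Minimal sketch of the block described in the abstract: a 3x3 convolution
# in a late ResNet18 stage is replaced by a 1x1 convolution followed by
# multi-head self-attention (MHSA) over spatial positions, with a simple
# DropBlock-style regularizer. Hyperparameters (4 heads, drop_prob=0.1,
# block_size=3) are illustrative assumptions, not the paper's values.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DropBlock2d(nn.Module):
    """Drop contiguous feature-map regions instead of independent units."""

    def __init__(self, drop_prob: float = 0.1, block_size: int = 3):
        super().__init__()
        self.drop_prob = drop_prob
        self.block_size = block_size

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if not self.training or self.drop_prob == 0.0:
            return x
        # Sample seed points, then expand each seed to a
        # block_size x block_size dropped region via max-pooling.
        gamma = self.drop_prob / (self.block_size ** 2)
        seeds = (torch.rand_like(x) < gamma).float()
        mask = 1.0 - F.max_pool2d(
            seeds, self.block_size, stride=1, padding=self.block_size // 2
        )
        # Rescale so the expected activation magnitude is unchanged.
        return x * mask * (mask.numel() / mask.sum().clamp(min=1.0))


class MHSABlock(nn.Module):
    """1x1 conv + MHSA standing in for a 3x3 spatial convolution."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.proj = nn.Conv2d(channels, channels, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.dropblock = DropBlock2d()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.bn(self.proj(x))            # cheap channel mixing (1x1 conv)
        b, c, h, w = y.shape
        seq = y.flatten(2).transpose(1, 2)   # (B, H*W, C): tokens = positions
        out, _ = self.attn(seq, seq, seq)    # global context for every position
        out = out.transpose(1, 2).reshape(b, c, h, w)
        return self.dropblock(out + x)       # residual connection + DropBlock


# Usage on a late-stage ResNet18-sized feature map (256 channels at 14x14).
block = MHSABlock(channels=256)
feat = torch.randn(2, 256, 14, 14)
print(block(feat).shape)  # torch.Size([2, 256, 14, 14])
```

This sketch only shows the substituted block; the full model would further involve wiring such blocks into ResNet18's later stages and adding a gaze-regression head.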

程章;刘丹;王艳霞

College of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China

Computer and Automation

gaze estimation; self-attention; MHSA; Transformer

Computer and Modernization (《计算机与现代化》), 2025, No. 4

Pages: 1-5, 11 (6 pages)

Supported by the Science Research Project of Chongqing Science and Technology Commission (cstc2021jcyj-msxm2791) and the Science and Technology Project of Chongqing Municipal Education Commission (KJZD-K202200513)

DOI: 10.3969/j.issn.1006-2475.2025.04.001
