Computer Engineering and Applications, 2024, Vol. 60, Issue (20): 293-301, 9. DOI: 10.3778/j.issn.1002-8331.2306-0206
Learning Gaze Transition for Gaze Target Detection in Video
Abstract
Gaze target detection in video aims to localize the gaze target in each video frame. A person gazes at different targets at different times, and during the transition from one gaze target to another, the person may not be gazing at any specific target. Gaze target detection methods based on an image transformer do not consider this temporal transition segment, and the gaze direction within it may hinder gaze target detection in video. For gaze target detection in video, this paper proposes a gaze transition-based model, which contains a gaze direction guidance module and a gaze transition temporal fusion module. In the gaze direction guidance module, the position of the gaze target is used to learn a heatmap of the gaze direction. The gaze target is then detected under the guidance of this heatmap, which suppresses targets outside the gaze direction and yields an accurate position for the gaze target. In the gaze transition temporal fusion module, the heatmaps of multiple frames form a spatial-temporal heatmap. To learn the changes in the spatial-temporal heatmap, this paper uses a bi-directional spatial-temporal convolutional long short-term memory (LSTM) network, which extracts a memory-based spatial-temporal heatmap. The gaze transition is described with a Gaussian-based temporal model. To localize gaze transition segments of uncertain temporal length, this paper designs a Gaussian-based temporal fusion method, which estimates the start timestamp, the end timestamp, and the temporal length of the gaze transition. Once the gaze transition segment is localized, its effect on gaze target detection can be removed. The gaze transition-based model is trained with a gaze direction-based loss, a gaze target existence loss, a gaze target heatmap loss, and a gaze transition temporal localization loss. Experimental results on the GazeFollow and VideoAttentionTarget datasets show that the gaze transition-based model outperforms the image transformer-based model for gaze target detection in video.
Keywords
gaze target detection/gaze transition/gaze target heatmap/spatial-temporal convolution long short-term memory/Gaussian-based temporal fusion
Classification
Information Technology and Security Science
Cite this article
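Yang Xingming, Shi Junbiao, Li Ziqiang, Wu Kewei, Xie Zhao. Learning Gaze Transition for Gaze Target Detection in Video [J]. Computer Engineering and Applications, 2024, 60(20): 293-301, 9.
Funding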
National Key Research and Development Program of China (2017YFB1002203)
Natural Science Foundation of Anhui Province (JZ2021AKZR0351)