Journal of Henan Polytechnic University (Natural Science), 2025, Vol. 44, Issue (4): 11-20, 10. DOI: 10.16186/j.cnki.1673-9787.2024070012
Few-shot video action recognition method based on continuous frame information fusion modeling
Abstract
Objectives: To overcome the limitations of existing few-shot video action recognition methods in capturing global spatiotemporal information and modeling complex behaviors, a new network architecture was developed to significantly enhance the accuracy and robustness of few-shot learning in video action recognition tasks.
Methods: A network architecture was presented that integrates a continuous frame information fusion module and a multi-dimensional attention modeling module. The continuous frame information fusion module was positioned at the input end of the network and was primarily responsible for capturing low-level information and transforming it into richer high-level semantic information. The multi-dimensional attention modeling module was placed in the middle layers of the network. The entire network was built on a 2D convolutional model, which effectively reduces computational complexity.
Results: Experiments were conducted on four mainstream action recognition datasets. On the Something-Something V2 dataset, the accuracy rates for the 1-shot and 5-shot tasks reached 50.8% and 68.5%, respectively. On the Kinetics-100 dataset, the 1-shot and 5-shot tasks achieved accuracy rates of 68.5% and 83.8%, respectively, a significant improvement over existing methods. On the UCF101 dataset, the method achieved an accuracy rate of 81.3% for the 1-shot task and 93.8% for the 5-shot task, both markedly superior to baseline methods. Additionally, on the HMDB51 dataset, the method demonstrated good generalization performance, with accuracy rates of 56.0% for the 1-shot task and 74.4% for the 5-shot task.
Conclusions: The continuous frame information fusion modeling network showed significant advantages in improving the model's ability to process complex spatiotemporal information. The solutions presented in this study could introduce effective new methods to the field of few-shot action recognition, demonstrating their efficiency and practicality.
Keywords
few-shot learning/video action recognition/spatiotemporal modeling/spatiotemporal representation learning/continuous frame information
Classification
Computers and Automation
Cite this article
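ZHANG Bingbing, LI Haibo, MA Yuanchen, ZHANG Jianxin. Few-shot video action recognition method based on continuous frame information fusion modeling [J]. Journal of Henan Polytechnic University (Natural Science), 2025, 44(4): 11-20, 10.
Funding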
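National Natural Science Foundation of China (61972062)
Jilin Province Science and Technology Development Plan Project (20230201111GX)
Liaoning Province Applied Basic Research Program (2023JH2/101300191, 2023JH2/101300193)
Open Project of the Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education (ADIC2023ZD003)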