Journal of Henan Polytechnic University (Natural Science), 2025, Vol. 44, Issue 4: 21-28, 8. DOI: 10.16186/j.cnki.1673-9787.2025020018
参数高效化微调的双分支视频动作识别方法
Two-branch video action recognition method based on high-efficiency parameter fine-tuning
Abstract
Objectives: AI-based analysis of sports video has important practical value for personalized training and customized sports analysis. Existing video action analysis frameworks rely on the "pre-train, then fine-tune" paradigm to transfer image pre-trained models to temporal modeling of video. However, as model size and pre-training scale continue to grow, two problems arise: full-parameter updating through direct fine-tuning incurs high computational cost, and large-scale image-based architectures alone cannot effectively model the spatiotemporal features of video.
Methods: Therefore, a two-branch video action recognition framework named TBN (two-branch network) is proposed, built on large-scale image pre-trained models. The architecture uses a spatiotemporally decoupled two-branch structure in which static background features and temporal dynamic motion features are processed through separate computational pathways. During transfer, the pre-trained weights remain frozen, and parameter-efficient transfer from the image pre-trained model to video temporal modeling is achieved by training only the small number of parameters added in the two components, Prompt and Adaptor. In addition, to address the limitations of existing benchmark datasets in high-speed motion scenarios, a large-scale sports dataset named Kinetics-Sports was constructed. The dataset comprises 42 sports categories (including basketball, ice skating, and hurdling), providing a more rigorous benchmark for motion analysis.
Results: On the Kinetics-Sports, UCF101, and HMDB51 datasets, the proposed method achieved recognition accuracies of 97.8%, 78.0%, and 74.2%, respectively, outperforming state-of-the-art approaches on the corresponding datasets. Furthermore, the framework is implemented with only 12 M parameters and has lower computational complexity than prevailing mainstream algorithms.
Conclusions: The proposed model achieves a more favorable balance between accuracy and efficiency, improving both the accuracy of sports action detection and the computational efficiency of inference. It thereby provides an efficient solution for video transfer learning with current large-scale vision models.
Keywords
video action recognition / pre-trained model / parameter-efficient fine-tuning / two-branch network / spatiotemporal modeling
Classification
Computers and Automation
Cite this article
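WANG Xiaowei, SHEN Yanfei, XING Qingjun. Two-branch video action recognition method based on high-efficiency parameter fine-tuning[J]. Journal of Henan Polytechnic University (Natural Science), 2025, 44(4): 21-28, 8.
Funding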
National Natural Science Foundation of China (72071018)
Henan Province Science and Technology Research Program (212102310264)