
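Fine-grained 2D convolutional network model for action recognition based on spatio-temporal multi-scale correlation feature fusion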

HU Zhengping¹, WANG Xinyu², DONG Jiawei², ZHAO Yanshuang², LIU Yang²

Chinese High Technology Letters, 2024, Vol. 34, Issue 6: 590-601 (12 pages). DOI: 10.3772/j.issn.1002-0470.2024.06.004



Author information

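  • 1. School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004; Hebei Key Laboratory of Information Transmission and Signal Processing, Yanshan University, Qinhuangdao 066004
  • 2. School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004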

Abstract

Traditional two-dimensional (2D) convolutional networks extract spatio-temporal features at only a single scale and make insufficient use of the long-range temporal correlations between frames in fine-grained action datasets. To address these problems, this paper proposes a fine-grained 2D convolutional network model for action recognition based on spatio-temporal multi-scale correlation feature fusion. First, to model the multi-scale spatial correlations of videos and strengthen the spatial representation of fine-grained video data, the model applies a multi-scale "feature squeeze and feature excitation" method, which makes the spatial features extracted by the network richer and more effective. Then, to make full use of the motion information along the time dimension of fine-grained video data, a temporal window self-attention mechanism is introduced: it draws on the strong long-range dependency modeling of the Transformer but performs self-attention only along the time dimension, modeling long-range temporal dependencies at a lower computational cost. Finally, because the extracted spatio-temporal features contribute unequally to the classification of different action types, an adaptive feature fusion module dynamically assigns different weights to the features to fuse them adaptively. The model reaches a Top-1 accuracy of 86.0% on Diving48 and 46.9% on Something-Something V1, two fine-grained action recognition datasets, improving the Top-1 accuracy of the original backbone network by 3.8% and 1.3%, respectively. The experimental results show that, using only video frames as input, the model achieves recognition accuracy comparable to existing algorithms based on Transformers and on three-dimensional convolutional neural networks (3D CNNs).
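The abstract gives no equations, so the pure-Python sketch below only illustrates two of the ideas it names; it is not the authors' implementation. It shows self-attention restricted to the time dimension, where each spatial position of a 2D backbone's feature map attends only over its own T frame features, and adaptive fusion with softmax-normalized branch weights. Assumptions not stated in the source: identity query/key/value projections instead of learned ones, attention over all T frames (the paper's temporal windowing is omitted), and the hypothetical helper names and fusion logits. The multi-scale squeeze-and-excitation branch is not shown.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_self_attention(frames: List[Vector]) -> List[Vector]:
    """Scaled dot-product self-attention over the T frames of ONE spatial position.

    Assumption (not from the paper): queries, keys and values are the raw
    features, i.e. the learned Q/K/V projections are replaced by the identity.
    """
    c = len(frames[0])
    scale = 1.0 / math.sqrt(c)
    out = []
    for q in frames:
        # Score this frame against every frame of the clip: time axis only.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in frames]
        weights = softmax(scores)
        # Output = attention-weighted sum of the value vectors (the frames).
        out.append([sum(w * v[i] for w, v in zip(weights, frames)) for i in range(c)])
    return out


def apply_per_position(video: List[List[List[Vector]]]) -> List[List[List[Vector]]]:
    # video[y][x] holds the T frame vectors at spatial position (y, x);
    # attention runs independently per position, so positions never mix.
    return [[temporal_self_attention(seq) for seq in row] for row in video]


def adaptive_fusion(branches: List[Vector], logits: Vector) -> Vector:
    # Hypothetical fusion: one scalar logit per feature branch, normalized
    # with softmax so the branch weights are positive and sum to 1.
    weights = softmax(logits)
    dim = len(branches[0])
    return [sum(w * b[i] for w, b in zip(weights, branches)) for i in range(dim)]


def attention_pairs(h: int, w: int, t: int, temporal_only: bool) -> int:
    # Query-key scores per layer: joint space-time attention over all h*w*t
    # tokens versus attention restricted to the t frames at each position.
    return h * w * t * t if temporal_only else (h * w * t) ** 2


if __name__ == "__main__":
    print(attention_pairs(14, 14, 8, temporal_only=False))  # 2458624
    print(attention_pairs(14, 14, 8, temporal_only=True))   # 12544
    print(adaptive_fusion([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))  # [0.5, 0.5]
```

Because scores are computed only between frames at the same position, the count per layer falls from (HWT)² to HW·T², a factor of HW. For an assumed 14 × 14 feature map and 8 frames, that is 2,458,624 versus 12,544 scores, which illustrates the kind of saving the abstract describes as a "lower computational cost".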

Key words

fine-grained action recognition/multi-scale spatio-temporal correlation feature/long-range dependency modeling/self-attention mechanism

Cite this article

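HU Zhengping, WANG Xinyu, DONG Jiawei, ZHAO Yanshuang, LIU Yang. Fine-grained 2D convolutional network model for action recognition based on spatio-temporal multi-scale correlation feature fusion[J]. Chinese High Technology Letters, 2024, 34(6): 590-601.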

Funding

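Supported by the General Program of the National Natural Science Foundation of China (61771420) and the National Natural Science Foundation of China (62001413).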

Chinese High Technology Letters

OA | Peking University Core Journal | CSTPCD

ISSN 1002-0470
