基于时空自适应融合的双模行为识别

卿宇寒 高陈强 谭卓林 刘芳岑

电子学报 (Acta Electronica Sinica), 2025, Vol. 53, Issue 7: 2389-2400 (12 pages). DOI: 10.12263/DZXB.20250026

Bimodal Action Recognition Based on Spatiotemporal Adaptive Fusion

卿宇寒¹, 高陈强², 谭卓林¹, 刘芳岑¹

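Author information

  • 1. School of Communications and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  • 2. School of Intelligent Systems Engineering, Sun Yat-sen University (Shenzhen Campus), Shenzhen, Guangdong 518107, China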

Abstract

Bimodal action recognition aims to enhance recognition performance in complex scenarios by leveraging complementary information across different data modalities to overcome the limitations of single-modal approaches. Existing methods typically adopt independent backbone networks to extract features from each modality separately before performing feature fusion. However, they often fail to adequately address semantic discrepancies between modalities, such as cross-modal feature misalignment and representational inconsistency, which can introduce noise during the fusion process and degrade recognition accuracy. To address these issues, this paper proposes a spatiotemporal adaptive fusion framework for bimodal action recognition. Specifically, a temporal keyframe selection module is introduced to identify and emphasize informative frames through a competitive mechanism. Simultaneously, a spatial salient region selection module adaptively filters discriminative regions across modalities, suppressing irrelevant information and guiding the network to learn more robust spatiotemporal representations. In addition, a self-distillation mechanism is employed to reinforce the network's focus on action-relevant features, incorporating both prediction distribution loss and region-level distillation loss to facilitate fine-grained feature optimization. To further improve the fusion quality, an adaptive mask fusion module is proposed, which attenuates the influence of uninformative regions by applying learnable masks within the multi-head self-attention and multi-layer perceptron computations. Experimental results on the InfRA and NTU RGB+D datasets demonstrate that the proposed method achieves Top-1 accuracy improvements of 3.75% and 3.49%, respectively, compared to baseline models, validating the effectiveness of the proposed framework in adaptively selecting and integrating bimodal features for improved action recognition.
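
The abstract does not spell out how the temporal keyframe selection module works, so the following is only an illustrative sketch of one way a "competitive" selection over frames could look. It assumes a softmax over per-frame scores (frames compete for a fixed budget of weight) followed by top-k selection; the function names `select_keyframes` and `emphasize`, the scores, and the value of `k` are all hypothetical and are not taken from the paper.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_keyframes(frame_scores: List[float], k: int) -> Tuple[List[int], List[float]]:
    """Assumed mechanism, not the paper's: softmax competition plus top-k.

    Returns the indices of the k highest-weighted frames (in temporal order)
    together with the softmax weights of all frames.
    """
    if not 1 <= k <= len(frame_scores):
        raise ValueError("k must be between 1 and the number of frames")
    weights = softmax(frame_scores)
    # Rank frames by weight, keep the top k, then restore temporal order.
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    keep = sorted(ranked[:k])
    return keep, weights


def emphasize(features: List[List[float]], keep: List[int], weights: List[float]) -> List[List[float]]:
    """Scale each kept frame's feature vector by its weight and drop the rest."""
    return [[weights[i] * x for x in features[i]] for i in keep]


if __name__ == "__main__":
    scores = [0.1, 2.0, 0.3, 1.5]            # hypothetical per-frame scores
    feats = [[1.0, 1.0], [2.0, 0.0], [0.0, 3.0], [4.0, 4.0]]
    keep, w = select_keyframes(scores, k=2)
    print(keep)                              # indices of the selected frames
    print(emphasize(feats, keep, w))
```

In a trained network the scores would come from a learned scoring head and the hard top-k step would need a differentiable relaxation; this sketch only shows the selection logic on plain lists.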

Key words

bimodal action recognition; key frame; salient region; self-distillation; adaptive fusion

Classification

Information Technology and Security Science

Cite this article

卿宇寒, 高陈强, 谭卓林, 刘芳岑. 基于时空自适应融合的双模行为识别 [J]. 电子学报, 2025, 53(7): 2389-2400.

Funding

National Natural Science Foundation of China (No.62176035)

Shenzhen Fundamental Research Program (No.JCYJ20240813151216022)

Journal: 电子学报 (Acta Electronica Sinica)

Open access; Peking University Core Journal

ISSN 0372-2112
