Acta Electronica Sinica, 2025, Vol. 53, Issue 7: 2389-2400, 12. DOI: 10.12263/DZXB.20250026
Bimodal Action Recognition Based on Spatiotemporal Adaptive Fusion
Abstract
Bimodal action recognition aims to improve recognition performance in complex scenarios by exploiting complementary information across data modalities, overcoming the limitations of single-modal approaches. Existing methods typically use independent backbone networks to extract features from each modality separately and then fuse them. However, they often fail to adequately address semantic discrepancies between modalities, such as cross-modal feature misalignment and representational inconsistency, which can introduce noise during fusion and degrade recognition accuracy. To address these issues, this paper proposes a spatiotemporal adaptive fusion framework for bimodal action recognition. Specifically, a temporal keyframe selection module identifies and emphasizes informative frames through a competitive mechanism. Simultaneously, a spatial salient region selection module adaptively filters discriminative regions across modalities, suppressing irrelevant information and guiding the network to learn more robust spatiotemporal representations. In addition, a self-distillation mechanism reinforces the network's focus on action-relevant features, combining a prediction distribution loss with a region-level distillation loss for fine-grained feature optimization. To further improve fusion quality, an adaptive mask fusion module attenuates the influence of uninformative regions by applying learnable masks within the multi-head self-attention and multi-layer perceptron computations. Experimental results on the InfRA and NTU RGB+D datasets show that the proposed method improves Top-1 accuracy by 3.75% and 3.49%, respectively, over the baseline models, validating the framework's effectiveness in adaptively selecting and integrating bimodal features for action recognition.
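The abstract does not say how the competitive mechanism in the temporal keyframe selection module is implemented. As a rough illustration only, one common way to make frames compete is to pass per-frame scores through a softmax over the time axis, so that raising one frame's weight lowers the others, and then keep the top-weighted frames. In the sketch below, the function names, the scores, and the top-k rule are all assumptions and are not taken from the paper.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def select_keyframes(frame_scores, k):
    """Return (indices, weights) of the k highest-weighted frames.

    frame_scores: one hypothetical informativeness score per frame.
    The softmax couples the frames: weights sum to 1, so they compete.
    """
    if not 0 < k <= len(frame_scores):
        raise ValueError("k must be between 1 and the number of frames")
    weights = softmax(frame_scores)
    # Rank frames by weight, keep the top k, then restore temporal order.
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    keep = sorted(ranked[:k])
    return keep, [weights[i] for i in keep]

# Made-up scores for a 5-frame clip (assumption, for illustration only).
scores = [0.1, 2.0, 0.3, 1.5, -0.5]
indices, weights = select_keyframes(scores, k=2)
print(indices)  # [1, 3]
```

In a trained network the scores would presumably come from a learned scoring head, and a hard top-k step like this one would need a differentiable relaxation. Neither detail can be inferred from the abstract.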
Keywords
bimodal action recognition / key frame / salient region / self-distillation / adaptive fusion
Classification
Information Technology and Security Science
Cite this article
卿宇寒, 高陈强, 谭卓林, 刘芳岑. Bimodal Action Recognition Based on Spatiotemporal Adaptive Fusion [J]. Acta Electronica Sinica, 2025, 53(7): 2389-2400, 12.
Funding
National Natural Science Foundation of China (No. 62176035)
Shenzhen Fundamental Research Program (No. JCYJ20240813151216022)