Journal of Zhejiang University (Science Edition), 2026, Vol. 53, Issue 2: 148-160. DOI: 10.3785/1008-9497.25123
A bidirectional motion-text retrieval framework based on cross-modal attention and differentiable hashing
Abstract
The rapid growth of the 3D animation, film and television production, and gaming industries has accumulated massive amounts of high-precision 3D human motion data, making efficient management and intelligent retrieval a pressing challenge. To address two major bottlenecks in current cross-modal retrieval research, namely insufficient modeling of the association between human motion and textual semantics and high computational cost, this paper proposes a differentiable hashing cross-modal retrieval framework based on attention fusion. It constructs a dual-channel Transformer architecture for feature extraction from motion-capture data and natural language, captures fine-grained spatiotemporal correlations between motion sequences and text descriptions with a learnable cross-modal attention mechanism, and designs an end-to-end hash-encoding optimization strategy that compresses high-dimensional features into compact binary codes. Experiments show that the method significantly improves both the accuracy and the efficiency of bidirectional motion-text retrieval on commonly used datasets: compared with the baseline model, the sum of recall rates increases by a factor of 2.6, providing an efficient solution for motion-data reuse in fields such as digital entertainment.
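The pipeline the abstract describes — encode each modality, fuse with cross-modal attention, then binarize through a differentiable hash layer and retrieve by Hamming distance — can be sketched minimally as follows. This is an illustrative sketch only: the random features stand in for the paper's dual-channel Transformer outputs, the tanh relaxation is one common way to make hashing differentiable (the paper's exact hash-optimization strategy is not reproduced), and all names and dimensions are assumptions.

```python
import numpy as np

def cross_attention(query, key, value):
    """Single-head scaled dot-product attention: query tokens attend to key/value tokens."""
    d = query.shape[-1]
    scores = query @ key.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability for softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ value

def sign_hash(features):
    """Binarize real-valued features into a +/-1 code.

    During training, tanh serves as a smooth surrogate for sign so gradients
    can flow; at retrieval time the hard sign is taken, as below.
    """
    relaxed = np.tanh(features)
    return np.where(relaxed >= 0, 1, -1)

def hamming_rank(query_code, gallery_codes):
    """Rank gallery items by Hamming distance to the query code (closest first)."""
    dists = (query_code != gallery_codes).sum(axis=1)
    return np.argsort(dists)

rng = np.random.default_rng(0)
# Stand-ins for encoder outputs: 4 motion tokens and 6 text tokens, dim 8.
motion_feat = rng.normal(size=(4, 8))
text_feat = rng.normal(size=(6, 8))

# Text attends to motion (one direction of the bidirectional fusion).
fused_text = cross_attention(text_feat, motion_feat, motion_feat)

# Pool token features and hash each modality into an 8-bit code.
text_code = sign_hash(fused_text.mean(axis=0))
motion_code = sign_hash(motion_feat.mean(axis=0))

# Retrieval: rank a small motion gallery against the hashed text query.
gallery = sign_hash(rng.normal(size=(5, 8)))
ranking = hamming_rank(text_code, gallery)
```

The binary codes are what make large-scale retrieval cheap: Hamming distance over packed bits is a few machine instructions per item, versus a full floating-point similarity per item in the uncompressed feature space.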
Key words: attention / hash code / human motion data / cross-modal retrieval
Category: Information Technology and Security Science
Citation: 于聪睿, 张璐, 范波, 吕娜. A bidirectional motion-text retrieval framework based on cross-modal attention and differentiable hashing [J]. Journal of Zhejiang University (Science Edition), 2026, 53(2): 148-160.
Funding
National Natural Science Foundation of China (61802144)
Natural Science Foundation of Shandong Province (ZR2022MF352, ZR2022MF294)
Shandong Province Small and Medium-sized Enterprise Improvement Program (2022TSGC2160)