
Co-Speech Gesture Generation Based on Multimodal Joint Embedding

文吾琦 杜小勤

软件导刊 2025, Vol. 24, Issue (7): 38-45, 8. DOI: 10.11907/rjdk.241406


Co-Speech Gesture Generation Based on Multimodal Joint Embedding

文吾琦 ¹, 杜小勤 ¹

Author Information

  • 1. School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan 430200, Hubei, China


Abstract

Co-speech gesture generation refers to the technology of generating gestures that match the accompanying audio and text. To address issues in this field, such as the underutilization of information in gesture data, the poor structural organization of the multimodal joint embedding space, and the instability of adversarial training, a gesture generation method based on multimodal joint embedding encoding is proposed. This method uses a multi-level, multi-scale gesture encoder to extract low- and high-level features from gesture data and aligns them with audio and text features. A quadruplet contrastive learning strategy is designed, which performs imprecise alignment while simultaneously constraining the distances between multimodal positive and negative samples. Finally, an asymmetric adversarial training method based on WGAN is proposed, which differentiates the training objectives of the generator and discriminator. Experiments show that this method achieves FGD scores of 0.901 and 1.667 on two gesture datasets respectively; comparative experiments and ablation experiments further prove the effectiveness of the proposed method.
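The quadruplet contrastive strategy mentioned above constrains the distance between an anchor embedding (e.g., a gesture) and its matched positive (e.g., the paired audio/text) relative to two negatives. The paper's exact formulation is not reproduced on this page; the following is a minimal sketch of a generic quadruplet margin loss in the style of quadruplet networks, where the margins `m1` and `m2` are hypothetical hyperparameters, not values from the paper:

```python
import math

def l2(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def quadruplet_loss(anchor, positive, neg1, neg2, m1=1.0, m2=0.5):
    """Generic quadruplet margin loss (sketch, not the paper's exact loss).

    Term 1 pushes the anchor-positive distance below the anchor-negative
    distance by margin m1; term 2 additionally pushes it below the
    distance between the two negatives by margin m2.
    """
    d_ap = l2(anchor, positive)   # anchor vs. matched sample
    d_an = l2(anchor, neg1)       # anchor vs. mismatched sample
    d_nn = l2(neg1, neg2)         # distance between the two negatives
    return max(0.0, d_ap - d_an + m1) + max(0.0, d_ap - d_nn + m2)
```

The loss is zero when the positive pair sits well inside both margins, and grows as negatives drift closer to the anchor than the positive.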
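The FGD (Fréchet Gesture Distance) scores reported above compare the distributions of features extracted from real and generated gestures, analogously to FID for images; lower is better. As an illustration only, here is a simplified Fréchet distance between two Gaussians fit to feature sets, using a diagonal-covariance approximation (the full metric requires the matrix square root of the covariance product, and the features come from a pretrained gesture autoencoder, neither of which is shown here):

```python
import numpy as np

def fgd_diag(feats_real, feats_gen):
    """Fréchet distance between Gaussians fit to two feature sets.

    Diagonal-covariance simplification: per-dimension means and
    variances only. Inputs are (num_samples, feature_dim) arrays.
    """
    mu1, mu2 = feats_real.mean(0), feats_gen.mean(0)
    var1, var2 = feats_real.var(0), feats_gen.var(0)
    # Squared mean gap plus a per-dimension covariance term.
    return float(((mu1 - mu2) ** 2).sum()
                 + (var1 + var2 - 2.0 * np.sqrt(var1 * var2)).sum())
```

Identical feature distributions give a distance of 0; a constant shift of 1 in each of d dimensions gives a distance of d.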


Key words

co-speech gestures/joint embedding encoding/representation learning/contrastive learning/generative adversarial network

Classification

Information Technology and Security Science

Cite this article

文吾琦, 杜小勤. Co-Speech Gesture Generation Based on Multimodal Joint Embedding [J]. 软件导刊, 2025, 24(7): 38-45, 8.

软件导刊 (ISSN 1672-7800)
