计算机科学与探索, 2026, Vol. 20, Issue 3: 611-624. DOI: 10.3778/j.issn.1673-9418.2505081
语音驱动手势动作生成前沿进展
Recent Advances in Speech-Driven Gesture Generation
Abstract
In interpersonal communication, gestures enrich verbal information and facilitate information delivery. Speech-driven gesture generation aims to automatically synthesize natural, realistic, and contextually appropriate gesture sequences conditioned on speech input. This research direction has attracted widespread attention in fields such as computer graphics and computer vision, and holds significant application value in domains including film and animation production, human-computer interaction, and virtual reality. Early rule-based methods suffer from inefficiency, while regression-based methods, despite improving generation efficiency, often produce gestures with repetitive motion patterns and limited expressiveness. In recent years, generative models have further advanced this field, effectively enhancing the quality and diversity of generated gestures. Focusing on speech-driven gesture generation methods based on generative models, this work summarizes and categorizes relevant research on generative adversarial networks, variational autoencoders, and diffusion models, analyzing their respective applications, advantages, and disadvantages in gesture generation. It further explores the controllability of speech-driven gesture generation with respect to emotion expression, semantic consistency, and style transfer. Moreover, research on the collaborative generation of facial expressions and gestures is discussed. Commonly used datasets and evaluation metrics are then introduced, followed by an experimental comparative analysis of representative methods. Finally, this paper concludes by summarizing the challenges in the field of speech-driven gesture generation and outlining future research trends.
Key words: gesture generation / speech-driven / generative models / style control
Classification: Information Technology and Security Science
Citation: 张亚宇, 温玉辉, 张欣雨, 景丽萍. 语音驱动手势动作生成前沿进展[J]. 计算机科学与探索, 2026, 20(3): 611-624.
Funding: This work was supported by the Science and Technology Plan Project of Beijing (Z231100005923029).