计算机科学与探索, 2026, Vol. 20, Issue 3: 611-624. DOI: 10.3778/j.issn.1673-9418.2505081
语音驱动手势动作生成前沿进展
Recent Advances in Speech-Driven Gesture Generation
Abstract
In interpersonal communication, gestures enrich verbal information and facilitate information delivery. Speech-driven gesture generation aims to automatically synthesize natural, realistic, and contextually appropriate gesture sequences conditioned on speech input. This research direction has attracted widespread attention in fields such as computer graphics and computer vision, and holds significant application value in domains including film and animation production, human-computer interaction, and virtual reality. Early rule-based methods suffer from inefficiency, while regression-based methods, despite improving generation efficiency, often produce gestures with repetitive motion patterns and limited expressiveness. In recent years, generative models have further advanced this field, effectively enhancing the quality and diversity of generated gestures. Focusing on speech-driven gesture generation methods based on generative models, this work summarizes and categorizes relevant research on generative adversarial networks, variational autoencoders, and diffusion models, analyzing their respective applications, advantages, and disadvantages in gesture generation. It further explores the controllability of speech-driven gesture generation with respect to emotion expression, semantic consistency, and style transfer. Moreover, research on the collaborative generation of facial expressions and gestures is discussed. Commonly used datasets and evaluation metrics are then introduced, followed by an experimental comparative analysis of representative methods. Finally, this paper concludes by summarizing the challenges in the field of speech-driven gesture generation and outlining future research trends.
Key words: gesture generation / speech-driven / generative models / style control
Classification: Information Technology and Security Science
Citation: 张亚宇, 温玉辉, 张欣雨, 景丽萍. 语音驱动手势动作生成前沿进展[J]. 计算机科学与探索, 2026, 20(3): 611-624.
Funding: This work was supported by the Science and Technology Plan Project of Beijing (Z231100005923029).