电子科技大学学报 (Journal of University of Electronic Science and Technology of China), 2024, Vol. 53, Issue 6: 911-921 (11 pages). DOI: 10.12178/1001-0548.2024156
语音驱动说话数字人视频生成方法综述
A Review on Audio-Driven Digital Human Generation Methods
Abstract
In recent years, the rapid development of deep learning has greatly advanced virtual digital human technology, especially audio-driven digital human video generation. Research in this field shows broad application prospects in scenarios such as video translation, film production, and virtual assistants. This paper surveys the current methods and state of research in audio-driven digital human video generation, focusing on key technologies, datasets, and evaluation strategies. Among the key technologies, generative adversarial networks, diffusion models, and neural radiance fields have all played an important role. The scale and diversity of datasets are crucial for model training, and better evaluation strategies allow generation quality to be assessed more objectively. Audio-driven digital human video generation will continue to face numerous challenges and opportunities. The field is expected to keep innovating and developing, bringing more convenience and enjoyment to society.
Key words
audio-driven digital human / video generation / generative adversarial network / diffusion model / neural radiance field / multimodal fusion
Classification
Information Technology and Security Science
Cite this article
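刘颖, 李济廷, 柴瑞坤, 位纪伟, 杨阳. A Review on Audio-Driven Digital Human Generation Methods [J]. 电子科技大学学报 (Journal of University of Electronic Science and Technology of China), 2024, 53(6): 911-921 (11 pages).
Funding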
National Natural Science Foundation of China (62306067)