Journal of University of Electronic Science and Technology of China, 2024, Vol. 53, Issue 6: 911-921 (11 pages). DOI: 10.12178/1001-0548.2024156

A Review on Audio-Driven Digital Human Generation Methods

Liu Ying (刘颖)¹, Li Jiting (李济廷)¹, Chai Ruikun (柴瑞坤)², Wei Jiwei (位纪伟)², Yang Yang (杨阳)²

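Author information

1. Military Political Work Research Institute, Academy of Military Sciences, Beijing 100166
2. School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731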

Abstract

In recent years, the rapid development of deep learning has greatly advanced virtual digital human technology, especially audio-driven digital human video generation. Research in this field shows broad application prospects in scenarios such as video translation, film production, and virtual assistants. This paper surveys and summarizes current methods and the state of research in audio-driven digital human video generation, focusing on key technologies, datasets, and evaluation strategies. Among the key technologies, generative adversarial networks, diffusion models, and neural radiance fields have all played an important role. The scale and diversity of datasets are crucial for model training, and improved evaluation strategies allow generation quality to be assessed more objectively. Audio-driven digital human video generation will continue to face numerous challenges and opportunities, and the field is expected to keep innovating and to bring more convenience and enjoyment to society.

Keywords

audio-driven digital human/video generation/generative adversarial network/diffusion model/neural radiance field/multimodal fusion

Classification

Information Technology and Security Science

Cite this article

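Liu Ying, Li Jiting, Chai Ruikun, Wei Jiwei, Yang Yang. A Review on Audio-Driven Digital Human Generation Methods [J]. Journal of University of Electronic Science and Technology of China, 2024, 53(6): 911-921.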

Funding

National Natural Science Foundation of China (62306067)

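Journal of University of Electronic Science and Technology of China

Open access; indexed in the Peking University Core Journals list and CSTPCD

ISSN: 1001-0548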
