信号处理, 2025, Vol. 41, Issue 2: 382-398, 17. DOI: 10.12466/xhcl.2025.02.016
Emotion-Controlled Personalized and Complete 3D Avatar Expression Animation Generation
Abstract
Speech-driven emotional expression animation of 3D avatars aims to generate 3D facial animations that not only feature lip movements synchronized with the input voice data but also convey a range of emotional expressions. However, owing to the limitations of 3D face priors, existing methods struggle to synthesize 3D facial animations with internal oral structures, resulting in a lack of realism in the final result. In addition, despite advancements in this field, the majority of existing research predominantly focuses on the synchronization of lip movements and spoken words in 3D avatars, and insufficient attention is given to the significant role that emotional fluctuations play in shaping facial expressions. This limitation makes the generated expression animation insufficiently natural and restricts the realism of the 3D facial animation, which degrades the user experience. To solve these problems, this paper proposes an emotion-controlled personalized and complete 3D avatar expression animation generation method that produces facial animation containing a detailed representation of the inner oral structure together with a wide array of emotional expressions, thereby improving the realism of 3D facial animations. The method consists of three core modules: neutral expression animation with complete oral structure generation, expression retrieval, and expression fusion. The neutral expression animation with complete oral structure generation module first outputs a neutral expression animation sequence; it achieves cross-modal mapping from speech to a 3D facial animation sequence using a Transformer-based autoregressive model and introduces a text-driven consistency loss through cross-supervised training to ensure synchronization between the input speech and the lip region. Within this module, this paper proposes an oral structure 3D model deformation algorithm based on facial landmarks. This algorithm enables dynamic deformation of the oral structure model, which is then seamlessly fused with the corresponding neutral expression animation sequence. The result is a neutral expression animation sequence that includes a detailed and accurate representation of the oral structure. The expression retrieval module obtains a 3D face model with emotional expression by recognizing and retrieving the emotion according to the input speech sequence and facial image. The expression fusion module merges the neutral expression animation, which includes the oral structure, with emotionally expressive 3D face models through a deep neural network. The 3D facial expression animation generated by the fusion module not only maintains the synchronization of lip movements with the speech but also conveys a range of emotions. In addition, this paper proposes an expression transition algorithm based on linear interpolation to achieve a smooth transition between different emotions in the 3D facial animation. Experimental results demonstrate the effectiveness of the proposed method: the 3D facial animation generated with this method, containing both the oral structure and emotional representation, maintains lip movements synchronized with the speech and effectively improves the realism of 3D avatars.
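The abstract mentions an expression transition algorithm based on linear interpolation but gives no implementation details. The following is a minimal sketch of how such a cross-fade between two emotional expression animations could be realized; the function name, the per-frame vertex representation, the array shapes, and the placement of the transition window are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def blend_expression_transition(anim_a, anim_b, transition_frames):
    """Linearly interpolate between two expression animations.

    anim_a, anim_b: arrays of shape (T, V, 3) holding per-frame 3D vertex
        positions for the same face topology, one per emotion.
    transition_frames: number of frames over which emotion A fades into B.
    Returns an array of shape (T, V, 3) that plays emotion A, cross-fades
    over the transition window, then plays emotion B.
    """
    assert anim_a.shape == anim_b.shape
    T = anim_a.shape[0]
    out = anim_a.copy()
    start = (T - transition_frames) // 2            # centre the transition window
    for i in range(transition_frames):
        t = i / max(transition_frames - 1, 1)       # interpolation weight in [0, 1]
        out[start + i] = (1.0 - t) * anim_a[start + i] + t * anim_b[start + i]
    out[start + transition_frames:] = anim_b[start + transition_frames:]
    return out

# Example: fade a 120-frame neutral clip into a happy clip of the same
# length over 30 frames (5000 vertices, xyz per vertex).
neutral = np.zeros((120, 5000, 3))
happy = np.ones((120, 5000, 3))
blended = blend_expression_transition(neutral, happy, transition_frames=30)
```

In practice the same interpolation could be applied to blendshape weights instead of raw vertex positions; only the array shapes would change.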
Keywords
speech driven / emotion driven / 3D avatar / expression animation
Classification
Information Technology and Security Science
Cite this article
李俊沂, 庞德龙, 蔡明旭, 周圣喻, 余旻婧. Emotion-Controlled Personalized and Complete 3D Avatar Expression Animation Generation[J]. 信号处理, 2025, 41(2): 382-398, 17.
Funding
National Natural Science Foundation of China (62002258)
Beijing Natural Science Foundation (L222113)