

计算机科学与探索 (Journal of Frontiers of Computer Science and Technology), 2025, 19(10): 2815-2830. DOI: 10.3778/j.issn.1673-9418.2409063

面向扩散大模型的多模态人脸生成方法

Multimodal Face Generation Method for Diffusion Large Models

黄万鑫 1, 任英杰 2, 芦天亮 1, 杨刚 3, 袁梦娇 3, 曾高俊 3

Author Information

  • 1. School of Information Network Security, People's Public Security University of China, Beijing 100038, China; Laboratory of Large Model Research and Application for the Public Security Industry, People's Public Security University of China, Beijing 100038, China
  • 2. Cybersecurity Bureau, Ministry of Public Security, Beijing 100741, China
  • 3. School of Information Network Security, People's Public Security University of China, Beijing 100038, China


Abstract

Face generation is a cutting-edge topic in computer vision, with broad application prospects in areas such as criminal investigation and virtual reality. Recently, diffusion models have exhibited outstanding generative capabilities and can produce images with high semantic consistency under specified conditions; their application to face generation has become a new trend. However, existing methods based on conventional diffusion models handle the details of conditional information inadequately and fail to fully exploit such information for precise face generation. Methods based on large diffusion models typically require significant computational resources to fine-tune the model, or add extra complex networks, without achieving a balanced integration of multimodal conditional information. To address these challenges, this study proposes a multimodal face generation method for diffusion large models, called MA-adapter. By incorporating a compact auxiliary network that extracts visual structural information and integrates semantic guidance, the method harnesses the generative capabilities of diffusion large models while avoiding the substantial computational cost of fine-tuning. The model first enhances image-modality prompts with a multi-head attention module, focusing on key information. It then extracts multi-scale feature information through a multi-scale feature module, providing a basis for precise generation guidance. Finally, an adaptive adjustment mechanism tunes the generation guidance coefficients of the different feature layers to achieve better performance. Experimental results on the MM-CelebA-HQ (multi-modal-CelebA-HQ) dataset show that, compared with the current state-of-the-art method T2I-adapter, the perceptual similarity metric LPIPS of the MA-adapter decreases by approximately 18.4%, the image-text matching metric CLIP-Score increases by about 13.6%, and the feature similarity metric CLIP-I grows by approximately 14.8%. Extensive experimental results fully validate the effectiveness and superiority of the MA-adapter.
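The three-stage pipeline the abstract describes (attention-enhanced image prompts, multi-scale feature extraction, per-layer guidance coefficients) can be sketched in minimal NumPy. Everything below is an illustrative assumption rather than the paper's implementation: the Q/K/V projections are identity maps, the pooling strides are invented, and the guidance coefficients are fixed scalars where the MA-adapter would adapt them.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(tokens, num_heads=4):
    # tokens: (n, d) image-prompt tokens. Identity Q/K/V projections for
    # brevity; a learned module would project each head separately.
    n, d = tokens.shape
    dh = d // num_heads
    heads = []
    for h in range(num_heads):
        q = k = v = tokens[:, h * dh:(h + 1) * dh]
        attn = softmax(q @ k.T / np.sqrt(dh))  # (n, n) attention weights
        heads.append(attn @ v)
    return np.concatenate(heads, axis=1)       # (n, d) enhanced prompt

def multi_scale_features(fmap, strides=(1, 2, 4)):
    # fmap: (h, w, c). Average-pool to several resolutions, one per
    # guided feature layer (strides are illustrative).
    h, w, c = fmap.shape
    out = []
    for s in strides:
        crop = fmap[: h // s * s, : w // s * s, :]
        out.append(crop.reshape(h // s, s, w // s, s, c).mean(axis=(1, 3)))
    return out

def inject_guidance(unet_feats, adapter_feats, coeffs):
    # Add adapter features into the denoiser's feature layers, weighted
    # by per-layer coefficients (fixed here; adaptive in the paper).
    return [u + c * a for u, a, c in zip(unet_feats, adapter_feats, coeffs)]
```

A usage round-trip: attention over 16 prompt tokens of width 8 preserves the (16, 8) shape, and an (8, 8, 3) feature map pools to (8, 8, 3), (4, 4, 3), and (2, 2, 3) maps that are then blended into same-shaped denoiser features.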


Key words

face generation; multimodal; diffusion model; intelligent generation; attention mechanism

Classification

Information Technology and Security Science

Cite This Article

黄万鑫, 任英杰, 芦天亮, 杨刚, 袁梦娇, 曾高俊. 面向扩散大模型的多模态人脸生成方法[J]. 计算机科学与探索, 2025, 19(10): 2815-2830.

Funding

This work was supported by the Double First-Class Innovation Research Project of the People's Public Security University of China (2023SYL07) and the Fundamental Research Funds for the Central Universities of the Ministry of Education of China (2022JKF02022).

计算机科学与探索 (Journal of Frontiers of Computer Science and Technology), ISSN 1673-9418. Open Access; PKU Core Journal (北大核心).