

计算机科学与探索 (Journal of Frontiers of Computer Science and Technology), 2025, 19(10): 2815-2830. DOI: 10.3778/j.issn.1673-9418.2409063

面向扩散大模型的多模态人脸生成方法

Multimodal Face Generation Method for Diffusion Large Models

黄万鑫 1, 任英杰 2, 芦天亮 1, 杨刚 3, 袁梦娇 3, 曾高俊 3

Author Information

  • 1. School of Information Network Security, People's Public Security University of China, Beijing 100038, China; Laboratory of Large Model Research and Application for the Public Security Industry, People's Public Security University of China, Beijing 100038, China
  • 2. Cybersecurity Bureau, Ministry of Public Security, Beijing 100741, China
  • 3. School of Information Network Security, People's Public Security University of China, Beijing 100038, China


Abstract

Face generation is a cutting-edge topic in computer vision, with broad application prospects in areas such as criminal investigation and virtual reality. Recently, diffusion models have exhibited outstanding generative capabilities and can produce images with high semantic consistency under specified conditions; their application to face generation has become a new trend. However, existing methods based on conventional diffusion models handle the details of conditional information inadequately and fail to fully exploit such information for precise face generation. Methods based on large diffusion models typically require significant computational resources to fine-tune the model, or add extra complex networks, without achieving a balanced integration of multimodal conditional information. To address these challenges, this study proposes a multimodal face generation method for diffusion large models, called MA-adapter. By incorporating a compact auxiliary network that extracts visual structural information and integrates semantic guidance, the method harnesses the generative capabilities of diffusion large models while avoiding the substantial computational cost of fine-tuning. The model first enhances image-modality prompts with a multi-head attention module, focusing on key information. It then extracts multi-scale feature information through a multi-scale feature module, providing a basis for precise generation guidance. Finally, an adaptive adjustment mechanism tunes the generation guidance coefficients of the different feature layers to achieve better performance. Experimental results on the MM-CelebA-HQ (multi-modal-CelebA-HQ) dataset show that, compared with the current state-of-the-art method T2I-adapter, the perceptual similarity metric LPIPS of the MA-adapter decreases by approximately 18.4%, the image-text matching metric CLIP-Score increases by about 13.6%, and the feature similarity metric CLIP-I grows by approximately 14.8%. Extensive experimental results fully validate the effectiveness and superiority of the MA-adapter.
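The three-stage pipeline the abstract describes (attention-enhanced image prompts, multi-scale feature extraction, per-layer guidance coefficients) can be sketched in minimal NumPy. Everything below is an illustrative assumption rather than the paper's implementation: the Q/K/V projections are identity maps, the pooling strides are invented, and the guidance coefficients are fixed scalars where the MA-adapter would adapt them.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(tokens, num_heads=4):
    # tokens: (n, d) image-prompt tokens. Identity Q/K/V projections for
    # brevity; a learned module would project each head separately.
    n, d = tokens.shape
    dh = d // num_heads
    heads = []
    for h in range(num_heads):
        q = k = v = tokens[:, h * dh:(h + 1) * dh]
        attn = softmax(q @ k.T / np.sqrt(dh))  # (n, n) attention weights
        heads.append(attn @ v)
    return np.concatenate(heads, axis=1)       # (n, d) enhanced prompt

def multi_scale_features(fmap, strides=(1, 2, 4)):
    # fmap: (h, w, c). Average-pool to several resolutions, one per
    # guided feature layer (strides are illustrative).
    h, w, c = fmap.shape
    out = []
    for s in strides:
        crop = fmap[: h // s * s, : w // s * s, :]
        out.append(crop.reshape(h // s, s, w // s, s, c).mean(axis=(1, 3)))
    return out

def inject_guidance(unet_feats, adapter_feats, coeffs):
    # Add adapter features into the denoiser's feature layers, weighted
    # by per-layer coefficients (fixed here; adaptive in the paper).
    return [u + c * a for u, a, c in zip(unet_feats, adapter_feats, coeffs)]
```

A usage round-trip: attention over 16 prompt tokens of width 8 preserves the (16, 8) shape, and an (8, 8, 3) feature map pools to (8, 8, 3), (4, 4, 3), and (2, 2, 3) maps that are then blended into same-shaped denoiser features.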


Key words

face generation; multimodal; diffusion model; intelligent generation; attention mechanism

Classification

Information Technology and Security Science

Cite This Article

黄万鑫, 任英杰, 芦天亮, 杨刚, 袁梦娇, 曾高俊. 面向扩散大模型的多模态人脸生成方法[J]. 计算机科学与探索, 2025, 19(10): 2815-2830.

Funding

This work was supported by the Double First-Class Innovation Research Project of the People's Public Security University of China (2023SYL07) and the Fundamental Research Funds for the Central Universities of the Ministry of Education of China (2022JKF02022).

计算机科学与探索 (Journal of Frontiers of Computer Science and Technology), ISSN 1673-9418. Open Access; PKU Core Journal (北大核心).