Journal of Test and Measurement Technology, 2024, Vol. 38, Issue (2): 154-160, 7. DOI: 10.3969/j.issn.1671-7449.2024.02.008
Research on Face Image Generation Method Based on CLIP Model and Text Reconstruction
Abstract
To address the problems of inconsistency between generated images and text descriptions and of low image resolution in text-to-face generation methods, this paper proposes a cross-modal text-to-face image generation network framework. First, the CLIP pre-trained model is used to extract features from the text, and the text semantic features are enhanced by a conditional augmentation module to produce latent vectors. The latent vectors are then projected by a mapping network into the latent space of the pre-trained StyleGAN model to obtain disentangled latent vectors, which are fed into the StyleGAN generator to synthesize high-resolution face images. Finally, a text reconstruction module regenerates text from the generated face images, and a semantic alignment loss between the reconstructed text and the input text is computed and used as semantic supervision to guide network training. Training and testing are performed on two datasets, Multi-Modal CelebA-HQ and CelebAText-HQ, and the experimental results show that, compared with other methods, the proposed method generates high-resolution face images that are more consistent with the text description.
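To make the pipeline summarized in the abstract concrete, below is a minimal PyTorch-style sketch of the described flow (text features → conditional augmentation → mapping network → StyleGAN latent code → generator → text reconstruction → semantic alignment loss). The CLIP text encoder, the StyleGAN generator, and the captioning module are replaced with stubs, and all module names, dimensions (e.g., a 512-dimensional text embedding and an 18 × 512 W+ code), and the cosine-based form of the alignment loss are illustrative assumptions, not the authors' actual implementation.

```python
# Minimal sketch of the abstract's pipeline. The CLIP text encoder, StyleGAN
# generator, and image-captioning module are treated as injected black boxes
# (stubs here); names, dimensions, and loss form are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConditionalAugmentation(nn.Module):
    """Re-samples the text embedding from a learned Gaussian (StackGAN-style CA)."""
    def __init__(self, text_dim=512, cond_dim=128):
        super().__init__()
        self.fc = nn.Linear(text_dim, cond_dim * 2)

    def forward(self, text_emb):
        mu, logvar = self.fc(text_emb).chunk(2, dim=-1)
        eps = torch.randn_like(mu)
        c = mu + eps * torch.exp(0.5 * logvar)            # reparameterization trick
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return c, kl


class MappingNetwork(nn.Module):
    """Projects the augmented text code into StyleGAN's W+ latent space."""
    def __init__(self, cond_dim=128, w_dim=512, num_ws=18):
        super().__init__()
        self.num_ws, self.w_dim = num_ws, w_dim
        self.mlp = nn.Sequential(
            nn.Linear(cond_dim, w_dim), nn.LeakyReLU(0.2),
            nn.Linear(w_dim, w_dim * num_ws),
        )

    def forward(self, c):
        return self.mlp(c).view(-1, self.num_ws, self.w_dim)   # (B, num_ws, w_dim)


def training_step(text_emb, generator, captioner, ca, mapper):
    """One forward pass: text -> CA -> W+ -> image -> reconstructed text -> loss."""
    c, kl = ca(text_emb)                                   # conditional augmentation
    w_plus = mapper(c)                                     # map into W+ space
    fake_img = generator(w_plus)                           # StyleGAN synthesis (stub)
    rec_emb = captioner(fake_img)                          # text reconstruction (stub)
    # Semantic alignment: pull the reconstructed-text embedding toward the input text;
    # the KL term regularizes the conditional augmentation (weighting omitted here).
    align = 1.0 - F.cosine_similarity(rec_emb, text_emb, dim=-1).mean()
    return fake_img, align + kl


if __name__ == "__main__":
    B, text_dim = 4, 512
    ca, mapper = ConditionalAugmentation(text_dim), MappingNetwork()
    # Stand-ins for the pre-trained StyleGAN generator and the text-reconstruction
    # (captioning) module; in practice these would be real pre-trained networks.
    generator = nn.Sequential(nn.Flatten(), nn.Linear(18 * 512, 3 * 64 * 64),
                              nn.Unflatten(1, (3, 64, 64)))
    captioner = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, text_dim))
    text_emb = torch.randn(B, text_dim)                    # e.g. output of CLIP encode_text
    img, loss = training_step(text_emb, generator, captioner, ca, mapper)
    print(img.shape, loss.item())
```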
Keywords
text-generated face / cross-modality / CLIP pre-training / text reconstruction / text mapping

Classification
Information Technology and Security Science

Cite this article
LI Yuanfan, ZHANG Lihong (李源凡, 张丽红). Research on Face Image Generation Method Based on CLIP Model and Text Reconstruction[J]. Journal of Test and Measurement Technology, 2024, 38(2): 154-160, 7.

Funding
Teaching Reform and Innovation Project of Higher Education Institutions in Shanxi Province (J2021086)
Shanxi Provincial Graduate Innovation Project (2021Y154)