Electronic Science and Technology, 2025, Vol. 38, Issue 10: 19-26 (8 pages). DOI: 10.16180/j.cnki.issn1007-7820.2025.10.003
Text Image Generation Algorithm Based on Diffusion Model Combined with Improved Variational Auto-Encoder
Abstract
To address the slow sampling speed and high computational cost of current diffusion models, a novel text image generation model, RDM (Rapid Diffusion Model), is proposed, building on current mainstream algorithms. A VAE (Variational Auto-Encoder) learns the latent features of the sample data, and training is carried out in the latent space. This not only reduces the computational complexity of the model but also effectively retains the detailed features of the image and assigns different attention to different image channels. Text features are extracted with the text encoder in CLIP and serve as the text conditions for the algorithm. Qualitative and quantitative experiments on the Rapid Diffusion Model are carried out on FFHQ (Flickr-Faces-High-Quality) and CelebA-HQ. The FID (Fréchet Inception Distance) and IS (Inception Score) evaluation metrics and the generated results show that the proposed algorithm outperforms the ControlGAN (Generative Adversarial Network) and DDPM (Denoising Diffusion Probabilistic Model) text image generation algorithms: the generated images have richer light-and-shadow detail and color matching, while time complexity is reduced.
Keywords
text image generation / diffusion model / variational auto-encoder / CLIP / deep learning / Gaussian noise / image processing / neural network
Classification
Information Technology and Security Science
Cite this article
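Xue Haoyuan (薛皓元), Du Hongbo (杜洪波), Zhu Lijun (朱立军). Text Image Generation Algorithm Based on Diffusion Model Combined with Improved Variational Auto-Encoder [J]. Electronic Science and Technology, 2025, 38(10): 19-26.
Funding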
National Natural Science Foundation of China (11861003)
Basic Scientific Research Project of Higher Education Department of Liaoning (LJKZ0157)