Journal of Nanjing University of Information Science & Technology, 2026, Vol. 18, Issue (2): 192-201, 10. DOI: 10.13878/j.cnki.jnuist.20250424001
Text-to-image generation algorithm based on improved stable diffusion model and noise stitching
Abstract
To address missing features, low output quality, and layout-attribute mismatches in text-to-image generation, this paper proposes ISD-NC, an algorithm based on Improved Stable Diffusion and NoiseCollage. First, a discriminator is introduced to maximize the mutual information between the latent representation and the shallow features, which increases their similarity and preserves information from the original image. Second, in line with the respective roles of the backbone network and the skip connections, scale factors are introduced to dynamically adjust the relative weights of their features and improve the quality of the generated images. Finally, the model is combined with the NoiseCollage network so that layout conditions can be incorporated, which enables image generation from complex multi-object text conditions through a masked cross-attention mechanism. Qualitative and quantitative analyses, along with ablation studies, were conducted on the MS COCO dataset to compare ISD-NC against methods such as CogView, DF-GAN, Stable Diffusion, and KNN-Diffusion. The experimental results show that ISD-NC generates images with better detail fidelity and overall quality. Compared with diffusion-based models such as Stable Diffusion and KNN-Diffusion, ISD-NC reduces the Fréchet Inception Distance (FID) by an average of 28.99% and increases the Inception Score (IS) by an average of 10.21%.
Keywords
diffusion model/text-to-image generation/noise stitching/mutual information/backbone network feature/skip connection feature
Classification
Information Technology and Security Science
Cite this article
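Li Wenyao (李文瑶), Du Hongbo (杜洪波), Zhang Qi (张琪). Text-to-image generation algorithm based on improved stable diffusion model and noise stitching [J]. Journal of Nanjing University of Information Science & Technology, 2026, 18(2): 192-201, 10.
Funding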
Liaoning Provincial Science and Technology Plan Joint Program Project (2025-MSLH-355)