Computer Engineering & Science, 2023, Vol. 45, Issue 12: 2186-2196, 11. DOI: 10.3969/j.issn.1007-130X.2023.12.010
Image adversarial cascade generation via coupling word and sentence-level text features
Abstract
Text-to-image generation aims to generate realistic images from natural language descriptions and is a cross-modal task involving both text and images. Because generative adversarial networks (GANs) produce realistic images efficiently, they have become the mainstream model for text-to-image generation. However, current methods often treat word-level and sentence-level text features separately during training, so the text information is not fully exploited, which can easily lead to generated images that do not match the text. To address this problem, this paper proposes Union-GAN, an adversarial cascaded image generation model that couples word-level and sentence-level text features, and introduces a text-image joint perception module (Union-Block) at each image generation stage. By combining channel-wise affine transformation with cross-modal attention, the model makes full use of both the word-level semantics and the overall semantics of the text, generating images that match the textual description while maintaining clear structures. In addition, the discriminator is jointly optimized, and spatial attention is added to the corresponding discriminator so that the supervisory signal from the text pushes the generator toward more relevant images. In comparisons with several representative networks such as AttnGAN on the CUB-200-2011 dataset, Union-GAN achieves an FID of 13.67, a 42.9% improvement over AttnGAN, and an IS of 4.52, an increase of 0.16.
Key words
text-to-image generation / generative adversarial network (GAN) / multimodal task
Classification
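Information technology and security science
Cite this article
Bai Zhiyuan, Yang Zhixiang, Luan Hongkang, Sun Yubao. Image adversarial cascade generation via coupling word and sentence-level text features[J]. Computer Engineering & Science, 2023, 45(12): 2186-2196, 11.
Funding
National Natural Science Foundation of China (U2001211, 62276139)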