Image adversarial cascade generation via coupling word and sentence-level text features

白志远 杨智翔 栾鸿康 孙玉宝

Computer Engineering & Science (计算机工程与科学), 2023, Vol. 45, Issue 12: 2186-2196 (11 pages). DOI: 10.3969/j.issn.1007-130X.2023.12.010

Image adversarial cascade generation via coupling word and sentence-level text features

白志远¹, 杨智翔¹, 栾鸿康¹, 孙玉宝¹

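Author information

  • 1. School of Computer Science, Nanjing University of Information Science and Technology, Nanjing 210044, Jiangsu, China; Jiangsu Big Data Analysis Technology Laboratory, School of Computer Science, Nanjing University of Information Science and Technology, Nanjing 210044, Jiangsu, China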

Abstract

Text-to-image generation aims to generate realistic images from natural language descriptions and is a cross-modal analysis task involving text and images. Because generative adversarial networks (GANs) produce realistic images efficiently, they have become the mainstream model for text-to-image generation. However, current methods often split text features into word-level and sentence-level features and train on them separately, so the text information is not fully utilized, which easily leads to generated images that do not match the text. To address this problem, this paper proposes an image adversarial cascade generation model (Union-GAN) that couples word-level and sentence-level text features and introduces a text-image joint perception module (Union-Block) in each image generation stage. By combining channel affine transformation and cross-modal attention, the model fully utilizes both the word-level semantics and the overall semantics of the text to generate images that match the text description while maintaining clear structures. Meanwhile, the discriminator is jointly optimized, and spatial attention is added to the corresponding discriminator so that the supervisory signal from the text prompts the generator to produce more relevant images. In comparisons with representative networks such as AttnGAN on the CUB-200-2011 dataset, experimental results show that Union-GAN achieves an FID score of 13.67, a 42.9% improvement over AttnGAN (lower FID is better), and an IS score of 4.52, an increase of 0.16.
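
The abstract does not give the internals of Union-Block, so the following is only an illustrative sketch of how a sentence-conditioned channel affine transformation and word-level cross-modal attention might be combined in one block. It is not the authors' implementation: the function names (`channel_affine`, `word_attention`, `joint_block`), the parameter names (`w_gamma`, `w_beta`, `w_proj`), the residual combination, and all tensor sizes are assumptions made for illustration, and the weights are random rather than learned.

```python
import numpy as np


def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def channel_affine(feat, sent, w_gamma, w_beta):
    """Scale and shift each channel using the sentence embedding.

    feat: (C, H, W) image features; sent: (D,) sentence embedding;
    w_gamma, w_beta: (C, D) hypothetical linear maps.
    """
    gamma = w_gamma @ sent  # per-channel scale offset, shape (C,)
    beta = w_beta @ sent    # per-channel shift, shape (C,)
    return feat * (1.0 + gamma)[:, None, None] + beta[:, None, None]


def word_attention(feat, words, w_proj):
    """Attend from every spatial position to the word embeddings.

    feat: (C, H, W); words: (T, D); w_proj: (C, D) projects words
    into the image feature space (an assumed design choice).
    Returns the word context map (C, H, W) and attention weights (H*W, T).
    """
    c, h, w = feat.shape
    queries = feat.reshape(c, h * w).T                      # (H*W, C)
    keys = words @ w_proj.T                                 # (T, C)
    attn = softmax(queries @ keys.T / np.sqrt(c), axis=1)   # (H*W, T)
    context = (attn @ keys).T.reshape(c, h, w)              # (C, H, W)
    return context, attn


def joint_block(feat, sent, words, params):
    # Sentence-level modulation first, then a residual word-level context.
    modulated = channel_affine(feat, sent, params["w_gamma"], params["w_beta"])
    context, attn = word_attention(modulated, words, params["w_proj"])
    return modulated + context, attn


# Toy sizes chosen only for the demonstration.
rng = np.random.default_rng(0)
C, H, W, D, T = 8, 4, 4, 16, 5
params = {k: rng.normal(scale=0.1, size=(C, D))
          for k in ("w_gamma", "w_beta", "w_proj")}
feat = rng.normal(size=(C, H, W))
sent = rng.normal(size=D)
words = rng.normal(size=(T, D))

out, attn = joint_block(feat, sent, words, params)
print(out.shape, attn.shape)
```

In this sketch the sentence embedding acts globally (one scale and shift per channel), while the word embeddings act locally (each spatial position weights the words differently), which is one plausible reading of "coupling" the two feature levels.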

Key words

text-to-image generation; generative adversarial network (GAN); multimodal task

Classification

Information Technology and Security Science

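Cite this article

白志远, 杨智翔, 栾鸿康, 孙玉宝. Image adversarial cascade generation via coupling word and sentence-level text features [J]. Computer Engineering & Science (计算机工程与科学), 2023, 45(12): 2186-2196 (11 pages).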

Funding

National Natural Science Foundation of China (U2001211, 62276139)

Computer Engineering & Science (计算机工程与科学)

Indexing: OA, PKU Core (北大核心), CSCD, CSTPCD

ISSN: 1007-130X
