计算机技术与发展, 2025, Vol. 35, Issue (3): 109-116, 8. DOI: 10.20165/j.cnki.ISSN1673-629X.2024.0355
Semantic Spatial Awareness and Attention-based Text-to-Image Generation Method
Abstract
In text-to-image generation, the generated images often fail to match their text descriptions and suffer from poor quality. To improve the match between the text and the generated images and to produce higher-quality images, a novel generative adversarial network model (WSA-GAN) is proposed. The embedding vectors encoded from the text words are fused effectively with the hidden image features through a cross-attention method and a confidence feature fusion method. At the same time, the semantic spatial-aware convolution module (SSACN) is introduced and improved: depthwise separable convolution replaces ordinary convolution to reduce the number of model parameters and thereby lower the model's complexity. Self-attention and convolution mixing (ACMix) is used to capture the relationships among pixels in the image features and to model long-range dependencies between features while keeping the model complexity under control, so that the model can capture a wider range of context information, improving both image quality and the alignment between the text and the generated image. Experiments on the CUB-200-2011 dataset show that, compared with mainstream models, both generation quality and alignment with the text are improved to some extent.
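The abstract states that ordinary convolutions in the improved SSACN module are replaced with depthwise separable convolutions to cut the parameter count. The following is a minimal PyTorch sketch of that substitution, not the authors' implementation; the channel sizes and the 3x3 kernel are assumptions chosen only to illustrate the parameter saving.

```python
# Sketch: depthwise separable convolution as a drop-in replacement for an
# ordinary convolution, to show where the parameter reduction comes from.
# (Hypothetical configuration, not the paper's actual layer settings.)
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise 3x3 convolution followed by a pointwise 1x1 convolution."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        # Depthwise step: one 3x3 filter per input channel (groups=in_ch).
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1,
                                   groups=in_ch, bias=False)
        # Pointwise step: 1x1 convolution mixes information across channels.
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pointwise(self.depthwise(x))

# Parameter comparison against an ordinary 3x3 convolution (256 -> 256 channels).
ordinary = nn.Conv2d(256, 256, kernel_size=3, padding=1, bias=False)
separable = DepthwiseSeparableConv(256, 256)
count = lambda m: sum(p.numel() for p in m.parameters())
print(count(ordinary), count(separable))  # 589824 vs. 67840 parameters, roughly 8.7x fewer
```

For a 3x3 kernel the ordinary layer costs in_ch x out_ch x 9 parameters, while the separable version costs in_ch x 9 + in_ch x out_ch, which is where the reduction in model complexity described in the abstract comes from.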
Keywords
generative adversarial networks / multi-modality fusion / attention mechanism / text-to-image generation / deep learning
Classification
Computers and Automation
Cite this article
欧阳安杰, 孙大盟, 何立明. 基于语义空间感知与注意力的文本生成图像方法 [J]. 计算机技术与发展, 2025, 35(3): 109-116, 8.
Funding
Key Research and Development Program of Shaanxi Province (2022GY-030, 2022GY-039)