Journal of Nanjing University (Natural Science), 2024, Vol. 60, Issue 5: 804-814, 11. DOI: 10.13232/j.cnki.jnju.2024.05.011
A fine-grained image-text matching method based on bidirectional semantic embedding
Bidirectional semantic embedding for fine-grained image-text matching
Abstract
Image-text matching aims to achieve high-quality semantic alignment between images and texts, and is an important task at the intersection of computer vision and natural language processing. Images and texts are two distinct media for conveying information, and their differences in content and distribution lead to uncertainty and ambiguity in fine-grained cross-modal information correlation. To address these challenges and enhance fine-grained alignment between images and texts, BSEM-Net (Bidirectional Semantic Embedding for Fine-Grained Image-Text Matching) is proposed. First, to reduce redundancy in image information, this paper introduces an Image Semantic Embedding Module (IE) that uses text words as supervisory signals to guide the model in constraining the expression of irrelevant image regions. Second, to reduce the distribution differences between modalities and establish fine-grained semantic alignment, this paper introduces a Text Semantic Embedding Module (TE) that uses image regions to select words and transforms these words into phrases whose information distribution is similar to that of the image regions. In addition, the two modules use region relationship connectivity graphs and phrase relationship connectivity graphs to mine contextual information among intra-modal features, reducing semantic divergence. Experimental comparisons on the publicly available cross-modal retrieval datasets Flickr30k and MSCOCO show that the proposed method significantly outperforms existing methods on image-text matching tasks.
Key words
image-text matching/cross-modal/semantic embedding/fine-grained information association/semantic alignment
Classification
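Information technology and security science
Cite this article
尹晶晶, 潘丽丽, 王朝, 熊思宇, 瞿栋梁. Bidirectional semantic embedding for fine-grained image-text matching [J]. Journal of Nanjing University (Natural Science), 2024, 60(5): 804-814, 11.
Funding
Key Scientific Research Project of the Hunan Provincial Department of Education (22A0195); Teaching Reform Research Project of the Hunan Provincial Department of Education (HNJG-20230471)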