Bidirectional semantic embedding for fine-grained image-text matching (基于双向语义嵌入的细粒度图文匹配方法)

尹晶晶¹, 潘丽丽¹, 王朝¹, 熊思宇¹, 瞿栋梁¹

Journal of Nanjing University (Natural Science), 2024, Vol. 60, Issue 5: 804-814 (11 pages). DOI: 10.13232/j.cnki.jnju.2024.05.011

Author information

1. School of Electronic Information and Physics, Central South University of Forestry and Technology, Changsha 410004

Abstract

Image-text matching aims to achieve high-quality semantic alignment between images and texts and is an important task at the intersection of computer vision and natural language processing. Images and texts are two distinct media for conveying information, and their differences in content and distribution lead to uncertainty and ambiguity in fine-grained cross-modal information association. To address these challenges and enhance fine-grained alignment between images and texts, BSEM-Net (Bidirectional Semantic Embedding for Fine-Grained Image-Text Matching) is proposed. First, to reduce redundancy in image information, the paper introduces an Image Semantic Embedding Module (IE) that uses text words as supervisory signals to guide the model in suppressing the expression of irrelevant image regions. Second, to reduce the distribution differences between modalities and establish fine-grained semantic alignment, the paper introduces a Text Semantic Embedding Module (TE) that uses image regions to select words and transforms these words into phrases whose information distribution is similar to that of the image regions. In addition, the two modules use region-relationship and phrase-relationship connectivity graphs to mine contextual information among intra-modal features, reducing semantic divergence. Experimental comparisons on the publicly available cross-modal retrieval datasets Flickr30k and MSCOCO show that the proposed method significantly outperforms existing methods on image-text matching tasks.
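The bidirectional idea in the abstract (words guiding which regions matter, regions selecting which words matter) can be illustrated with a small toy sketch. This is not the BSEM-Net implementation: the paper's modules are learned and also use relationship graphs, which are omitted here. All names (`relevance_weights`, `weighted_pool`, `match_score`), the `temperature` parameter, and the hand-written feature vectors are assumptions introduced only for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def softmax(xs, temperature=1.0):
    """Numerically stable softmax; temperature is a hypothetical knob."""
    m = max(xs)
    exps = [math.exp((x - m) / temperature) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def relevance_weights(queries, keys, temperature=0.1):
    """Weight each key by its best cosine match among the queries.

    Keys with no good match in the other modality get a weight near zero.
    """
    best = [max(cosine(q, k) for q in queries) for k in keys]
    return softmax(best, temperature)

def weighted_pool(vectors, weights):
    """Weighted sum of vectors, one output value per dimension."""
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]

def match_score(regions, words):
    """Toy bidirectional score: text gates regions, regions gate words."""
    region_w = relevance_weights(words, regions)  # words suppress irrelevant regions
    word_w = relevance_weights(regions, words)    # regions select relevant words
    image_vec = weighted_pool(regions, region_w)
    text_vec = weighted_pool(words, word_w)
    return cosine(image_vec, text_vec), region_w, word_w

# Toy data: regions 0 and 1 match the caption words; region 2 is background.
regions = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
words = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]]
score, region_w, word_w = match_score(regions, words)
print(round(score, 3), [round(w, 3) for w in region_w])
```

In this toy setup the background region receives almost no weight because no word resembles it, which mirrors, in a very simplified form, the abstract's goal of suppressing irrelevant image regions before computing the image-text similarity.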

Key words

image-text matching / cross-modal / semantic embedding / fine-grained information association / semantic alignment

Classification

Information technology and security science

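Cite this article

尹晶晶, 潘丽丽, 王朝, 熊思宇, 瞿栋梁. Bidirectional semantic embedding for fine-grained image-text matching [J]. Journal of Nanjing University (Natural Science), 2024, 60(5): 804-814.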

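Funding

Key Scientific Research Project of the Hunan Provincial Department of Education (22A0195); Teaching Reform Research Project of the Hunan Provincial Department of Education (HNJG-20230471)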

Journal of Nanjing University (Natural Science)

Open access; indexed in the Peking University Core Journals list and CSTPCD

ISSN: 0469-5097
