Image-Text Matching Method Combining Semantic Enhancement and Position Encoding

赵婷婷 常玉广 郭宇 陈亚瑞 王嫄

Journal of Tianjin University of Science and Technology, 2024, Vol. 39, Issue 4: 63-72 (10 pages). DOI: 10.13364/j.issn.1672-6510.20230177

Author information

  • 1. College of Artificial Intelligence, Tianjin University of Science and Technology, Tianjin 300457 (all five authors)

Abstract

Image-text matching is a basic cross-modal task whose core problem is accurately evaluating the similarity between image semantics and text semantics. Existing methods introduce a relevance threshold to maximize the separation between the relevant and irrelevant similarity distributions and thereby obtain better semantic alignment. However, the features themselves lack semantic correlation with one another, and image regions and text words that lack spatial location information are difficult to align accurately. This inevitably limits the learning of the relevance threshold and prevents accurate semantic alignment. To address this problem, this article proposes an image-text matching method that combines semantic enhancement and position encoding with adaptive relevance-learnable attention. Specifically, an undirected fully connected graph of the image (or text) is first constructed from the preliminarily extracted features, and graph attention is used to aggregate neighbor information to obtain semantically enhanced features. Then, the absolute position information of the image regions is encoded, and the most distinguishable relevant and irrelevant distributions are obtained from the similarity between the image regions and the text words with spatial semantics, so as to better learn the optimal relevance boundary between the two distributions. Finally, comparison experiments on the public datasets Flickr30K and MS-COCO, evaluated with the Recall@K metric, verify the effectiveness of the proposed method.
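The Recall@K metric mentioned in the abstract can be illustrated with a short sketch. This is not the authors' evaluation code: the function name `recall_at_k`, the toy similarity matrix, and the simplifying assumption that candidate i is the single correct match for query i are all introduced here for illustration (Flickr30K and MS-COCO pair each image with several captions, which a full evaluation would have to handle).

```python
def recall_at_k(sim, k):
    """Fraction of queries whose correct candidate is ranked in the top k.

    sim[i][j] is the similarity between query i and candidate j.
    Assumption for this sketch: candidate i is the only correct match
    for query i, so sim must be a square matrix.
    """
    hits = 0
    for i, row in enumerate(sim):
        # Zero-based rank = number of candidates scoring strictly higher
        # than the correct one.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return hits / len(sim)


# Toy similarity matrix with made-up values (not results from the paper).
sim = [
    [0.9, 0.1, 0.3],  # query 0: correct candidate ranked 1st
    [0.8, 0.5, 0.2],  # query 1: correct candidate ranked 2nd
    [0.7, 0.6, 0.4],  # query 2: correct candidate ranked 3rd
]

for k in (1, 2, 3):
    print(f"Recall@{k} = {recall_at_k(sim, k):.3f}")
```

In image-text matching the metric is usually reported in both directions (image-to-text and text-to-image); under the one-to-one assumption above, the second direction can be obtained by passing the transposed matrix.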

Key words

cross-modal image-text matching; graph attention; position encoding; relevance threshold

Classification

Information Technology and Security Science

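Cite this article

赵婷婷, 常玉广, 郭宇, 陈亚瑞, 王嫄. Image-Text Matching Method Combining Semantic Enhancement and Position Encoding[J]. Journal of Tianjin University of Science and Technology, 2024, 39(4): 63-72.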

Funding

National Natural Science Foundation of China (61976156)

Tianjin Enterprise Science and Technology Commissioner Project (20YDTPJC00560)

Journal of Tianjin University of Science and Technology, ISSN 1672-6510
