Journal of Yanshan University, 2024, Vol. 48, Issue 5: 446-455 (10 pages). DOI: 10.3969/j.issn.1007-791X.2024.05.007
Analysis and mining method of multi-level relations between image and text guided by local-global features
Abstract
Because text and image data with related semantics complement each other, they can enhance semantic understanding from different perspectives. The key to making full use of image and text data therefore lies in mining the semantic relations between them. To address two problems, the insufficient mining of deep image-text semantic relations and inaccurate predictions in the retrieval stage, this paper proposes an analysis and mining method for multi-level image-text relations guided by local-global features. A Transformer with a multi-head self-attention mechanism is used to model the relations within images. An image-guided text attention module is constructed to explore the fine-grained relationships between image regions and the global text. The local and global features are then fused to strengthen the semantic relationship between image and text data. The method was evaluated on the Flickr30K, MSCOCO-1K and MSCOCO-3K datasets. Compared with 12 other methods, including VSM and SGRAF, it improves the recall of text-to-image retrieval by 0.62% on average and the recall of image-to-text retrieval by 0.5% on average. These experimental results verify the effectiveness of the method.
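The abstract names three components (multi-head self-attention over the image, an image-guided text attention module, and local-global fusion) without giving their equations. The following is a minimal NumPy sketch of how such components could fit together, assuming standard scaled dot-product multi-head self-attention over image region features, a simple region-to-word cross-attention (similar in spirit to SCAN) standing in for the image-guided text attention module, and a weighted sum of a local and a global cosine similarity as the fusion step. All function names, the temperature of 9.0, the fusion weight alpha = 0.5, and the feature sizes are hypothetical illustrations, not the paper's implementation.

```python
import numpy as np

# Hypothetical sketch: names, sizes, temperature and alpha are illustrative assumptions,
# not values or code from the paper.

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def l2norm(x, axis=-1, eps=1e-8):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Standard scaled dot-product MHSA over n region features x of shape (n, d)."""
    n, d = x.shape
    dh = d // num_heads
    # Project and split into heads: (num_heads, n, dh)
    q = (x @ w_q).reshape(n, num_heads, dh).transpose(1, 0, 2)
    k = (x @ w_k).reshape(n, num_heads, dh).transpose(1, 0, 2)
    v = (x @ w_v).reshape(n, num_heads, dh).transpose(1, 0, 2)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh))  # (num_heads, n, n)
    heads = (attn @ v).transpose(1, 0, 2).reshape(n, d)     # concatenate the heads
    return x + heads @ w_o  # output projection plus residual connection

def image_guided_text_attention(regions, words, temperature=9.0):
    """Each image region attends over all words; returns one text context per region.
    Assumed form (region-to-word cross-attention), not taken from the paper."""
    sim = l2norm(regions) @ l2norm(words).T    # cosine similarities (n_regions, n_words)
    attn = softmax(temperature * sim, axis=1)  # normalise over the words
    return attn @ words                        # (n_regions, d)

def local_global_score(regions, words, alpha=0.5):
    """Fuse a local (region-level) and a global (mean-pooled) cosine similarity.
    The alpha-weighted sum is an assumed fusion rule."""
    text_ctx = image_guided_text_attention(regions, words)
    local = np.mean(np.sum(l2norm(regions) * l2norm(text_ctx), axis=1))
    global_ = float(l2norm(regions.mean(axis=0)) @ l2norm(words.mean(axis=0)))
    return alpha * local + (1 - alpha) * global_

rng = np.random.default_rng(0)
d, num_heads = 64, 4
w_q, w_k, w_v, w_o = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4))
regions = rng.standard_normal((36, d))  # e.g. 36 region features of one image (assumed)
words = rng.standard_normal((12, d))    # e.g. 12 word features of one caption (assumed)

regions = multi_head_self_attention(regions, w_q, w_k, w_v, w_o, num_heads)
score = local_global_score(regions, words)
print(f"fused image-text similarity: {score:.4f}")
```

In retrieval, every candidate caption (or image) would be ranked by such a fused score, and recall at K is the fraction of queries whose ground-truth match appears among the top K candidates.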
Key words
image and text relation mining/multi-head self-attention mechanism/local-global features
Classification
Information Technology and Security Science
Cite this article
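WANG Hairong, GUO Ruiping, XU Xi, ZHOU Beijing. Analysis and mining method of multi-level relations between image and text guided by local-global features[J]. Journal of Yanshan University, 2024, 48(5): 446-455.
Funding
Key Scientific Research Project of Higher Education Institutions, Education Department of Ningxia Hui Autonomous Region (NYG2022051)
Natural Science Foundation of Ningxia (2023AAC03316)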