Journal of Xidian University (Natural Science Edition), 2024, Vol. 51, Issue 4: 128-138, 11. DOI: 10.19665/j.issn1001-2400.20240302
Joint feature approach for image-text cross-modal retrieval
Abstract
With the rapid development of deep learning, cross-modal retrieval performance has improved significantly. However, existing methods either match an image and a text only as wholes or rely solely on local information for matching, so they exploit image and text information incompletely, and retrieval performance can be improved further. To fully exploit the latent semantic relationship between images and texts, this paper proposes a cross-modal retrieval model based on joint features. In the feature-extraction part, two sub-networks process the local features and global features of images and texts respectively, and a bilinear layer built on an attention mechanism is designed to filter out redundant information. In the loss-function part, a triplet ranking loss and a semantic label classification loss are combined to optimize the features jointly. The proposed model is also broadly applicable: it can effectively improve the performance of models that rely only on local information. A series of experiments on the public datasets Flickr30k and MS COCO shows that the proposed model effectively improves performance on cross-modal image-text retrieval tasks. On the Flickr30k retrieval task, the proposed model improves the R@1 metric by 5.1% for text retrieval and by 2.8% for image retrieval.
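A minimal sketch of how the two ingredients named above could fit together, assuming PyTorch. The module and function names (BilinearAttentionGate, triplet_ranking_loss, joint_loss) and all hyperparameters are hypothetical illustrations, not the authors' released code; the ranking term follows the common hardest-in-batch hinge formulation for image-text retrieval.

```python
# Illustrative sketch only: attention-based bilinear filtering of local
# features, plus a joint triplet-ranking + label-classification objective.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BilinearAttentionGate(nn.Module):
    """Attention-weighted bilinear layer that down-weights redundant
    local features before pooling them into one embedding (assumption)."""
    def __init__(self, dim):
        super().__init__()
        self.bilinear = nn.Bilinear(dim, dim, 1)  # score each region vs. context
        self.proj = nn.Linear(dim, dim)

    def forward(self, local_feats):               # (batch, n_regions, dim)
        context = local_feats.mean(dim=1, keepdim=True)         # global context
        scores = self.bilinear(local_feats,
                               context.expand_as(local_feats))  # (batch, n, 1)
        attn = torch.softmax(scores, dim=1)                     # filter weights
        pooled = (attn * local_feats).sum(dim=1)                # (batch, dim)
        return F.normalize(self.proj(pooled), dim=-1)

def triplet_ranking_loss(img, txt, margin=0.2):
    """Hinge triplet ranking loss with hardest in-batch negatives,
    for L2-normalized image/text embeddings."""
    sim = img @ txt.t()                           # (batch, batch) similarities
    pos = sim.diag().view(-1, 1)
    mask = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    cost_txt = (margin + sim - pos).clamp(min=0).masked_fill(mask, 0)
    cost_img = (margin + sim - pos.t()).clamp(min=0).masked_fill(mask, 0)
    return cost_txt.max(dim=1)[0].mean() + cost_img.max(dim=0)[0].mean()

def joint_loss(img_emb, txt_emb, img_logits, txt_logits, labels, alpha=1.0):
    """Joint objective: ranking term plus semantic-label classification term
    on both modalities (single-label categories assumed here)."""
    rank = triplet_ranking_loss(img_emb, txt_emb)
    cls = F.cross_entropy(img_logits, labels) + F.cross_entropy(txt_logits, labels)
    return rank + alpha * cls
```

In this reading, the classification term acts as a regularizer that pulls semantically related pairs together in both embedding spaces, while the ranking term enforces the retrieval ordering; the weight alpha between the two is an assumed knob, not a value from the paper.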
Key words: cross-modal retrieval / deep learning / self-attention network / image retrieval
Classification: Information Technology and Security Science
Citation: GAO Dihui, SHENG Lijie, XU Xiaodong, MIAO Qiguang. Joint feature approach for image-text cross-modal retrieval [J]. Journal of Xidian University (Natural Science Edition), 2024, 51(4): 128-138, 11.
Funding: National Natural Science Foundation of China (62272364); Shaanxi Higher Continuing Education Teaching Reform Research Project (21XJZ004)