Acta Electronica Sinica, 2025, Vol. 53, Issue 2: 558-567, 10. DOI: 10.12263/DZXB.20240679
融合图像与文本特征的组合检索方法
A Combined Retrieval Method by Fusing Image and Text Features
Abstract
With the explosive growth of image data in e-commerce, target image retrieval has become a challenging task in information retrieval research. Traditional image retrieval models rely on only a single text description or a similar image, which makes it difficult to capture the user's retrieval intention accurately and leads to unsatisfactory results. To solve this problem, this paper proposes a combined retrieval method that fuses image and text features. Swin Transformer (SwinT) is used to extract multi-layer features of the reference image, and the image and text features are fused at multiple levels, so that the text features modify the reference image features in a multi-level, fine-grained manner and bring them closer to the target image features. The modified image features and the target image features are then embedded in the same space for similarity measurement, and a batch-based classification loss is used to optimize retrieval performance. Experimental results on the Fashion200k, MIT-States and CSS datasets show that the proposed method improves performance by 5 percentage points on average compared with existing mainstream methods.
Keywords
combined image and text retrieval / image features / text features / feature fusion
Classification
Information Technology and Security Science
Cite this article
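Qin Yushu, Yang Lianghuai, Zhu Yanchao, Gong Weihua. A Combined Retrieval Method by Fusing Image and Text Features[J]. Acta Electronica Sinica, 2025, 53(2): 558-567, 10.
Funding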
Zhejiang Provincial Key Research and Development Program ("Ling Yan" Project) (No. 2022C01088)
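
Illustrative sketch
The abstract mentions embedding the modified image features and the target image features in one space and training with a batch-based classification loss. The abstract does not give the formula, so the following is only a minimal sketch of one common reading: within a batch, the target paired with each query is treated as the correct class and the other targets as negatives. The function name `batch_classification_loss`, the L2 normalization, the dot-product similarity and the `scale` parameter are all assumptions for illustration, not details taken from the paper, and the SwinT feature extraction and multi-level fusion steps are not modeled here.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def l2_normalize(v):
    """Scale a vector to unit length (assumed here; left unchanged if zero)."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v] if norm > 0 else list(v)


def batch_classification_loss(queries, targets, scale=1.0):
    """Mean softmax cross-entropy over a batch.

    queries[i] stands for a modified (image + text) feature and targets[i]
    for its matching target image feature. Row i of the similarity matrix
    is treated as classification logits whose correct class is index i.
    """
    q = [l2_normalize(x) for x in queries]
    t = [l2_normalize(x) for x in targets]
    total = 0.0
    for i, qi in enumerate(q):
        logits = [scale * dot(qi, tj) for tj in t]
        m = max(logits)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]
    return total / len(q)


# Toy check: correctly paired targets should give a lower loss than swapped ones.
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = batch_classification_loss(queries, [[1.0, 0.0], [0.0, 1.0]])
swapped = batch_classification_loss(queries, [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```

In this toy batch the loss is lower when each query is most similar to its own target, which is the behavior such a loss is meant to encourage during training.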