Modern Information Technology (现代信息科技), 2025, Vol. 9, Issue 6: 130-134, 5. DOI: 10.19850/j.cnki.2096-4706.2025.06.025
An Image-Text Retrieval Method Based on Chinese-CLIP Model and Prompt Mechanism
Abstract
To improve the accuracy of image-text matching, an image-text retrieval method based on the Chinese-CLIP model and a Prompt mechanism is proposed. On the text side, the data is preprocessed by removing stop words and punctuation marks, and the BERT model is then used to extract text features. On the image side, a Convolutional Neural Network is used to extract image features, and the resulting text and image features are serialized to achieve multi-modal feature fusion. During training, the Chinese-CLIP large model is used for preliminary training, and the Prompt mechanism is then introduced to fine-tune the model. The experimental results show that the proposed method effectively improves accuracy and recall on both the text-to-image and image-to-text tasks.
Keywords
image-text retrieval / multi-modal feature fusion / Chinese-CLIP model / Prompt mechanism
Classification
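Information Technology and Security Science
Cite this article
Chen Daobin, Zhang Zinuo, Fu Yubin, Li Jinming, Lin Bin. An Image-Text Retrieval Method Based on Chinese-CLIP Model and Prompt Mechanism [J]. Modern Information Technology, 2025, 9(6): 130-134, 5.
Funding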
Guangxi Natural Science Foundation Project (2019GXNSFBA245056)
College Students' Innovation and Entrepreneurship Training Program (202410596733, 202410596731)