Cross-modal recipe retrieval method based on modality semantic enhancement
(基于模态语义增强的跨模态食谱检索方法)

Li Ming (李明) ¹, Zhou Dong (周栋) ², Lei Fang (雷芳) ¹, Cao Buqing (曹步清) ¹

Application Research of Computers (计算机应用研究), 2024, Vol. 41, Issue 4: 1131-1137 (7 pages). DOI: 10.19734/j.issn.1001-3695.2023.07.0350

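Author information

  • 1. School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan, Hunan 411100
  • 2. School of Information Science and Technology, Guangdong University of Foreign Studies, Guangzhou 510006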

Abstract

Effectively representing the features of each modality is an active research problem in cross-modal recipe retrieval. Current methods generally adopt two independent neural networks to extract image features and recipe features separately, and achieve retrieval through cross-modal alignment. However, these methods focus mainly on intra-modal information and ignore inter-modal interactions, which results in the loss of some useful modality information. To address this problem, this paper proposes a cross-modal recipe retrieval method that enhances modality semantics through multimodal encoders. First, it uses a pre-trained model to extract the initial semantic features of images and recipes, and applies modality alignment to reduce inter-modal differences. Second, it employs pairwise cross-modal attention to repeatedly reinforce the features of one modality with features from the other, extracting useful information. Third, it uses a self-attention mechanism to model the internal features of each modality, capturing rich modality-specific semantic information and latent associations. Finally, it introduces a triplet loss to minimize the distance between similar samples, enabling cross-modal retrieval learning. Experimental results on the Recipe1M dataset show that the proposed approach outperforms current mainstream methods in terms of median rank (MedR) and recall at top K (R@K), providing a powerful solution for cross-modal retrieval tasks.
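To make the training objective and the two evaluation metrics named in the abstract concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the cosine distance, the margin value of 0.3, and the function names `cosine_distance`, `triplet_loss`, and `retrieval_metrics` are all assumptions chosen for illustration, and the sketch assumes that the i-th image and the i-th recipe form a matching pair.

```python
import math
from statistics import median


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge-style triplet loss; the margin of 0.3 is an illustrative assumption."""
    gap = cosine_distance(anchor, positive) - cosine_distance(anchor, negative)
    return max(0.0, gap + margin)


def retrieval_metrics(queries, candidates, ks=(1, 5, 10)):
    """Compute MedR and R@K, assuming queries[i] matches candidates[i]."""
    ranks = []
    for i, query in enumerate(queries):
        dists = [cosine_distance(query, c) for c in candidates]
        # 1-based rank of the true match: candidates strictly closer, plus one.
        ranks.append(1 + sum(1 for d in dists if d < dists[i]))
    med_r = median(ranks)
    recall = {k: sum(r <= k for r in ranks) / len(ranks) for k in ks}
    return med_r, recall


# Toy embeddings (made up for this example, not data from the paper).
images = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
recipes = [[0.9, 0.1], [0.1, 0.9], [1.0, 0.9]]
med_r, recall = retrieval_metrics(images, recipes)
```

A lower MedR and a higher R@K both indicate better retrieval; the triplet loss reaches zero once the matching pair is closer than the non-matching pair by at least the margin.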

Key words

cross-modal recipe retrieval / feature extraction / modality semantic enhancement / multimodal encoder

Classification

Information technology and security science

Cite this article

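Li Ming, Zhou Dong, Lei Fang, Cao Buqing. Cross-modal recipe retrieval method based on modality semantic enhancement [J]. Application Research of Computers, 2024, 41(4): 1131-1137.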

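Funding

National Natural Science Foundation of China (62376062)

Guangdong Province Philosophy and Social Sciences "14th Five-Year Plan" Project (GD23CTS03)

Natural Science Foundation of Guangdong Province (2023A1515012718)

Natural Science Foundation of Hunan Province (2022JJ30020)

Humanities and Social Sciences Research Project of the Ministry of Education (23YJAZH220)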

Application Research of Computers (计算机应用研究)

OA / Peking University Core Journals (北大核心) / CSTPCD

ISSN 1001-3695
