

Food Science, 2024, Vol. 45, Issue 10: 1-8. DOI: 10.7506/spkx1002-6630-20231231-270

Hash Food Image Retrieval Based on Enhanced Vision Transformer

Cao Pindan ¹, Min Weiqing ², Song Jiajun ³, Sheng Guorui ¹, Yang Yancun ¹, Wang Lili ¹, Jiang Shuqiang ²

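Author information

1. School of Information and Electrical Engineering, Ludong University, Yantai 264025, Shandong, China
2. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
3. School of Agricultural Economics and Rural Development, Renmin University of China, Beijing 100872, China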

Abstract

Food image retrieval, a major task in food computing, has garnered extensive attention in recent years. However, it faces two primary challenges. First, food images exhibit fine-grained characteristics: visual differences between food categories may be subtle and are often observable only in local regions of the image. Second, food images contain abundant semantic information, such as ingredients and cooking methods, whose extraction and utilization are crucial for enhancing retrieval performance. To address these issues, this paper proposes an enhanced ViT hash network (EVHNet) based on a pre-trained Vision Transformer (ViT) model. Given the fine-grained nature of food images, EVHNet includes a local feature enhancement module, built on a convolutional structure, that enables the network to learn more representative features. To better leverage the semantic information in food images, EVHNet also includes an aggregated semantic feature module that aggregates this information based on class token features. EVHNet was evaluated under three popular hash image retrieval frameworks, namely greedy hash (GreedyHash), central similarity quantization (CSQ), and deep polarized network (DPN), and compared with four mainstream network models: AlexNet, ResNet50, ViT-B_32, and ViT-B_16. Experimental results on the Food-101, Vireo Food-172, and UEC Food-256 food datasets demonstrated that EVHNet outperformed the other models in terms of comprehensive retrieval accuracy.
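For readers unfamiliar with hash retrieval in general, the following minimal sketch shows how binary codes are typically used to rank database images by Hamming distance. It is a generic illustration and not the EVHNet implementation: the function names, the sign-based binarization, and the toy feature values are all assumptions made for this example, and a real system would take the features from a trained network.

```python
# Generic illustration of hash-based retrieval; NOT the EVHNet implementation.
# All names and the toy feature values below are hypothetical.

def binarize(features):
    """Map a real-valued feature vector to a binary code using the sign of each entry."""
    return [1 if x >= 0 else 0 for x in features]

def hamming_distance(code_a, code_b):
    """Count the bit positions at which two equal-length codes differ."""
    return sum(a != b for a, b in zip(code_a, code_b))

def retrieve(query_code, database_codes, top_k=2):
    """Return indices of the top_k database codes closest to the query in Hamming distance."""
    ranked = sorted(
        range(len(database_codes)),
        key=lambda i: hamming_distance(query_code, database_codes[i]),
    )
    return ranked[:top_k]

# Made-up 4-dimensional network outputs for three database images and one query.
database_features = [
    [0.9, -0.2, 0.4, -0.7],
    [-0.5, 0.3, -0.1, 0.8],
    [0.7, -0.6, -0.3, -0.9],
]
query_features = [0.8, -0.1, 0.5, -0.4]

database_codes = [binarize(f) for f in database_features]
query_code = binarize(query_features)
top_matches = retrieve(query_code, database_codes)
```

Because comparison reduces to counting differing bits, ranking stays cheap even for large databases, which is the usual motivation for learning compact hash codes rather than comparing real-valued features directly.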

Key words

food image retrieval / food computing / hash retrieval / Vision Transformer network / deep hash learning

Classification

Agricultural science and technology

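Cite this article

Cao Pindan, Min Weiqing, Song Jiajun, Sheng Guorui, Yang Yancun, Wang Lili, Jiang Shuqiang. Hash Food Image Retrieval Based on Enhanced Vision Transformer[J]. Food Science, 2024, 45(10): 1-8.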

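Funding

Young Scientists Fund of the National Natural Science Foundation of China (61705098)

General Program of the National Natural Science Foundation of China (61872170)

Natural Science Foundation of Shandong Province (ZR2023MF031)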

Food Science

Open access; Peking University Core Journal; CSTPCD

ISSN: 1002-6630
