Journal of Shaanxi University of Science & Technology, 2025, Vol. 43, Issue (4): 183-191, 9.
Deep hashing image retrieval based on an improved Vision Transformer
Deep hashing method based on improved Vision Transformer
Abstract
Deep hashing methods based on convolutional neural networks capture global image information poorly, and datasets are imbalanced both between easy and hard samples and between positive and negative sample pairs. To address these problems, this paper proposes CMTH, an improved deep hashing method based on Vision Transformer. First, CMTH uses a convolutional neural network ahead of the Transformer encoder network to extract deep local features, reduce dimensionality, and preserve image resolution. Second, the improved Vision Transformer network uses a lightweight multi-head self-attention module to extract high-dimensional deep global features while reducing computational complexity. Finally, a new loss framework is proposed: a normalized focal loss adjusts the weight of hard samples, and a new hash loss reduces the impact of the imbalance between easy and hard samples and between positive and negative samples. Compared with the second-best Vision Transformer-based deep hashing algorithm, the mean Average Precision (mAP) on CIFAR-10 and NUS-WIDE improves by an average of 2.35% and 3.75%, respectively, across four different bit settings.
Keywords
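deep hashing / convolutional neural network / visual attention / image retrieval
Key words
deep hashing / convolutional neural network / Vision Transformer / image retrieval
Classification
Information Technology and Security Science
Cite this article
YANG Mengya, ZHAO Yan, XUE Liang. Deep hashing image retrieval based on an improved Vision Transformer [J]. Journal of Shaanxi University of Science & Technology, 2025, 43(4): 183-191, 9.
Funding
National Natural Science Foundation of China (62105196)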
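The abstract reports retrieval quality as mean Average Precision over hash codes of several bit lengths. As a rough illustration of how that metric is commonly computed in hashing-based retrieval, the sketch below ranks database items by Hamming distance to each query and averages the precision at every relevant hit. It is not the paper's evaluation code. The function names, the single-label relevance rule (an item is relevant if its class label equals the query's, as on CIFAR-10), and the toy 4-bit codes are all assumptions made for the example. Multi-label datasets such as NUS-WIDE usually treat an item as relevant if it shares at least one label with the query, which this sketch does not cover.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length binary codes differ."""
    return sum(x != y for x, y in zip(a, b))


def average_precision(query_code, query_label, db_codes, db_labels):
    """AP of one query under Hamming ranking (single-label relevance assumed)."""
    # sorted() is stable, so items at equal distance keep their database order.
    order = sorted(
        range(len(db_codes)),
        key=lambda i: hamming_distance(query_code, db_codes[i]),
    )
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if db_labels[i] == query_label:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant hit
    # Convention chosen here: a query with no relevant item scores 0.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(queries, db_codes, db_labels):
    """Mean of per-query AP; `queries` is a list of (code, label) pairs."""
    aps = [average_precision(c, y, db_codes, db_labels) for c, y in queries]
    return sum(aps) / len(aps) if aps else 0.0


if __name__ == "__main__":
    # Hypothetical 4-bit codes, for illustration only.
    db_codes = [(1, 1, 1, 1), (1, 1, 1, 0), (0, 0, 0, 0)]
    db_labels = [0, 1, 0]
    queries = [((1, 1, 1, 1), 0)]
    print(mean_average_precision(queries, db_codes, db_labels))
```

In the toy run, the relevant items sit at ranks 1 and 3, so the single query's AP is (1/1 + 2/3) / 2 = 5/6. Published results often compute mAP over only the top-ranked items, a cutoff this sketch does not apply.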