深度语义关联学习的基于图像视觉数据跨域检索 [Open Access; Peking University Core Journals; CSTPCD]

Image-Based Cross-Domain Visual-Data Retrieval with Deep Semantic Correlation Learning

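Chinese abstract (translated)

Image-based cross-domain retrieval of visual data aims to find cross-domain images and three-dimensional (3D) models that are semantically consistent with, or similar in shape to, an input image. The main challenge is the modal heterogeneity between cross-domain data. Existing methods build a common feature space and use domain-adaptation or deep metric learning algorithms to achieve domain alignment or semantic alignment of cross-domain features, but their effectiveness has been verified only on a single type of cross-domain retrieval task. This paper proposes a method based on deep semantic correlation learning that applies to multiple types of image-based cross-domain visual-data retrieval tasks. First, heterogeneous networks extract the initial visual features of the cross-domain data. Then, a common feature space is constructed and the initial features are mapped into it, in preparation for the subsequent domain and semantic alignment. Finally, intra-domain discriminative learning, inter-domain consistency learning, and cross-domain correlation learning eliminate the heterogeneity between cross-domain features, explore the semantic correlations between them, and produce robust, unified feature representations for retrieval. Experiments show that the method attains a mean average precision (mAP) of 0.448, 0.689, and 0.874 on the TU-Berlin, IM2MN, and MI3DOR datasets, respectively, clearly outperforming the compared methods.

English abstract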

Image-based cross-domain retrieval of visual data is performed to identify cross-domain images and three-dimensional model data that are semantically consistent with or similar in appearance to an input image. In this task, the modal heterogeneity between cross-domain data must be addressed to achieve cross-domain correspondence between the query images and target objects. Existing methods achieve domain or semantic alignment of cross-domain features by constructing a common feature space and using a domain-adaptation or deep metric learning algorithm. The effectiveness of these methods has been verified only on a single type of cross-domain retrieval task. To address these issues, a method based on deep semantic correlation learning is proposed for multiple types of image-based cross-domain visual-data retrieval tasks. First, heterogeneous networks are used to extract the original visual features of cross-domain data. Subsequently, a common feature space is constructed to map the original features for subsequent domain and semantic alignment. Finally, intra-modal discrimination learning, inter-modal consistency learning, and cross-modal correlation learning are performed to eliminate the heterogeneity among cross-domain features, determine the semantic relevance among cross-domain data features, and generate robust and uniform feature representations for retrieval tasks. Experimental results show that the mean Average Precision (mAP) values of this method on the TU-Berlin, IM2MN, and MI3DOR datasets are 0.448, 0.689, and 0.874, respectively, significantly better than those of the compared methods.
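
The results above are reported as mAP. As a minimal sketch of how this metric is commonly computed for retrieval, the following assumes each query yields a ranked list over the whole gallery, encoded as 1 (relevant) or 0 (not relevant). The function names and the toy rankings are illustrative assumptions, not the authors' evaluation code, and the exact protocol used in the paper is not stated in the abstract.

```python
from typing import Sequence


def average_precision(relevance: Sequence[int]) -> float:
    """Average precision for one query.

    relevance[i] is 1 if the item at rank i + 1 is relevant, else 0.
    Assumes the ranking covers the whole gallery, so every relevant
    item appears somewhere in the list.
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            # Precision measured at the rank of each relevant item.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings: Sequence[Sequence[int]]) -> float:
    """Mean of the per-query average precision values."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings) / len(rankings)


# Toy example with two hypothetical queries.
toy_rankings = [[1, 0, 1, 0], [0, 1, 0, 0]]
toy_map = mean_average_precision(toy_rankings)
```

In the toy example, the first query has relevant items at ranks 1 and 3, giving an average precision of (1/1 + 2/3) / 2, and the second has one relevant item at rank 2, giving 1/2; the mAP is the mean of the two.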

Authors: Jiao Shichao (焦世超); Guan Ripeng (关日鹏); Kuang Liqun (况立群); Xiong Fengguang (熊风光); Han Xie (韩燮)

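Affiliations: School of Computer Science and Technology, North University of China, Taiyuan, Shanxi 030051; Shanxi Key Laboratory of Machine Vision and Virtual Reality, Taiyuan, Shanxi 030051; Shanxi Province Engineering Research Center for Visual Information Processing and Intelligent Robots, Taiyuan, Shanxi 030051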

Category: Computer Science and Automation

Keywords: cross-domain retrieval; feature alignment; domain alignment; sketch; real image; three-dimensional model; correlation learning

Journal: Computer Engineering (《计算机工程》), 2024, Issue 5

Related project: Influenza prediction models based on multi-source heterogeneous data and an empirical study

Pages / page count: 190-199 / 10

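Funding: National Natural Science Foundation of China (62272426, 62106238); Shanxi Province Science and Technology Major Special Program, "open competition" (揭榜挂帅) project (202201150401021); Shanxi Province Special Guidance Fund for the Transformation of Scientific and Technological Achievements (202104021301055); Shanxi Province Research Project for Returned Overseas Scholars (2020-113); Shanxi Province Basic Research Program (202203021222027).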

DOI: 10.19678/j.issn.1000-3428.0067501
