Cross-Modal Pedestrian Re-identification Pre-training Method Based on Catastrophic Forgetting and Combination Superimposed Erasure

OA; CSCD; CSTPCD

Abstract

The need to build 24-hour, all-day video surveillance systems has drawn wide attention from industry and academia to cross-modal pedestrian re-identification between visible-light and near-infrared images. Most current cross-modal pedestrian re-identification methods rely on models pre-trained on ImageNet to learn intra-modal common features in advance. However, ImageNet differs substantially in modality from cross-modal pedestrian data, and ImageNet pre-training treats color as one of the discriminative features, so the common features it learns are poorly suited to representing colorless infrared images. This paper proposes a self-supervised pre-training method for cross-modal pedestrian re-identification based on catastrophic forgetting and combination superimposed erasure. First, the proposed catastrophic forgetting score is used to filter the pre-training data, which narrows the domain gap between the pre-training data and the downstream task data and further reduces model training time. Second, to extract the key discriminative features that cross-modal identification depends on, a strong channel data augmentation strategy is designed: channel-level erasure and combination of the R, G, and B channels generate multiple types of samples with widely varying colors, which encourages the model to attend to texture information rather than color information. Finally, a self-supervised learning framework for cross-modal tasks is built on the proposed cross-modal data filtering index and channel augmentation strategy. Experimental results show that a ResNet50 network pre-trained with the proposed method outperforms current mainstream self-supervised pre-training methods when transferred to a range of cross-modal pedestrian re-identification methods; on top of the classic AGW method, Rank1 and mAP improve by 8.02% and 5.81%, respectively.
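The abstract does not spell out the exact erasure and combination rules, so the following is only a minimal sketch of what channel-level erasure and recombination of the R, G, and B channels might look like. The function names (`erase_channels`, `combine_channels`, `channel_augment`) and the sampling scheme (keep one to three random channels, then shuffle channel positions) are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def erase_channels(img, keep):
    """Return a copy of an H x W x 3 image with every channel not in `keep` set to zero."""
    out = np.zeros_like(img)
    for c in keep:
        out[..., c] = img[..., c]
    return out


def combine_channels(img, order):
    """Return an image whose channel i is the source channel order[i]."""
    return img[..., list(order)]


def channel_augment(img, rng):
    """Hypothetical strong channel augmentation: erase a random subset of
    channels, then permute the channel positions (assumed scheme)."""
    n_keep = int(rng.integers(1, 4))                 # keep 1, 2, or 3 channels
    keep = rng.choice(3, size=n_keep, replace=False)  # which channels survive
    erased = erase_channels(img, keep)
    order = rng.permutation(3)                        # shuffle channel positions
    return combine_channels(erased, order)


# Tiny example: a 2 x 2 RGB image with distinct, non-zero channel values.
img = (np.arange(12, dtype=np.uint8) + 1).reshape(2, 2, 3)
rng = np.random.default_rng(0)
out = channel_augment(img, rng)
```

Every output channel is either fully erased or a copy of one source channel, so the augmented sample keeps the texture of the original while its color composition changes, which is the effect the abstract attributes to the strategy.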

Sun Rui (孙锐); Xie Ruirui (谢瑞瑞); Zhang Lei (张磊); Zhang Xudong (张旭东); Gao Jun (高隽)

School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, Anhui 230601 || Anhui Province Key Laboratory of Industry Safety and Emergency Technology, Hefei, Anhui 230009 (same affiliations for all five authors)

Computer Science and Automation

Keywords: self-supervised; pedestrian re-identification; cross-modality; pre-training; catastrophic forgetting; combination superimposed erasure; strong channel combination

Acta Electronica Sinica, 2023 (10): 2925-2935 (11 pages)

Funding: National Natural Science Foundation of China, General Program (No.61876057); Natural Science Foundation of Anhui Province (No.2208085MF158); Key Research and Development Plan of Anhui Province, Special Project for Strengthening Police with Science and Technology (No.202004d07020012)

DOI: 10.12263/DZXB.20221190
