A Weakly-Supervised Crowd Density Estimation Method Based on Two-Stage Linear Feature Calibration
In crowd density estimation datasets, annotating crowd locations is extremely laborious, yet the location annotations are not used in the evaluation metrics. In this paper, we aim to reduce the annotation cost of crowd datasets and propose a weakly-supervised crowd density estimation method that, in the absence of crowd position supervision, directly regresses the crowd count using only the number of pedestrians in the image as the supervisory signal. For this purpose, we design a new training method that exploits the correlation between global and local image features and trains the network by incremental learning. Specifically, we design a parent-child network (PC-Net) whose parent and child networks focus on the global and local image, respectively, and propose a linear feature calibration structure to train the two networks simultaneously: the child network learns feature transfer factors and feature bias weights, which are then used to linearly calibrate the features extracted by the parent network, improving convergence by exploiting the local features hidden in crowd images. In addition, we use the pyramid vision transformer as the backbone of the PC-Net to extract crowd features at different levels, and design a global-local feature loss function (L2). We combine it with a crowd counting loss (LC) to enhance the sensitivity of the network to crowd features during training, which effectively improves the accuracy of crowd density estimation. The experimental results show that the PC-Net significantly reduces the gap between fully-supervised and weakly-supervised crowd density estimation, and outperforms the comparison methods on five datasets: ShanghaiTech Part A, ShanghaiTech Part B, UCF_CC_50, UCF_QNRF and JHU-CROWD++.
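The linear feature calibration and the combined loss described above can be illustrated with a minimal sketch. It assumes the calibration is an element-wise affine map (parent feature times transfer factor plus bias weight), that LC is an absolute count error, and that L2 is a mean squared difference between global and local features. None of these forms is stated in the abstract, and the function names and the `weight` parameter are hypothetical.

```python
from typing import List


def calibrate(parent_feat: List[float],
              transfer: List[float],
              bias: List[float]) -> List[float]:
    """Element-wise linear calibration: parent * transfer + bias.

    `transfer` and `bias` stand in for the transfer factors and bias
    weights learned by the child network (assumed element-wise form).
    """
    if not (len(parent_feat) == len(transfer) == len(bias)):
        raise ValueError("feature, transfer and bias must have equal length")
    return [f * t + b for f, t, b in zip(parent_feat, transfer, bias)]


def count_loss(pred_count: float, true_count: float) -> float:
    """Assumed form of LC: absolute error on the image-level count."""
    return abs(pred_count - true_count)


def feature_loss(global_feat: List[float], local_feat: List[float]) -> float:
    """Assumed form of L2: mean squared global-local feature difference."""
    if len(global_feat) != len(local_feat):
        raise ValueError("global and local features must have equal length")
    n = len(global_feat)
    return sum((g - l) ** 2 for g, l in zip(global_feat, local_feat)) / n


def total_loss(pred_count: float, true_count: float,
               global_feat: List[float], local_feat: List[float],
               weight: float = 1.0) -> float:
    """Counting loss plus a weighted feature loss (weight is hypothetical)."""
    return (count_loss(pred_count, true_count)
            + weight * feature_loss(global_feat, local_feat))


calibrated = calibrate([1.0, 2.0, 3.0], [0.5, 1.0, 2.0], [0.25, 0.0, -1.0])
loss = total_loss(10.0, 12.0, [1.0, 2.0], [1.0, 4.0], weight=0.5)
```

Only the image-level count enters the supervision here, which is what makes the setting weakly supervised: no head positions or density maps are needed to evaluate either loss term.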
Yong-Chao Li;Rui-Sheng Jia;Ying-Xiang Hu;Hong-Mei Sun
College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590 || Faculty of Information Science and Engineering, Ocean University of China, Qingdao 266000, China; College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China; College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590 || College of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210000, China
Crowd density estimation; linear feature calibration; vision transformer; weakly-supervised learning
IEEE/CAA Journal of Automatica Sinica, 2024 (004)
Pages 965-981 (17 pages)
This work was supported by the Humanities and Social Science Fund of the Ministry of Education of China (21YJAZH077).