工程科学与技术 (Advanced Engineering Sciences), 2025, Vol. 57, Issue 6: 104-118 (15 pages). DOI: 10.12454/j.jsuese.202401025
Global Feature Focusing and Information Enhancement Network for Occluded Pedestrian Detection
Abstract
Objective Pedestrian detection is a crucial task in computer vision, particularly in applications such as autonomous driving, robot navigation, and intelligent surveillance. However, pedestrian occlusion in real-world scenes remains a significant challenge. Occlusion sharply reduces the visible extent of targets and causes a substantial loss of pedestrian features, making it difficult for detectors to reliably distinguish pedestrian targets from their surroundings. Existing methods, including post-processing optimization, improvements tailored to specific models, and body-part feature-based approaches, have limitations such as inaccurate handling of heavily occluded positive samples, high computational complexity, and susceptibility to background noise. A more effective method for occluded pedestrian detection is therefore needed to improve the performance of pedestrian detectors.

Methods The proposed global feature focusing and information enhancement network (GFFIE-Net) employed HRNet-W32 as the backbone network to generate multi-scale feature maps at 1/4, 1/8, 1/16, and 1/32 of the input resolution. These feature maps captured both high-level semantic information and low-level spatial details, which are essential for detecting pedestrians in complex scenes. A convolutional block attention module (CBAM) was embedded after the feature maps to enhance the feature representation and reduce background-noise interference. CBAM reweighted each channel and each spatial location of the feature maps through global average pooling, max pooling, and lightweight learned transforms (a small shared fully connected network for channel attention and a convolution for spatial attention). This strengthened the feature information in key regions and suppressed background noise, enabling the network to focus on target regions. Next, because CNN-based methods are limited in extracting global information, a Mamba module was cascaded after the CBAM. The Mamba module first flattened the feature maps into one-dimensional sequences of image-patch vectors and then used linear layers for feature extraction and transformation. By scanning these sequences forward and backward with a state space model (SSM), it captured global contextual information and long-range dependencies between feature vectors. This helped extract the context around occluded pedestrians and infer complete pedestrian features from the visible parts. Finally, a hierarchical feature fusion mechanism was designed. It first used bilinear interpolation to bring the feature maps of different scales to a common spatial resolution. It then concatenated the three high-dimensional, low-resolution feature maps, which are rich in semantic information, along the channel dimension to strengthen the deep semantic representation. After that, it concatenated this preliminarily fused map with the low-dimensional, high-resolution feature map, which contains more detailed location information, again along the channel dimension. This combined high-level semantics with fine positional detail, enabling the algorithm to capture multi-level semantic features. The final feature map was processed by a detection head that generated center heatmaps, scale heatmaps, and offset maps to predict pedestrian bounding boxes.
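The CBAM step can be sketched in plain Python. This is a minimal, untrained illustration of the standard CBAM formulation (channel attention from average- and max-pooled descriptors passed through a shared two-layer MLP, then spatial attention from a convolution over the channel-wise mean and max maps), not the authors' code; the nested-list tensor layout, the function names, and the toy weights are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(fmap, w1, w2):
    """Reweight channels. fmap: C x H x W nested lists.
    w1 (hidden x C) and w2 (C x hidden) form the shared two-layer MLP."""
    avg = [sum(sum(r) for r in ch) / (len(ch) * len(ch[0])) for ch in fmap]  # global average pooling
    mx = [max(max(r) for r in ch) for ch in fmap]                             # global max pooling

    def mlp(v):
        hidden = [max(0.0, sum(w * x for w, x in zip(row, v))) for row in w1]  # ReLU
        return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

    weights = [sigmoid(a + m) for a, m in zip(mlp(avg), mlp(mx))]  # one weight in (0, 1) per channel
    return [[[v * wc for v in r] for r in ch] for ch, wc in zip(fmap, weights)]


def spatial_attention(fmap, k_avg, k_max):
    """Reweight locations. k_avg, k_max: k x k kernels (k odd) applied with zero padding
    to the channel-wise mean and max maps (the original CBAM uses k = 7)."""
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    mean_map = [[sum(fmap[c][i][j] for c in range(C)) / C for j in range(W)] for i in range(H)]
    max_map = [[max(fmap[c][i][j] for c in range(C)) for j in range(W)] for i in range(H)]
    r = len(k_avg) // 2

    def conv_at(i, j):
        s = 0.0
        for di in range(-r, r + 1):
            for dj in range(-r, r + 1):
                y, x = i + di, j + dj
                if 0 <= y < H and 0 <= x < W:  # zero padding outside the map
                    s += k_avg[di + r][dj + r] * mean_map[y][x] + k_max[di + r][dj + r] * max_map[y][x]
        return s

    att = [[sigmoid(conv_at(i, j)) for j in range(W)] for i in range(H)]  # one weight per location
    return [[[fmap[c][i][j] * att[i][j] for j in range(W)] for i in range(H)] for c in range(C)]


def cbam(fmap, w1, w2, k_avg, k_max):
    # Channel attention first, then spatial attention (the sequential order used by CBAM)
    return spatial_attention(channel_attention(fmap, w1, w2), k_avg, k_max)


# Toy input: 2 channels on a 2 x 2 map, hand-picked (untrained) weights, 1 x 1 spatial kernels
fmap = [[[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.0], [0.0, 0.5]]]
out = cbam(fmap, w1=[[0.5, -0.5]], w2=[[1.0], [-1.0]], k_avg=[[1.0]], k_max=[[1.0]])
```

Because every attention weight is a sigmoid output in (0, 1), the module can only attenuate responses; with these toy weights the first channel is kept almost intact while the second is strongly suppressed.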
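The forward and backward SSM processing can be illustrated with a fixed-parameter linear state space model. This sketch assumes a diagonal state, constant parameters `a`, `b`, `c` and step `delta`, and simply sums the two scan directions. A real Mamba block makes the step size and the B and C matrices input-dependent (the "selective" part) and surrounds the scan with projections, a short convolution and gating, all of which are omitted here. All function names are hypothetical.

```python
import math


def flatten_to_tokens(fmap):
    """C x H x W nested lists -> H*W tokens in row-major order, each a length-C vector."""
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [[fmap[c][i][j] for c in range(C)] for i in range(H) for j in range(W)]


def ssm_scan(xs, a, b, c, delta):
    """Linear SSM with a diagonal N-dimensional state, discretized with step delta:
    h_t = exp(delta * a) * h_{t-1} + delta * b * x_t,  y_t = sum_n c[n] * h_t[n]."""
    a_bar = [math.exp(delta * an) for an in a]  # zero-order-hold decay; a < 0 keeps it stable
    b_bar = [delta * bn for bn in b]            # simplified input discretization (assumption)
    h = [0.0] * len(a)
    ys = []
    for x in xs:
        h = [ab * hn + bb * x for ab, hn, bb in zip(a_bar, h, b_bar)]
        ys.append(sum(cn * hn for cn, hn in zip(c, h)))
    return ys


def bidirectional_ssm(tokens, a, b, c, delta):
    """Scan each channel forward and backward over the token sequence and sum both outputs,
    so every token receives context from earlier and later positions."""
    n_ch = len(tokens[0])
    out = [[0.0] * n_ch for _ in tokens]
    for ch in range(n_ch):
        seq = [t[ch] for t in tokens]
        fwd = ssm_scan(seq, a, b, c, delta)
        bwd = ssm_scan(seq[::-1], a, b, c, delta)[::-1]  # scan reversed, then restore order
        for t, (f, g) in enumerate(zip(fwd, bwd)):
            out[t][ch] = f + g
    return out


# Toy map: 1 channel, 1 x 4, with a single visible response in the last position
tokens = flatten_to_tokens([[[0.0, 0.0, 0.0, 1.0]]])
out = bidirectional_ssm(tokens, a=[-1.0], b=[1.0], c=[1.0], delta=1.0)
```

A forward-only scan would leave the first three tokens at zero; the backward scan propagates the last token's signal to them with a decay of exp(-1) per step, which is the sense in which a visible part can inform positions elsewhere in the sequence.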
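How the center, scale and offset maps become boxes is not spelled out here. The sketch below assumes the convention of CSP-style (Center and Scale Prediction) anchor-free pedestrian detectors: the scale map stores log(height / stride), the width is a fixed 0.41 times the height, and the offsets shift the center within a grid cell. The function name, threshold, stride and aspect ratio are assumptions, and the non-maximum suppression normally applied afterwards is omitted.

```python
import math


def decode_centers(center, scale, offset_y, offset_x, stride=4, score_thr=0.5, aspect=0.41):
    """Convert H x W center/scale/offset maps into boxes (x1, y1, x2, y2, score) in input pixels.
    Assumptions: scale holds log(height / stride); width = aspect * height."""
    boxes = []
    for i, row in enumerate(center):
        for j, s in enumerate(row):
            if s < score_thr:                      # keep only confident center responses
                continue
            h = math.exp(scale[i][j]) * stride     # predicted pedestrian height
            w = aspect * h                         # fixed aspect ratio (assumed)
            cy = (i + offset_y[i][j] + 0.5) * stride
            cx = (j + offset_x[i][j] + 0.5) * stride
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, s))
    return boxes


# Toy 2 x 2 maps with one strong center response at row 1, column 0
center = [[0.1, 0.2], [0.9, 0.3]]
scale = [[0.0, 0.0], [math.log(25.0), 0.0]]  # exp(log 25) * 4 = height of 100 px
zeros = [[0.0, 0.0], [0.0, 0.0]]
boxes = decode_centers(center, scale, zeros, zeros)
```

With these inputs a single box is produced, centered at (2, 6) with height 100 and width 41; coordinates are not clipped to the image.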
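CityPersons, Caltech and CrowdHuman results are conventionally reported as the log-average miss rate MR⁻², where lower is better, and the miss-rate percentages reported for this work are read here under that convention. Under it, a 43.7% result that is 4.4 percentage points better than the baseline implies a baseline of 43.7 + 4.4 = 48.1%. The sketch below follows the usual definition (miss rate sampled at nine FPPI values evenly spaced in log space over [10⁻², 10⁰], then averaged geometrically); the sampling rule at each reference point is a simplifying assumption, and official evaluation toolkits differ in such details.

```python
import math


def log_average_miss_rate(fppi, miss_rate, num_points=9):
    """fppi, miss_rate: curve points sorted by increasing false positives per image.
    Returns the geometric mean of the miss rate sampled at log-spaced FPPI in [1e-2, 1e0]."""
    refs = [10 ** (-2 + 2 * k / (num_points - 1)) for k in range(num_points)]
    logs = []
    for ref in refs:
        mr = 1.0  # if the curve starts above this FPPI, count every pedestrian as missed
        for f, m in zip(fppi, miss_rate):
            if f <= ref:
                mr = m    # miss rate at the largest FPPI not exceeding ref
            else:
                break
        logs.append(math.log(max(mr, 1e-10)))  # guard against log(0)
    return math.exp(sum(logs) / len(logs))


flat = log_average_miss_rate([0.001, 10.0], [0.5, 0.5])
better = log_average_miss_rate([0.001, 0.05, 1.0], [0.6, 0.4, 0.2])
worse = log_average_miss_rate([0.001, 0.05, 1.0], [0.7, 0.5, 0.3])
```

A constant 50% miss rate gives MR⁻² = 0.5, and a curve that misses fewer pedestrians at every FPPI gives a lower score.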
Results and Discussions Ablation experiments were designed from four aspects to comprehensively verify the effectiveness of the improvements in GFFIE-Net. First, the effects of different global information extraction methods on the results were investigated. Second, the contribution of each module to network performance was analyzed. Third, the influence of different feature scales, the order of the cascaded modules, and the design of the hierarchical feature fusion were examined. Fourth, the robustness of the proposed enhancement modules was verified by testing them on different backbone networks. Extensive experiments were conducted on three challenging pedestrian datasets: CityPersons, Caltech, and CrowdHuman. The log-average miss rate (MR⁻², lower is better) reached 43.7% on the heavily occluded subset of CityPersons, an improvement of 4.4 percentage points over the baseline method; 33.6% on the heavily occluded subset of Caltech; and 43.2% on CrowdHuman, outperforming several mainstream methods. Finally, the detection boxes and center heatmaps were visualized on seven representative real-world images selected from the three datasets, covering traffic, intersection video surveillance, nighttime, high-density traffic, strong light, small-target, and crowded-pedestrian scenes. Compared with the baseline network, GFFIE-Net produced stronger center responses and more accurate box localization for occluded pedestrians. In the high-density traffic scene, for example, where multiple pedestrians occluded one another, the baseline network failed to detect many pedestrians and its center heatmap responded only weakly to the occluded individuals, whereas GFFIE-Net accurately identified and located them. This indicates that GFFIE-Net handles occluded pedestrians effectively across various scenarios, demonstrating strong adaptability and high detection performance.

Conclusions The proposed GFFIE-Net, which integrates the CBAM module, the Mamba module, and a hierarchical feature fusion mechanism, effectively addresses feature loss and background noise in occluded scenes. Experimental results on three benchmark datasets demonstrate its superiority over existing methods, particularly for heavily occluded pedestrians. Future research could explore semi-supervised or self-supervised learning with limited labeled data, which would reduce dependence on large-scale labeled datasets, improve model generalization, and broaden the method's applicability and accuracy across diverse scenarios.
Key words
pedestrian detection/Mamba/feature enhancement/CBAM
Classification
Computer and Automation
Cite this article
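Zheng Kaikui (郑开魁), Ji Kangyou (吉康友), Li Jun (李俊), Li Qiming (李琦铭). Global feature focusing and information enhancement network for occluded pedestrian detection[J]. Advanced Engineering Sciences, 2025, 57(6): 104-118.
Funding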
National Natural Science Foundation of China (52275178, 62102394)
Fujian Provincial Science and Technology Program (2022L3094, 2023N3010)
Quanzhou Municipal Science and Technology Program (2024QZC001R)