
Multimodal feature interaction and semantic guided fusion for RGB-T crowd counting

Chen Yong, Zhang Jiaojiao, Dong Ke

Journal of Beijing University of Aeronautics and Astronautics, 2026, Vol. 52, Issue (1): 28-37, 10. DOI: 10.13700/j.bh.1001-5965.2023.0735

Multimodal feature interaction and semantic guided fusion for RGB-T population counting

Chen Yong¹, Zhang Jiaojiao², Dong Ke²

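Author information

  • 1. School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070; Gansu Provincial Engineering Research Center for Artificial Intelligence and Graphics & Image Processing, Lanzhou 730070
  • 2. School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070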

Abstract

RGB-T crowd counting exploits the complementarity of visible (RGB) and thermal infrared images to count people in a scene. Existing RGB-T multimodal crowd counting methods suffer from insufficient information interaction between modalities and inadequate feature fusion during feature extraction. To address these problems, an RGB-T crowd counting method based on multimodal feature interaction and semantic-guided fusion is proposed. First, stacked small-scale convolution kernels are designed as the branches of the backbone network to extract coarse features from each single modality. Second, to address the limited information interaction between modalities, a multimodal feature interaction module is proposed; it extracts the features of the RGB and thermal infrared modalities and enables the exchange of information between them. Then, a semantic-guided fusion module is designed to strengthen the semantic relevance of multimodal crowd features through global and local feature-guided fusion, so that multi-context information is fully integrated and the ability to recognize the target crowd is improved. Finally, a regression head generates the crowd density map and outputs the counting result. Experimental results on the public RGBT-CC dataset show that the proposed method outperforms the comparison algorithms, reducing the root-mean-square error by 31.12% compared with the CMCRL method and achieving higher crowd counting accuracy across various scenarios.
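The abstract's last two steps (reading a count off a density map and comparing methods by root-mean-square error) can be illustrated with a minimal sketch. It assumes the usual convention that the estimated count is the sum of the density-map values. The function names and all numbers are hypothetical placeholders for illustration, not values or code from the paper.

```python
import math


def count_from_density(density_map):
    """Estimated count, assumed here to be the sum of all density-map values."""
    return sum(sum(row) for row in density_map)


def rmse(predicted, actual):
    """Root-mean-square error between predicted and ground-truth counts."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` is lower than `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


# Toy inputs, made up for illustration only (not results from the paper).
toy_density = [[0.0, 0.5, 0.5],
               [1.0, 0.0, 0.0]]
toy_count = count_from_density(toy_density)

toy_pred = [10.0, 22.0, 31.0]
toy_truth = [12.0, 20.0, 30.0]
toy_rmse = rmse(toy_pred, toy_truth)
```

A reported figure such as the 31.12% reduction would correspond to `relative_reduction(baseline_rmse, proposed_rmse)` evaluated on the two methods' RMSE values over the same test set.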

Key words

deep learning/RGB-T/crowd counting/multimodal feature interaction/semantic guided fusion

Classification

Information technology and security science

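Cite this article

Chen Yong, Zhang Jiaojiao, Dong Ke. Multimodal feature interaction and semantic guided fusion for RGB-T crowd counting[J]. Journal of Beijing University of Aeronautics and Astronautics, 2026, 52(1): 28-37, 10.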

Funding

National Natural Science Foundation of China (62462043, 61963023)

Lanzhou Jiaotong University Basic Top-Notch Personnel Project (2023JC36)

Key Research and Development Project of Lanzhou Jiaotong University (ZDYF2304)

Journal of Beijing University of Aeronautics and Astronautics

ISSN 1001-5965
