Acta Electronica Sinica, 2025, Vol. 53, Issue (3): 686-704, 19. DOI: 10.12263/DZXB.20240602
UAV-RGBT Multispectral Object Detection Dataset and Algorithm Benchmark
Abstract
Unmanned aerial vehicle (UAV)-based multispectral object detection, which uses both visible (RGB) and thermal infrared (T) images, makes all-weather, all-day target monitoring possible and plays a critical role in military and civilian applications. However, because data acquisition and processing are complex, publicly available UAV-based RGB-T multispectral object detection datasets are currently lacking, which limits research and application in this area. Meanwhile, UAV operating scenarios are complex and variable, with rapid changes in flight altitude, speed, focal length, and background. As a result, the captured targets exhibit diverse scales, uneven (dense or sparse) distributions, and category imbalance, all of which pose significant challenges for accurate detection. Furthermore, real-time performance must be guaranteed in applications such as reconnaissance and traffic monitoring. Balancing accuracy and speed is therefore key to the design of a UAV RGB-T object detector. To address these issues, this paper introduces a large-scale UAV-based RGB-T multispectral dataset named UAV-RGBT, which spans seasons and day-night cycles and covers multiple categories and scales. Specifically, UAV-RGBT comprises 20 categories with 5,117 pairs of RGB-T images and over 110,000 annotations, which is conducive to advancing research on UAV-based multispectral object detection algorithms. Moreover, building on the YOLOv8n model, the UAV-based dual-branch multispectral object detection (UAV-DMDet) model is proposed; it promotes deep fusion of multispectral features through a multi-modal cross-attention fusion module and a multi-modal feature decomposition combination module. This approach achieves a better trade-off among model parameter size, detection speed, and accuracy. Experimental results demonstrate that the UAV-DMDet model improves mAP@0.5 on the UAV-RGBT dataset by 3.61% and 11.03% in the visible and thermal modalities, respectively, and improves mAP@0.5:0.95 by 0.84% and 6.76%, respectively. On the Drone-Vehicle dataset, the UAV-DMDet model outperforms the mainstream algorithm I2MDet, with mAP@0.5 and mAP@0.5:0.95 improvements of 2.66% and 12.36%, respectively. Furthermore, with 640×640 resolution images as input, the UAV-DMDet model achieves an FP32 inference speed of 31 frames per second on a GeForce RTX 3090 GPU and an FP16 inference speed of 58 frames per second on a Huawei Ascend 710 processor, making it well suited to real-time UAV-based RGB-T multispectral object detection tasks.
Keywords
unmanned aerial vehicle (UAV) / visible and thermal infrared multispectral object detection / dataset / multi-modal feature fusion / YOLOv8
Classification
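Information Technology and Security Science
Cite this article
WANG Jinzhong, DAI Shun, ZHANG Xiuwei, TIAN Xuetao, XING Yinghui, WANG Fang, YIN Hanlin, ZHANG Yanning. UAV-RGBT Multispectral Object Detection Dataset and Algorithm Benchmark [J]. Acta Electronica Sinica, 2025, 53(3): 686-704, 19.
Funding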
National Natural Science Foundation of China (No.61971356)
Natural Science Basic Research Program of Shaanxi Province (No.2024JC-DXWT-07, No.2024JC-YBQN-0719)
Key Research and Development Program of Shaanxi Province (No.2023-YBGY-012)
Basic and Applied Basic Research Foundation of Guangdong Province (No.2024A1515030186)