Journal of Chongqing University of Technology, 2024, Vol. 38, Issue (19): 13-20, 8. DOI: 10.3969/j.issn.1674-8425(z).2024.10.002
面向3D目标检测的多模态生成式图像数据增强的研究
A multimodal generative image data enhancement for 3D object detection
Abstract
Traditional generative image data augmentation algorithms usually lose 3D attribute information, which makes them unsuitable for 3D object detection in autonomous driving. To address this problem, we propose a multimodal image enhancement algorithm based on the Stable Diffusion model and use it to develop a data augmentation method designed specifically for 3D object detection. The algorithm further constrains the image generation process by introducing additional modal inputs. A multimodal feature online generation module extracts information such as scene descriptions, semantic distributions, and depth features in real time. For the multimodal feature fusion network, an enhanced gated self-attention module is designed to accurately capture depth information in the latent feature space. This effectively preserves the 3D attribute information of the image while allowing targeted modifications to 2D features such as texture, color, and illumination. Because the algorithm preserves depth well, the generated images are combined with 3D pseudo-labels to create new image samples, thereby augmenting the image data. 3D detection results on the public nuScenes dataset demonstrate the effectiveness of the algorithm in preserving 3D attributes, particularly for large categories such as buses and trucks, whose AP values improve by 17.2% and 14.1%, respectively. In addition, mAP and NDS increase by 6.8% and 3.4%, respectively.
Keywords
data augmentation/stable diffusion/image generation/object detection/feature fusion
Classification
Information Technology and Security Science
Cite this article
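ZHANG Guangqian, ZHOU Guangli, HUANG Fei, LIU Wenbing, XIANG Yangkai. A multimodal generative image data enhancement for 3D object detection [J]. Journal of Chongqing University of Technology, 2024, 38(19): 13-20, 8.
Funding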
Chongqing Science and Technology Innovation Major R&D Project (CSTB2022TIAD-STX0003)
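Illustrative sketch
The abstract mentions a gated self-attention module that fuses depth information into the latent features. The record does not give the design of the enhanced module, so the NumPy sketch below only shows the generic gated self-attention pattern under stated assumptions: condition tokens (here standing in for depth features) are concatenated with the visual tokens, self-attention runs over the joint sequence, and only the visual positions are added back through a tanh gate. All names (`gated_self_attention`, `depth_tokens`, `gate`), the single-head layout, and the random weights are hypothetical and are not taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, w_q, w_k, w_v):
    # Single-head scaled dot-product self-attention over a token sequence.
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def gated_self_attention(visual, cond, w_q, w_k, w_v, gate):
    # Attend over [visual; cond], keep only the visual positions,
    # and add them back as a residual scaled by tanh(gate).
    n_visual = visual.shape[0]
    joint = np.concatenate([visual, cond], axis=0)
    attended = self_attention(joint, w_q, w_k, w_v)
    return visual + np.tanh(gate) * attended[:n_visual]

# Toy data: 16 visual latent tokens and 4 condition tokens (assumed sizes).
rng = np.random.default_rng(0)
dim = 8
visual = rng.normal(size=(16, dim))
depth_tokens = rng.normal(size=(4, dim))
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) * 0.1 for _ in range(3))

closed = gated_self_attention(visual, depth_tokens, w_q, w_k, w_v, gate=0.0)
opened = gated_self_attention(visual, depth_tokens, w_q, w_k, w_v, gate=1.0)
```

With `gate=0.0` the layer is an identity on the visual tokens, which is why this kind of gate is commonly initialized at zero when a new condition is attached to a pretrained generator. A nonzero gate lets the condition tokens influence the latents.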