Open-Set Traffic Object Detection Algorithm Based on Vision-Language Pre-training Model

Computer Engineering, 2025, Vol. 51, Issue 6: 375-384, 10. DOI: 10.19678/j.issn.1000-3428.0069168

Huang Qiqiang¹, An Guocheng², Xiong Gang¹

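Author information

1. School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240
2. Industry Digital Intelligence Division, Shanghai Huaxun Network System Co., Ltd., Chengdu, Sichuan 610074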

Abstract

Traffic object detection is a crucial component of intelligent transportation systems. However, existing traffic object detection algorithms can detect only predefined object classes and cannot handle open-set scenarios. To address this, a novel open-set traffic object detection algorithm based on a Visual-Language Pre-trained (VLP) model is proposed. First, with Faster R-CNN as the foundation, the prediction network is modified to handle the localization of open-set objects, and the loss function is changed to the Intersection over Union (IoU) loss, which improves localization accuracy. Second, a new VLP-based Label Matching Network (VLP-LMN) is constructed to perform label matching on the predicted bounding boxes. The VLP model serves as a powerful knowledge repository that effectively matches region images with label text. In addition, prompt engineering and fine-tuning of network modules make better use of the VLP model, significantly improving the accuracy of label matching. The algorithm achieves an average detection accuracy of 60.3% for new classes on the PASCAL VOC07+12 dataset, demonstrating good performance in open-set object detection. Additionally, the average detection accuracy for new classes on a traffic dataset reaches 58.9%, with only a 14.5% decrease compared with the base classes in zero-shot detection. This indicates that the algorithm generalizes well in traffic object detection.
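
The two stages named in the abstract, box localization trained with an IoU loss and label matching between region features and label text, can be illustrated with a minimal sketch. It is not the authors' implementation. The loss form `1 - IoU`, the function names `iou`, `iou_loss`, `cosine`, and `match_label`, and the toy two-dimensional embeddings are all assumptions made for illustration; in a real system the embeddings would come from the image and text encoders of a VLP model, which the abstract does not specify.

```python
import math


def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def iou_loss(pred_box, target_box):
    """Assumed loss form: 1 - IoU (the paper may use a different variant)."""
    return 1.0 - iou(pred_box, target_box)


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def match_label(region_embedding, text_embeddings):
    """Pick the label whose text embedding is most similar to the region.

    text_embeddings maps label -> embedding; both sides are hypothetical
    stand-ins for the outputs of a VLP model's encoders.
    """
    scores = {label: cosine(region_embedding, emb)
              for label, emb in text_embeddings.items()}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    print(iou_loss((0, 0, 2, 2), (1, 1, 3, 3)))
    label, _ = match_label([1.0, 0.0], {"car": [0.9, 0.1], "bus": [0.1, 0.9]})
    print(label)
```

Because the label set is passed in as text embeddings at inference time, new classes can be added without retraining the localization stage, which is the property that open-set detection relies on.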

Keywords

Visual-Language Pre-trained (VLP) model / Faster R-CNN / open-set object detection / traffic object detection

Classification

Information Technology and Security Science

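Cite this article

Huang Qiqiang, An Guocheng, Xiong Gang. Open-Set Traffic Object Detection Algorithm Based on Vision-Language Pre-training Model[J]. Computer Engineering, 2025, 51(6): 375-384, 10.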

Funding

National Key Research and Development Program of China, 14th Five-Year Plan (2023YFC3006700)

National Natural Science Foundation of China (62071293)

Computer Engineering

Open access; Peking University Core Journal

ISSN: 1000-3428
