Open-Set Traffic Object Detection Algorithm Based on Vision-Language Pre-training Model

Huang Qiqiang (黄琦强), An Guocheng (安国成), Xiong Gang (熊刚)

Computer Engineering (计算机工程), 2025, Vol. 51, Issue 6: 375-384, 10. DOI: 10.19678/j.issn.1000-3428.0069168

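Huang Qiqiang 1, An Guocheng 2, Xiong Gang 1

Author information

  • 1. School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240
  • 2. Industry Digital Intelligence Division, Shanghai Huaxun Network System Co., Ltd., Chengdu, Sichuan 610074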

Abstract

Traffic object detection is a crucial component of intelligent transportation systems. However, existing traffic object detection algorithms can detect only predefined object classes and cannot handle open-set scenarios. To address this limitation, an open-set traffic object detection algorithm based on a Visual-Language Pre-trained (VLP) model is proposed. First, with Faster R-CNN as the foundation, the prediction network is modified to handle the localization of open-set objects, and the loss function is changed to the Intersection over Union (IoU) loss, which improves localization accuracy. Second, a new VLP-based Label Matching Network (VLP-LMN) is constructed to perform label matching on the predicted bounding boxes. The VLP model serves as a knowledge repository that matches region images with label text. In addition, prompt engineering and fine-tuning of network modules make better use of the VLP model and significantly improve label-matching accuracy. The algorithm achieves an average detection accuracy of 60.3% for new classes on the PASCAL VOC07+12 dataset, demonstrating its effectiveness in open-set object detection. On a traffic dataset, the average detection accuracy for new classes reaches 58.9%, a decrease of only 14.5% compared with the base classes in zero-shot detection, which indicates that the algorithm generalizes well in traffic object detection.
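
The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact form of the IoU loss or the internals of VLP-LMN, so the sketch assumes the common `1 - IoU` form and assumes that label matching picks the label whose text embedding has the highest cosine similarity to the region embedding. The function names, the prompt template `"a photo of a {}"`, and the toy 3-dimensional vectors (stand-ins for real VLP encoder outputs) are all hypothetical.

```python
import math


def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def iou_loss(pred_box, target_box):
    """Assumed loss form: 1 - IoU (the paper's exact variant is not stated)."""
    return 1.0 - iou(pred_box, target_box)


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def build_prompts(labels, template="a photo of a {}"):
    """Hypothetical prompt template; the paper's prompts are not given here."""
    return [template.format(label) for label in labels]


def match_label(region_embedding, text_embeddings, labels):
    """Return the label whose text embedding is most similar to the region."""
    scores = [cosine(region_embedding, t) for t in text_embeddings]
    best = max(range(len(labels)), key=scores.__getitem__)
    return labels[best], scores[best]


# Toy data: in a real system the embeddings would come from the VLP model's
# image and text encoders applied to the cropped region and the prompts.
labels = ["car", "bus", "traffic cone"]
prompts = build_prompts(labels)
text_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
region_embedding = [0.1, 0.2, 0.9]

best_label, best_score = match_label(region_embedding, text_embeddings, labels)
loss = iou_loss((0, 0, 2, 2), (1, 1, 3, 3))  # overlap 1, union 7, loss 6/7
```

Because the label set enters only through the text prompts, a new class can be added at inference time by appending its name to `labels` and encoding one more prompt, which is the property that makes this kind of matching suitable for open-set detection.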

Key words

Visual-Language Pre-trained (VLP) model / Faster R-CNN / open-set object detection / traffic object detection

Classification

Information Technology and Security Science

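Cite this article

Huang Qiqiang, An Guocheng, Xiong Gang. Open-Set Traffic Object Detection Algorithm Based on Vision-Language Pre-training Model[J]. Computer Engineering, 2025, 51(6): 375-384, 10.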

Funding

National Key Research and Development Program of China, 14th Five-Year Plan (2023YFC3006700)

National Natural Science Foundation of China (62071293)

Computer Engineering (计算机工程)

Open access; Peking University Core Journal (北大核心)

ISSN 1000-3428
