Computer Engineering, 2026, Vol. 52, Issue (4): 62-81, 20. DOI: 10.19678/j.issn.1000-3428.0069312
Review of DETR Object Detection Algorithm Based on Transformer
Abstract
Convolutional Neural Networks (CNNs) are widely used in object detection and are well regarded in the research community for their precision and scalability. They have spawned numerous notable models, including the Region-based Convolutional Neural Network (R-CNN) series (such as Fast R-CNN and Faster R-CNN) and the You Only Look Once (YOLO) series. After the success of Transformers in natural language processing, researchers began exploring their application in computer vision, leading to visual backbone networks such as Vision Transformer (ViT) and Swin Transformer. In 2020, a Facebook research team unveiled DEtection TRansformer (DETR), an end-to-end object detection algorithm based on Transformers, designed to minimize the need for prior knowledge and postprocessing in object detection tasks. Despite its promise, DETR has limitations, including slow convergence, relatively low accuracy, and the ambiguous physical meaning of object queries. These issues have spurred a wave of research aimed at refining the algorithm. This paper collates, analyzes, and synthesizes the various efforts to improve DETR and assesses their respective merits and demerits. It also presents a comprehensive overview of state-of-the-art research and specialized application domains that employ DETR, and concludes with an outlook on the future role of DETR in computer vision.
Keywords
computer vision / object detection / DETR algorithm / Vision Transformer (ViT) / image segmentation
Classification
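Information Technology and Security Science
Cite this article
Li Yiyang, Lu Shenglian, Wang Jijie, Chen Ming. Review of DETR Object Detection Algorithm Based on Transformer[J]. Computer Engineering, 2026, 52(4): 62-81, 20.
Funding
National Natural Science Foundation of China (61662006)
Director's Fund of the Guangxi Key Laboratory of Multi-source Information Mining and Security (20-A-02-02)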