
Real-time urban street view semantic segmentation based on cross-layer aggregation network
(OA | Peking University Core Journal | CSTPCD)

Abstract

With the rapid development of autonomous driving technology, precise and efficient scene understanding has become increasingly important. Urban street scene semantic segmentation aims to accurately identify and segment elements such as pedestrians, obstacles, roads, and signs, providing the road information that autonomous driving requires. However, current semantic segmentation algorithms still face challenges in urban street scenes, chiefly insufficient discrimination between pixel categories, imprecise understanding of complex scene structures, and inaccurate segmentation of small-scale objects and large-scale structures. To address these issues, this paper proposes a real-time urban street scene semantic segmentation algorithm based on a cross-layer aggregation network. First, a pyramid pooling module combined with cross-layer aggregation is designed at the end of the encoder to efficiently extract multi-scale context information. Second, a cross-layer aggregation module is designed between the encoder and the decoder; it enhances feature representation by introducing a channel attention mechanism and aggregates the encoder-stage features level by level to fully exploit feature reuse. Finally, a multi-scale fusion module is designed in the decoder stage, which aggregates global and local information along the channel dimension to promote the fusion of deep and shallow features. The proposed algorithm was validated on two common urban street scene datasets. On an RTX 3090 graphics card (TensorRT speed-measurement environment), the algorithm achieves 73.0% mIoU on the Cityscapes test set at 294 FPS, and 75.8% mIoU on higher-resolution images at 164 FPS; on the CamVid dataset it achieves 74.8% mIoU at 239 FPS. Experimental results show that the proposed algorithm strikes an effective balance between accuracy and real-time performance, significantly improves semantic segmentation performance over competing algorithms, and brings a new advance to the field of real-time urban street scene semantic segmentation.
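The pyramid pooling idea invoked by the abstract can be sketched as follows. This is an illustrative NumPy mock-up of PSPNet-style pyramid pooling, not the paper's actual module: the bin sizes, the average pooling, and the nearest-neighbour upsampling are all assumptions, and no learned projection is applied after concatenation.

```python
import numpy as np

def pyramid_pooling(feat, bins=(1, 2, 4)):
    """PSPNet-style pyramid pooling (illustrative): average-pool the feature
    map at several grid sizes, upsample each grid back to full resolution,
    and concatenate everything with the input along the channel axis."""
    c, h, w = feat.shape
    outs = [feat]
    for b in bins:  # assumes h and w are divisible by each bin size
        pooled = np.zeros((c, b, b))
        ys = np.linspace(0, h, b + 1, dtype=int)
        xs = np.linspace(0, w, b + 1, dtype=int)
        for i in range(b):
            for j in range(b):
                # Mean over one spatial cell of the b x b grid
                pooled[:, i, j] = feat[:, ys[i]:ys[i + 1],
                                          xs[j]:xs[j + 1]].mean(axis=(1, 2))
        # Nearest-neighbour upsample back to (h, w)
        up = pooled.repeat(h // b, axis=1).repeat(w // b, axis=2)
        outs.append(up)
    return np.concatenate(outs, axis=0)  # shape: (c * (1 + len(bins)), h, w)
```

Coarser bins capture global context while finer bins keep local detail; concatenating them is what gives the decoder multi-scale context in one tensor.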
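The channel-attention-gated cross-layer aggregation described in the abstract can likewise be sketched. The squeeze-and-excitation-style gate and the addition-based merge below are illustrative assumptions, with random untrained weights standing in for learned parameters; the paper's module is not specified at this level of detail.

```python
import numpy as np

def channel_attention(feat, reduction=4):
    """Squeeze-and-excitation-style channel attention (illustrative)."""
    c, h, w = feat.shape
    # Squeeze: global average pooling over spatial dims -> per-channel descriptor
    z = feat.mean(axis=(1, 2))                                    # (c,)
    # Excitation: two-layer bottleneck; random weights stand in for trained ones
    rng = np.random.default_rng(0)
    w1 = rng.standard_normal((c // reduction, c)) / np.sqrt(c)
    w2 = rng.standard_normal((c, c // reduction)) / np.sqrt(c // reduction)
    s = 1.0 / (1.0 + np.exp(-(w2 @ np.maximum(w1 @ z, 0.0))))     # sigmoid gate
    # Reweight channels by the learned (here: random) per-channel gate
    return feat * s[:, None, None]

def cross_layer_aggregate(shallow, deep):
    """Aggregate a shallow encoder feature with an upsampled deeper feature."""
    c, h, w = shallow.shape
    # Nearest-neighbour upsample the deep feature to the shallow resolution
    fy, fx = h // deep.shape[1], w // deep.shape[2]
    up = deep.repeat(fy, axis=1).repeat(fx, axis=2)
    # Gate the sum with channel attention so informative channels dominate
    return channel_attention(shallow + up)
```

Applied level by level from the deepest encoder stage upward, this kind of gated merge is one way to realize the feature reuse the abstract refers to.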

HOU Zhiqiang; CHENG Minjie; MA Sugang; QU Minjie; YANG Xiaobao

School of Computer Science, Xi'an University of Posts and Telecommunications, Xi'an 710121, Shaanxi, China; Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an University of Posts and Telecommunications, Xi'an 710121, Shaanxi, China

Computers and Automation

semantic segmentation; convolutional neural network; urban street view; encoder-decoder structure; pyramid pooling module

Optics and Precision Engineering (《光学精密工程》), 2024, Issue 8

Pages 1212-1226 (15 pages)

Supported by the National Natural Science Foundation of China (No. 62072370) and the Natural Science Foundation of Shaanxi Province (No. 2023-JC-YB-598)

DOI: 10.37188/OPE.20243208.1212
