Journal of Henan Polytechnic University (Natural Science), 2024, Vol. 43, Issue 6: 146-155, 10. DOI: 10.16186/j.cnki.1673-9787.2023030005
Rich semantic extractor network for real-time semantic segmentation
Abstract
Objectives: Real-time semantic segmentation networks must keep inference fast, which limits their depth. Shallow networks extract insufficient semantic feature information, and the limited depth also restricts the capability of the feature extraction network, reducing its robustness and adaptability.
Methods: To address these problems, a rich semantic extractor network (RSENet) for real-time semantic segmentation was proposed. First, to address inadequate semantic feature extraction, a rich semantic extractor (RSE) was introduced, comprising a multi-scale global semantic extraction module (MGSEM) and a semantic fusion module (SFM). MGSEM extracted rich multi-scale global semantics and expanded the effective receptive field of the network, while SFM efficiently fused multi-scale local semantics and multi-scale global semantics, giving the network more comprehensive semantic information. Finally, based on the characteristics of the detail branch and the semantic branch, a space reconstruction aggregation module (SRAM) was designed to model the context information of the detail features and enhance the feature representation, so that the two branches could be aggregated efficiently.
Results: Comprehensive experiments were conducted on the Cityscapes and ADE20K datasets. The proposed RSENet achieved mIoU of 75.6% and 35.7% at inference speeds of 76 frames/s and 67 frames/s, respectively.
Conclusions: The experimental results suggested that the proposed network was able to explore and accurately capture semantic information in images of complex scenes. It also balanced accuracy and speed well, achieving high-precision semantic segmentation together with fast inference. This efficient segmentation capability made the network practical and operable in real-world application scenarios.
Keywords
semantic segmentation / multi-scale feature / vision Transformer / feature fusion
Classification
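Information Technology and Security Science
Cite this article
ZHAO Shan, TIAN Kaiwen, SUN Junding. Rich semantic extractor network for real-time semantic segmentation[J]. Journal of Henan Polytechnic University (Natural Science), 2024, 43(6): 146-155, 10.
Funding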
National Natural Science Foundation of China (62276092)