CAAI Transactions on Intelligent Systems (智能系统学报), 2025, Vol. 20, Issue 6: 1304-1327 (24 pages). DOI: 10.11992/tis.202505001

Review of weakly supervised language-guided image segmentation and grounding

ZHANG Lei¹, HUANG Yongqiu², LI Xin², WANG Baoyan²

Author information

1. School of Electronic Information Engineering, Guangdong University of Petrochemical Technology, Maoming 525000, China
2. School of Computer Science, Guangdong University of Petrochemical Technology, Maoming 525000, China

Abstract

Language-guided image segmentation (referring image segmentation, RIS) and grounding (referring expression grounding, REG) aim to predict masks or bounding boxes for target objects from natural language instructions, and are key tasks in vision-language understanding. Fully supervised methods are constrained by high annotation costs, which has driven growing interest in weakly supervised learning. This paper reviews recent advances in weakly supervised RIS and REG from a unified perspective, focusing on methods based on image-text pairs and unlabeled data, and discusses current challenges and future directions. It introduces the background of RIS and REG and analyzes the value and challenges of weak supervision. It summarizes the different types of weak supervision signals, categorizes representative methods, and analyzes their characteristics. It presents mainstream datasets and evaluation metrics and compares the performance of typical methods. The surveyed studies show that incorporating pretrained models, such as large language models, can significantly improve performance. However, limitations remain, stemming from the constraints of the pretrained models and from task adaptation. Optimizing fine-grained cross-modal alignment, model efficiency, and generalization ability will be important future research directions.

Key words

deep learning / computer vision / weakly supervised learning / unsupervised learning / referring image segmentation / referring expression grounding / multimodal / large language model

Classification

Information technology and security science

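Cite this article

ZHANG Lei, HUANG Yongqiu, LI Xin, WANG Baoyan. Review of weakly supervised language-guided image segmentation and grounding [J]. CAAI Transactions on Intelligent Systems, 2025, 20(6): 1304-1327.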

Funding

National Natural Science Foundation of China (62476064)

Natural Science Foundation of Guangdong Province (2024A1515010455)

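CAAI Transactions on Intelligent Systems (智能系统学报)

Open access; listed in the Peking University Core Journals index (北大核心)

ISSN: 1673-4785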
