CAAI Transactions on Intelligent Systems (智能系统学报), 2025, Vol. 20, Issue 6: 1304-1327, 24. DOI: 10.11992/tis.202505001
Review of weakly supervised language-guided image segmentation and grounding
Abstract
Language-guided image segmentation (referring image segmentation, RIS) and grounding (referring expression grounding, REG) aim to predict masks or bounding boxes for target objects based on natural language instructions, and are key tasks in vision-language understanding. Fully supervised methods are constrained by high annotation costs, which has driven growing interest in weakly supervised learning. This paper reviews recent advances in weakly supervised RIS and REG from a unified perspective, focusing on methods based on image-text pairs and unlabeled data, and discusses current challenges and future directions. It introduces the background of RIS and REG and analyzes the value and challenges of weak supervision. It summarizes the different types of weak supervision signals, categorizes representative methods, and analyzes their characteristics. It presents mainstream datasets and evaluation metrics and compares the performance of typical methods. Studies show that incorporating pretrained models, such as large language models, can significantly improve performance. However, limitations stemming from the constraints of pretrained models and from task adaptation remain. Optimizing fine-grained cross-modal alignment, model efficiency, and generalization ability will be important directions for future research.
Keywords
deep learning/computer vision/weakly supervised learning/unsupervised learning/referring image segmentation/referring expression grounding/multimodal/large language model
Classification
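Information technology and security science
Cite this article
ZHANG Lei, HUANG Yongqiu, LI Xin, WANG Baoyan. Review of weakly supervised language-guided image segmentation and grounding [J]. CAAI Transactions on Intelligent Systems, 2025, 20(6): 1304-1327, 24.
Funding
National Natural Science Foundation of China (62476064)
Natural Science Foundation of Guangdong Province (2024A1515010455)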