A Survey of Referring Video Object Segmentation Methods (指代视频分割方法研究综述)

魏彩颖 1, 贾磊 1

Computer Engineering and Applications (计算机工程与应用), 2025, Vol. 61, Issue 2: 73-83, 11. DOI: 10.3778/j.issn.1002-8331.2405-0343

Methods for Referring Video Object Segmentation

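Author information

  • 1. School of Computer Science and Technology, 硅湖职业技术学院 (Silicon Lake Vocational and Technical College), Suzhou, Jiangsu 215332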

Abstract

Referring video object segmentation (RVOS) is an active research topic in cross-media tasks spanning video and language. It aims to segment the entities in a given video that are referred to by a textual description. Unlike conventional visual segmentation tasks, which depend on pre-defined classes, RVOS must understand the given expression in order to locate and segment the referred entities without the help of pre-defined classes. Because the textual expressions are arbitrary and no pixel-wise masks are available as a reference, RVOS is more challenging than conventional video segmentation. Although RVOS is a new task in cross-modal understanding, it has important application prospects in many areas (e.g., security monitoring, vehicle tracking, and person re-identification), so an increasing number of significant methods are being proposed. This paper roughly divides the existing solutions into four categories according to their research approach: dynamic-convolution-based, attention-based, multi-level-information-learning-based, and end-to-end sequence-prediction-based methods. Qualitative and quantitative performance comparisons are then presented and analyzed. Finally, the paper summarizes several issues in current methods and proposes suggestions for further improving RVOS performance in future work.

Key words

cross-modal retrieval / referring video object segmentation / cross-modal understanding

Classification

Information technology and security science

Cite this article

魏彩颖, 贾磊. 指代视频分割方法研究综述[J]. 计算机工程与应用, 2025, 61(2): 73-83, 11.

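Journal: Computer Engineering and Applications (计算机工程与应用), open access, Peking University Core Journal (北大核心), ISSN 1002-8331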
