
Multimodal Scene Editing Algorithm Integrating CLIP and 3D Gaussian

Cao Yangjie, Wang Weiping, Li Zhenqiang, Xie Jun, Lyu Runfeng

Journal of Zhengzhou University (Engineering Science), 2025, Vol. 46, Issue 5: 35-42, 8.

DOI: 10.13705/j.issn.1671-6833.2025.05.016




Author information

1. School of Cyberspace Security, Zhengzhou University, Zhengzhou 450002, Henan, China


Abstract

To address the heavy reliance on annotated data and the high computational complexity of existing 3D scene editing algorithms, this study proposes CLIP2Gaussian, a multimodal scene editing method that integrates CLIP (Contrastive Language-Image Pre-training) with 3D Gaussians. First, the Segment Anything Model (SAM) extracts target masks from multi-view images, and a bidirectional propagation strategy keeps these masks consistent across views. Second, CLIP assigns a semantic label to each extracted mask, and the labels are mapped onto the 3D Gaussian points, embedding semantics in the 3D scene. Finally, the parameters of the 3D Gaussians are optimized through differentiable rendering, and a clustering-based spatial consistency regularization makes the semantic labels more consistent and stable in 3D space. Experiments show that, for semantic segmentation on the LERF dataset, CLIP2Gaussian reaches an IoU of 61.23% with a per-query response time of 0.57 s, which is 54 times faster than LERF while also being more accurate. Ablation studies further verify that the method edits target regions precisely with minimal disturbance to the original scene.
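The abstract names the pipeline stages but gives no formulas, so the following is only a minimal pure-Python sketch under loudly stated assumptions, not the authors' implementation. Each Gaussian is reduced to its 3D center; cameras are axis-aligned pinhole cameras looking down +z with no occlusion test; 2D mask labels are lifted to Gaussians by per-view majority voting, a deliberately simpler substitute for the paper's differentiable-rendering optimization; the clustering-based regularization is approximated by k-means followed by a per-cluster majority vote; and hand-written vectors stand in for CLIP text and label embeddings. All function names (`lift_labels`, `kmeans`, `cluster_vote`, `query`, `iou`, `remove_label`) are hypothetical. IoU is computed on index sets; the paper's 61.23% presumably refers to 2D masks on LERF, which use the same |A∩B| / |A∪B| formula.

```python
# Hypothetical toy sketch, NOT the CLIP2Gaussian code. Assumptions: Gaussians are
# 3D centers only, cameras are unrotated pinhole cameras looking down +z,
# occlusion is ignored, and plain vectors stand in for CLIP embeddings.
import math
from collections import Counter


def lift_labels(centers, views, focal, cx, cy):
    """Give each Gaussian the majority mask label over the views it projects into.

    views: list of (camera_position, label_map); label_map[v][u] is a mask
    label >= 0, or -1 where no mask covers the pixel.
    """
    labels = []
    for px, py, pz in centers:
        votes = Counter()
        for (tx, ty, tz), label_map in views:
            x, y, z = px - tx, py - ty, pz - tz  # camera coordinates (no rotation)
            if z <= 0:
                continue  # point lies behind this camera
            u = round(focal * x / z + cx)
            v = round(focal * y / z + cy)
            if 0 <= v < len(label_map) and 0 <= u < len(label_map[0]):
                if label_map[v][u] >= 0:
                    votes[label_map[v][u]] += 1
        labels.append(votes.most_common(1)[0][0] if votes else -1)
    return labels


def kmeans(points, k, iters=10):
    """Plain k-means on Gaussian centers, seeded with the first k points."""
    cents = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        for i, p in enumerate(points):
            assign[i] = min(range(k), key=lambda j: math.dist(p, cents[j]))
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                cents[j] = [sum(c) / len(members) for c in zip(*members)]
    return assign


def cluster_vote(labels, assign):
    """Stand-in spatial-consistency step: relabel each Gaussian with its cluster's majority label."""
    counts = {}
    for lab, a in zip(labels, assign):
        if lab >= 0:
            counts.setdefault(a, Counter())[lab] += 1
    return [counts[a].most_common(1)[0][0] if a in counts else lab
            for lab, a in zip(labels, assign)]


def query(text_emb, label_embs):
    """Return the label whose embedding is most cosine-similar to the text embedding.

    All embeddings must be non-zero vectors.
    """
    def cos(a, b):
        return sum(x * y for x, y in zip(a, b)) / (math.hypot(*a) * math.hypot(*b))
    return max(label_embs, key=lambda lab: cos(text_emb, label_embs[lab]))


def iou(pred, gt):
    """Intersection over union of two index sets (pixels or Gaussians)."""
    union = len(pred | gt)
    return len(pred & gt) / union if union else 1.0


def remove_label(centers, labels, target):
    """Example edit: delete every Gaussian that carries the target label."""
    return [c for c, lab in zip(centers, labels) if lab != target]


if __name__ == "__main__":
    # Six Gaussians in two groups along x; the single view mislabels the one at x = 1.
    centers = [(float(x), 0.0, 1.0) for x in (0, 1, 2, 5, 6, 7)]
    views = [((0.0, 0.0, 0.0), [[0, 1, 0, -1, -1, 1, 1, 1]])]
    raw = lift_labels(centers, views, focal=1.0, cx=0.0, cy=0.0)
    smoothed = cluster_vote(raw, kmeans(centers, k=2))
    target = query([0.9, 0.1], {0: [1.0, 0.0], 1: [0.0, 1.0]})
    gt = {0, 1, 2}
    print(raw, smoothed, target)
    print(iou({i for i, l in enumerate(raw) if l == target}, gt))
    print(iou({i for i, l in enumerate(smoothed) if l == target}, gt))
    print(len(remove_label(centers, smoothed, target)))
```

In this toy scene the Gaussian at x = 1 receives a noisy label from the single view. The cluster vote restores it, which raises the IoU of the query for label 0 against the ground-truth set {0, 1, 2} from 2/3 to 1.0. Once labels are precomputed, a query in this sketch is only a cosine-similarity lookup over the label table. The abstract does not say whether this is what makes the paper's per-query time low.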


Keywords

3D reconstruction; zero-shot learning; scene understanding; scene editing; 3D Gaussian

Subject classification

Information Technology and Security Science

Cite this article

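Cao Yangjie, Wang Weiping, Li Zhenqiang, Xie Jun, Lyu Runfeng. Multimodal scene editing algorithm integrating CLIP and 3D Gaussian[J]. Journal of Zhengzhou University (Engineering Science), 2025, 46(5): 35-42, 8.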

Funding

National Natural Science Foundation of China (62302458)

Major Collaborative Innovation Project of Zhengzhou (20XTZX06013)

Journal of Zhengzhou University (Engineering Science)

Open access (OA); Peking University Core Journal

ISSN：1671-6833
