Journal of Zhengzhou University (Engineering Science), 2025, Vol. 46, Issue 5: 35-42 (8 pages). DOI: 10.13705/j.issn.1671-6833.2025.05.016
Multimodal Scene Editing Algorithm Integrating CLIP and 3D Gaussian
Abstract
To address the excessive reliance on annotated data and the high computational complexity of 3D scene editing algorithms, this study proposes CLIP2Gaussian, a multimodal scene editing method that integrates CLIP with 3D Gaussians. First, the algorithm employs SAM to extract target masks from multi-view images and introduces a bidirectional propagation strategy to keep the masks consistent across views. Second, the extracted masks are assigned semantic labels using CLIP and mapped to 3D Gaussian points, which embeds semantics in the 3D scene. Finally, a differentiable rendering mechanism is used to optimize the parameters of the 3D Gaussians, and a clustering-based spatial consistency regularization strategy is introduced to improve the consistency and stability of semantic labels in 3D space. Experimental results show that, in semantic segmentation tasks, CLIP2Gaussian achieves 61.23% IoU on the LERF dataset with a per-query response time of 0.57 seconds, which is 54 times faster than LERF while achieving superior accuracy and efficiency. Ablation studies further verify that the proposed method enables precise editing of target regions with minimal disturbance to the original scene.
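Two ideas mentioned in the abstract, labeling a mask by its similarity to text embeddings and scoring a segmentation with IoU, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the three-dimensional toy embeddings, and the flat 0/1 mask lists are all assumptions made for illustration, and a real pipeline would take its embeddings from CLIP's image and text encoders.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def assign_label(mask_embedding, text_embeddings):
    """Return the label whose text embedding is closest to the mask embedding.

    `text_embeddings` is a hypothetical dict mapping label -> vector.
    """
    return max(
        text_embeddings,
        key=lambda name: cosine_similarity(mask_embedding, text_embeddings[name]),
    )


def iou(pred, target):
    """Intersection over union of two binary masks given as flat 0/1 lists."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    # Two empty masks are treated as a perfect match (a convention, assumed here).
    return inter / union if union else 1.0


# Toy values standing in for real encoder outputs (illustrative assumptions).
text_embeddings = {"mug": [1.0, 0.0, 0.0], "plant": [0.0, 1.0, 0.0]}
mask_embedding = [0.9, 0.1, 0.0]
label = assign_label(mask_embedding, text_embeddings)

# Predicted vs. ground-truth mask: 2 pixels overlap, 4 pixels in the union.
pred = [1, 1, 1, 0, 0, 0]
target = [0, 1, 1, 1, 0, 0]
score = iou(pred, target)

print(label, score)
```

On these toy inputs the mask embedding is closest to the "mug" vector, and the two masks share 2 of the 4 pixels in their union, giving an IoU of 0.5.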
Keywords
3D reconstruction/zero-shot learning/scene understanding/scene editing/3D Gaussian
Classification
Information Technology and Security Science
Cite this article
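Cao Yangjie, Wang Weiping, Li Zhenqiang, Xie Jun, Lü Runfeng. Multimodal scene editing algorithm integrating CLIP and 3D Gaussian [J]. Journal of Zhengzhou University (Engineering Science), 2025, 46(5): 35-42, 8.
Funding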
National Natural Science Foundation of China (62302458)
Zhengzhou Collaborative Innovation Major Project (20XTZX06013)