Contextual Information and Large Language Model for Open-Vocabulary Indoor 3D Object Detection

Zhang Sheng (张胜), Cheng Jun (程俊)

Journal of Integration Technology (集成技术), 2025, Vol. 14, Issue 3: 51-63, 13. DOI: 10.12146/j.issn.2095-3135.20241201003

Zhang Sheng (张胜) 1, Cheng Jun (程俊) 2

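Author information

  • 1. Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences; CAS Key Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen 518055, China
  • 2. Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences; CAS Key Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen 518055, China || The Chinese University of Hong Kong, Hong Kong 999077, China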
Abstract

Existing indoor three-dimensional (3D) object detection methods can detect only a limited number of object categories, which limits their application in intelligent robotics. Open-vocabulary object detection can detect all objects of interest in a given scene without predefined object categories, thereby addressing this shortcoming of indoor 3D object detection. At the same time, large language models with prior knowledge can significantly improve the performance of visual tasks. However, existing research on open-vocabulary indoor 3D object detection focuses only on object information and ignores contextual information. The input data for indoor 3D object detection is mainly point clouds, which suffer from sparsity and noise, so relying only on the object point cloud can degrade the 3D detection results. Contextual information contains scene information, which can complement the object information and aid the recognition of object categories. For this reason, this paper proposes an open-vocabulary 3D object detection algorithm assisted by contextual information. The algorithm integrates contextual information and object information through a large language model and then performs chain-of-thought reasoning. The proposed algorithm is validated on the SUN RGB-D and ScanNetV2 datasets, and the experimental results show its effectiveness.

Key words

large language model / indoor 3D object detection / open vocabulary / contextual information / chain of thought

Classification

Information Technology and Security Science

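Cite this article

Zhang Sheng, Cheng Jun. Contextual Information and Large Language Model for Open-Vocabulary Indoor 3D Object Detection[J]. Journal of Integration Technology, 2025, 14(3): 51-63, 13.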

Funding

This work is supported by the National Natural Science Foundation of China (U21A20487) and the Shenzhen Technology Project (JCYJ20220818101206014).

Journal of Integration Technology (集成技术)

ISSN 2095-3135