InstanceGaussian

Framework

Abstract

3D scene understanding has become an essential area of research with applications in autonomous driving, robotics, and augmented reality. Recently, 3D Gaussian Splatting (3DGS) has emerged as a powerful approach, combining explicit modeling with neural adaptability to provide efficient and detailed scene representations.

However, three major challenges remain in leveraging 3DGS for scene understanding: 1) an imbalance between appearance and semantics, where dense Gaussian usage for fine-grained texture modeling does not align with the minimal requirements for semantic attributes; 2) inconsistencies between appearance and semantics, as purely appearance-based Gaussians often misrepresent object boundaries; and 3) reliance on top-down instance segmentation methods, which struggle with uneven category distributions, leading to over- or under-segmentation.

In this work, we propose InstanceGaussian, a method that jointly learns appearance and semantic features while adaptively aggregating instances. Our contributions include: i) a novel Semantic-Scaffold-GS representation balancing appearance and semantics to improve feature representations and boundary delineation; ii) a progressive appearance-semantic joint training strategy to enhance stability and segmentation accuracy; and iii) a bottom-up, category-agnostic instance aggregation approach that addresses segmentation challenges through farthest point sampling and connected component analysis. Our approach achieves state-of-the-art performance in category-agnostic, open-vocabulary 3D point-level segmentation, highlighting the effectiveness of the proposed representation and training strategies.
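The bottom-up aggregation idea in contribution iii) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the nearest-seed assignment step, and the thresholds `feat_thresh` and `dist_thresh` are all assumptions chosen for illustration. The sketch over-segments a point set with farthest point sampling, then merges seed clusters whose features are similar and whose seeds are spatially close, using connected component analysis on the resulting seed graph.

```python
import numpy as np


def farthest_point_sampling(points, k, start=0):
    """Greedily pick up to k indices, each farthest from the points already chosen."""
    chosen = [start]
    dist = np.linalg.norm(points - points[start], axis=1)
    for _ in range(1, min(k, points.shape[0])):
        nxt = int(np.argmax(dist))
        chosen.append(nxt)
        # Track each point's distance to its nearest chosen seed.
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return np.array(chosen)


def connected_components(adjacency):
    """Label the nodes of an undirected graph (boolean adjacency matrix) via union-find."""
    n = adjacency.shape[0]
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in zip(*np.nonzero(adjacency)):
        ri, rj = find(int(i)), find(int(j))
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    # Relabel roots to 0..C-1 in order of first appearance.
    remap = {}
    return np.array([remap.setdefault(find(i), len(remap)) for i in range(n)])


def aggregate_instances(points, features, num_seeds=8, feat_thresh=0.9, dist_thresh=1.0):
    """Hypothetical bottom-up grouping: FPS seeds -> seed graph -> connected components."""
    seeds = farthest_point_sampling(points, num_seeds)
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    seed_xyz, seed_feat = points[seeds], unit[seeds]

    # Over-segment: assign every point to its spatially nearest seed.
    to_seed = np.linalg.norm(points[:, None, :] - seed_xyz[None, :, :], axis=2)
    assign = np.argmin(to_seed, axis=1)

    # Link seeds that are similar in feature space (cosine) and close in 3D space.
    similar = seed_feat @ seed_feat.T > feat_thresh
    close = np.linalg.norm(seed_xyz[:, None, :] - seed_xyz[None, :, :], axis=2) < dist_thresh
    groups = connected_components(similar & close)

    # Each point inherits the instance label of its seed's component.
    return groups[assign]
```

Because no class labels or instance counts are supplied, the number of output instances falls out of the graph structure, which is what makes this style of grouping category-agnostic. In practice the per-point features would come from the learned semantic representation rather than being given directly.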

Results

Instance segmentation results: visual comparison of category-agnostic 3D instance segmentation (panels: RGB, OpenGaussian, InstanceGaussian).

Open-vocabulary results: open-vocabulary query-based point cloud understanding on the ScanNet dataset.

BibTeX


@misc{li2024instancegaussianappearancesemanticjointgaussian,
      title={InstanceGaussian: Appearance-Semantic Joint Gaussian Representation for 3D Instance-Level Perception}, 
      author={Haijie Li and Yanmin Wu and Jiarui Meng and Qiankun Gao and Zhiyao Zhang and Ronggang Wang and Jian Zhang},
      year={2024},
      eprint={2411.19235},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2411.19235}, 
}