ZeroPS: High-quality Cross-modal Knowledge Transfer for Zero-Shot 3D Part Segmentation

3DV 2025

1Nanjing University of Information Science and Technology, China 2Lancaster University, UK
teaser image.

Examples of our method on zero-shot unlabeled segmentation. The first row shows the inputs; the second row shows the outputs.

Abstract

Zero-shot 3D part segmentation is a challenging and fundamental task. In this work, we propose a novel pipeline, ZeroPS, which achieves high-quality knowledge transfer from 2D pretrained foundation models (FMs), SAM and GLIP, to 3D object point clouds. We explore the natural relationship between multi-view correspondence and the FMs’ prompt mechanisms and build bridges on it. In ZeroPS, this relationship manifests in three ways: 1) lifting 2D to 3D by leveraging co-viewed regions and SAM’s prompt mechanism, 2) relating 1D classes to 3D parts by leveraging 2D-3D view projection and GLIP’s prompt mechanism, and 3) enhancing prediction performance by leveraging multi-view observations. Extensive evaluations on the PartNetE and AKBSeg benchmarks demonstrate that ZeroPS significantly outperforms the SOTA method on both zero-shot unlabeled and instance segmentation tasks. ZeroPS requires no additional training or fine-tuning of the FMs, applies to both simulated and real-world data, and is robust to domain shift.

Overall Pipeline

Overview of the proposed pipeline ZeroPS. First, in the unlabeled segmentation phase, the input 3D object is segmented into unlabeled parts. The self-extension component extends a 2D segmentation from a single viewpoint to a 3D segmentation (3D groups) by following a predefined extension sequence starting from that viewpoint; the red cue on the left side of the figure illustrates this process. Second, in the instance segmentation phase, given a text prompt, the multi-modal labeling component assigns an instance label to each unlabeled 3D part.
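The multi-view idea behind the labeling phase can be sketched in miniature: each 3D part receives a label prediction from every rendered view in which it is visible, and the predictions are aggregated across views. The sketch below is our own simplification for illustration only; the function names, the plain majority vote, and the input format are assumptions, not the paper's actual API or scoring scheme.

```python
from collections import Counter

def vote_part_label(per_view_labels):
    """Aggregate per-view label predictions for one 3D part.

    A hypothetical simplification: majority vote over views where the
    part was visible and received a label (None = not visible / no
    detection from that view).
    """
    counts = Counter(label for label in per_view_labels if label is not None)
    if not counts:
        return None  # part never labeled in any view
    return counts.most_common(1)[0][0]

def label_parts(view_predictions):
    """Map each part id to its aggregated label.

    view_predictions: dict mapping part_id -> list of per-view labels.
    """
    return {pid: vote_part_label(labels)
            for pid, labels in view_predictions.items()}

# Toy example: two parts of a chair seen from four views.
preds = {
    0: ["seat", "seat", "back", None],
    1: ["leg", None, "leg", "leg"],
}
print(label_parts(preds))  # {0: 'seat', 1: 'leg'}
```

In the actual pipeline, detection confidences rather than a bare majority would plausibly weight the vote; this sketch only conveys why multiple viewpoints make the final label more reliable than any single view.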

Self-extension

Multi-modal Labeling

Qualitative Comparison

Left: PartNetE’s simulated data. Right: AKBSeg’s real-world data.

Quantitative Comparison

BibTeX

 @inproceedings{xue2025zerops,
      title={ZeroPS: High-quality Cross-modal Knowledge Transfer for Zero-Shot 3D Part Segmentation},
      author={Xue, Yuheng and Chen, Nenglun and Liu, Jun and Sun, Wenyun},
      booktitle={3DV},
      year={2025}
    }