SUM Parts: Benchmarking Part-Level Semantic Segmentation of Urban Meshes

Weixiao Gao, Liangliang Nan, Hugo Ledoux

Delft University of Technology
CVPR 2025

Overview

SUM Parts provides part-level semantic segmentation of urban textured meshes, covering 2.5 km² with 21 classes. From left to right: textured mesh, face-based annotations, and texture-based annotations. The classes are unclassified, terrain, high vegetation, water, car, boat, wall, roof surface, facade surface, chimney, dormer, balcony, roof installation, window, door, low vegetation, impervious surface, road, road marking, cycle lane, and sidewalk.

Online 3D Viewer

Textured mesh

Wireframes

Face-labeled mesh

Pixel-labeled mesh

Video Presentation

Interactive Annotation

Our annotation tool aims to achieve precise semantic labeling of urban meshes with significantly improved efficiency. It features two main modules for part-level semantic annotation: face-based annotation for triangle faces and texture-based annotation for texture pixels. We improve the efficiency of both modules by incorporating interactive selection and template-matching strategies. We invited five people with experience in remote sensing to annotate the dataset manually using our tool: two worked on face-based annotation, two on texture-based annotation, and one reviewed and corrected the annotations. The entire annotation process took approximately 640 hours.

Top left: Protrusion selection and matching; Top middle: Protrusion selection; Top right: Planar segment matching; Bottom left: User-defined template matching; Bottom middle: Region-based template matching; Bottom right: Local region expansion.

Benchmark Datasets

We defined two label types: face (12 labels, excluding `unclassified`) and pixel (19 labels, excluding `terrain` and `unclassified`). For face labels, we evaluated four strategies for sampling point clouds from the mesh; random and Poisson-disk sampling were matched to the sample size of superpixel texture sampling, while face-centered sampling was matched to the mesh face count. For pixel labels, we tested three sampling methods: random, Poisson-disk, and superpixel texture sampling. We also evaluated state-of-the-art 3D semantic segmentation methods, including mesh-based approaches (RF-MRF, SUM-RF, PSSNet) and point cloud-based approaches (PointNet, PointNet++, SPG, SparseConvUnet, RandLA-Net, KPConv, PointNext, PointTransV3, PointVector).
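To make the difference between two of the sampling strategies concrete, here is a minimal sketch in Python with NumPy. It is not the authors' pipeline: the function names, the toy two-triangle mesh, and the label ids are assumptions chosen for illustration, and Poisson-disk and superpixel texture sampling are not shown. Face-centered sampling yields one point per triangle, so the point count equals the face count, while random sampling draws any requested number of points uniformly over the surface and lets each point inherit the label of the face it falls on.

```python
import numpy as np


def face_center_samples(vertices, faces, face_labels):
    """Return one point per triangle (its centroid) with the face's label."""
    centers = vertices[faces].mean(axis=1)  # (F, 3)
    return centers, face_labels.copy()


def random_surface_samples(vertices, faces, face_labels, n_points, seed=0):
    """Draw n_points uniformly over the mesh surface (area-weighted)."""
    rng = np.random.default_rng(seed)
    tri = vertices[faces]  # (F, 3, 3): three corners per face
    # Triangle area is half the norm of the edge cross product.
    edge_a = tri[:, 1] - tri[:, 0]
    edge_b = tri[:, 2] - tri[:, 0]
    areas = 0.5 * np.linalg.norm(np.cross(edge_a, edge_b), axis=1)
    # Pick faces with probability proportional to their area.
    face_idx = rng.choice(len(faces), size=n_points, p=areas / areas.sum())
    # Square-root trick gives uniform barycentric weights inside a triangle.
    r1 = np.sqrt(rng.random(n_points))
    r2 = rng.random(n_points)
    weights = np.stack([1.0 - r1, r1 * (1.0 - r2), r1 * r2], axis=1)  # (N, 3)
    points = (weights[:, :, None] * tri[face_idx]).sum(axis=1)
    return points, face_labels[face_idx]


# Toy mesh (hypothetical): a unit square split into two labeled triangles.
vertices = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]], dtype=float)
faces = np.array([[0, 1, 2], [0, 2, 3]])
face_labels = np.array([1, 2])  # hypothetical label ids

centers, center_labels = face_center_samples(vertices, faces, face_labels)
points, point_labels = random_surface_samples(vertices, faces, face_labels, 1000)
print(centers.shape, points.shape)
```

In this sketch the number of random samples is a free parameter, which is what allows it to be matched to the sample count of another strategy, whereas the face-centered sample count is fixed by the mesh itself.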

Video Demo

Top left: Textured mesh; Top right: Wireframes; Bottom left: Face-labeled mesh; Bottom right: Pixel-labeled mesh.

Face-labeling comparison

Pixel-labeling comparison

BibTeX


@InProceedings{Gao_2025_CVPR,
    author    = {Gao, Weixiao and Nan, Liangliang and Ledoux, Hugo},
    title     = {SUM Parts: Benchmarking Part-Level Semantic Segmentation of Urban Meshes},
    booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {24474-24484}
}

@article{sum2021,
    author  = {Weixiao Gao and Liangliang Nan and Bas Boom and Hugo Ledoux},
    title   = {SUM: A Benchmark Dataset of Semantic Urban Meshes},
    journal = {ISPRS Journal of Photogrammetry and Remote Sensing},
    volume  = {179},
    pages   = {108-120},
    year    = {2021},
    issn    = {0924-2716},
    doi     = {10.1016/j.isprsjprs.2021.07.008},
    url     = {https://www.sciencedirect.com/science/article/pii/S0924271621001854},
}