The Scene Understanding and Modeling Challenge

The SUMO challenge encourages the development of algorithms for complete understanding of 3D indoor scenes from 360° RGB-D panoramas, with the goal of enabling social AR and VR research and experiences.
The target 3D models of indoor scenes include all visible layout elements and objects, complete with pose, semantic information, and texture. Submitted algorithms are evaluated at three levels of complexity, corresponding to the three tracks of the challenge: oriented 3D bounding boxes, oriented 3D voxel grids, and oriented 3D meshes.

[Figure: 360° RGB-D input (360° RGB + 360° depth) → complete 3D scene output (3D texture + pose; 3D semantics + instances)]

Dataset

The SUMO challenge dataset is derived by processing scenes from the SUNCG dataset to produce 360° RGB-D images, represented as cubemaps,
and corresponding 3D mesh models of all visible scene elements. The mesh models are further processed into bounding-box and voxel-based representations. The dataset format is described in detail here.
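Because the panoramas are stored as cubemaps, each pixel on a face corresponds to a viewing ray from the camera. The sketch below shows one common face-to-ray mapping; the face naming and orientation conventions here are an illustrative assumption, not the official SUMO convention (the white paper defines the authoritative format):

```python
import numpy as np

def cubemap_pixel_to_ray(face, u, v, size=1024):
    """Map pixel (u, v) on a cubemap face to a unit ray direction.

    Face labels and axis conventions are illustrative assumptions,
    not the official SUMO cubemap layout.
    """
    # Convert pixel coordinates to [-1, 1], sampling at pixel centers.
    a = 2.0 * (u + 0.5) / size - 1.0
    b = 2.0 * (v + 0.5) / size - 1.0
    dirs = {
        "+x": (1.0, -b, -a),
        "-x": (-1.0, -b, a),
        "+y": (a, 1.0, b),
        "-y": (a, -1.0, -b),
        "+z": (a, -b, 1.0),
        "-z": (-a, -b, -1.0),
    }
    d = np.array(dirs[face])
    return d / np.linalg.norm(d)
```

Combined with the depth map, such rays let each cubemap pixel be back-projected to a 3D point.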

59K indoor scenes · 360° view · 2 modalities · 1024 × 1024 resolution

1024 × 1024 RGB images
1024 × 1024 depth maps
2D semantic information
3D semantic information
3D object pose
3D element texture
3D bounding box scene representation
3D voxel grid scene representation
3D mesh scene representation

Participate

The SUMO Challenge is organized into three performance tracks based on the output representation of the scene. A scene is represented as a collection of elements, each of which models one object in the scene (e.g., a wall, the floor, or a chair). An element is represented in one of three increasingly descriptive representations: bounding box, voxel grid, or surface mesh. For each element in the scene, a submission contains the following outputs listed per track.

3D Bounding Box Track

3D Bounding Box

3D Object Pose

Semantic Category of Element

3D Voxel Grid Track

3D Bounding Box

3D Object Pose

Semantic Category of Element

Location and RGB Color of Occupied 3D Voxels

3D Mesh Track

3D Bounding Box

3D Object Pose

Semantic Category of Element

Element's textured mesh (in .glb format)
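One way to picture the per-track outputs above is as a record per scene element. The sketch below is a hypothetical container for illustration only; the field names are not the official SUMO submission schema (the white paper defines the actual format):

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class SceneElement:
    """Illustrative per-element record mirroring the track outputs.

    Field names are hypothetical, not the official SUMO schema.
    """
    category: str                # semantic category (all tracks)
    bounds_min: List[float]      # 3D bounding box min corner (all tracks)
    bounds_max: List[float]      # 3D bounding box max corner (all tracks)
    rotation: List[float]        # pose rotation, e.g. a quaternion
    translation: List[float]     # pose translation (3D)
    voxels: Optional[object] = None    # voxel track: occupied voxels + RGB
    mesh_path: Optional[str] = None    # mesh track: path to a .glb file
```

The optional fields reflect how the tracks nest: the voxel track adds occupancy and color to the bounding-box outputs, and the mesh track replaces voxels with a textured mesh.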

Metrics

Evaluation of a 3D scene focuses on four key aspects: Geometry, Appearance, Semantics, and Perceptual quality (GASP).
Details of the metrics for each track are provided here.
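For intuition on the geometric aspect, a standard building block is 3D intersection-over-union between bounding boxes. Below is a minimal sketch for axis-aligned boxes; note that the challenge uses oriented boxes, so the official metric is more involved than this:

```python
import numpy as np

def box_iou_3d(box_a, box_b):
    """3D IoU of two axis-aligned boxes, each given as (min_corner, max_corner).

    Simplified illustration; the official SUMO metrics handle oriented boxes.
    """
    min_a, max_a = map(np.asarray, box_a)
    min_b, max_b = map(np.asarray, box_b)
    inter_min = np.maximum(min_a, min_b)
    inter_max = np.minimum(max_a, max_b)
    # Clip to zero so disjoint boxes yield zero intersection volume.
    inter = np.prod(np.clip(inter_max - inter_min, 0, None))
    vol_a = np.prod(max_a - min_a)
    vol_b = np.prod(max_b - min_b)
    union = vol_a + vol_b - inter
    return float(inter / union) if union > 0 else 0.0
```

An IoU threshold is the usual way to decide whether a detected element matches a ground-truth element.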

Contest test set: 360 scenes, no ground truth provided. This is the data set on which contest performance will be evaluated; use it for the final contest submission.

Run your algorithm on the selected test set to generate a project scene for each test scene. Each project scene should be in a separate directory with the scene_id as its directory name. See the SUMO white paper for details. Compress the directory containing the output project scenes into a zip file and upload it to a publicly visible web location.
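The zipping step described above can be scripted. This sketch assumes the layout just described, with one subdirectory per scene named by its scene_id; see the SUMO white paper for the authoritative project-scene format:

```python
import os
import zipfile

def zip_project_scenes(scenes_dir, out_zip):
    """Bundle per-scene output directories into a single zip for upload.

    Assumes scenes_dir contains one subdirectory per scene (named by
    scene_id), as described above; not an official SUMO tool.
    """
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _, files in os.walk(scenes_dir):
            for name in files:
                path = os.path.join(root, name)
                # Store paths relative to scenes_dir, so the zip root
                # contains the scene_id directories directly.
                zf.write(path, os.path.relpath(path, scenes_dir))
```

The resulting zip can then be uploaded to a publicly visible web location as described above.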

Log in if you already have an account, or sign up for a new account if you don't.

Click on "All Challenges" in the left menu, and then click on "2018 SUMO Challenge" in the list of ongoing challenges.

Click the "Participate" menu item.

If you do not already have a participation team, create one in the "Create a New Team" dialog box on the right.

Once you have a participation team, it will show up in the list on the left. Select your team from the list by clicking on the circle. Then click "Participate".

Select the evaluation phase that corresponds to the performance track and data set you chose above. In the "Upload File" box, upload the JSON file you created above. Enter any other optional information you would like to include, then press "Submit".

Be patient. Evaluation can take a minute per scene due to the complexity of 3D metrics.

Once the evaluation is complete, the results can be seen on the leaderboard page of the SUMO Challenge on EvalAI. Note that you must select the appropriate challenge phase to see the results.