Abstract

Spatial reasoning is an important component of human intelligence. We can imagine the shapes of 3D objects and reason about their spatial relations by merely looking at their three-view line drawings in 2D, with different levels of competence. Can deep networks be trained to perform spatial reasoning tasks? How can we measure their “spatial intelligence”? To answer these questions, we present the SPARE3D dataset. Based on cognitive science and psychometrics, SPARE3D contains three types of 2D-3D reasoning tasks on view consistency, camera pose, and shape generation, with increasing difficulty. We then design a method to automatically generate a large number of challenging questions with ground truth answers for each task. They are used to provide supervision for training our baseline models using state-of-the-art architectures like ResNet. Our experiments show that although convolutional networks have achieved superhuman performance in many visual learning tasks, their spatial reasoning performance in SPARE3D is almost equal to random guesses. We hope SPARE3D can stimulate new problem formulations and network designs for spatial reasoning to empower intelligent robots to operate effectively in the 3D world via 2D sensors.

The code is copyrighted by the authors. Permission to copy and use
this software for noncommercial use is hereby granted provided: (1)
this notice is retained in all copies, (2) the publication describing
the method (indicated below) is clearly cited, and (3) the
distribution from which the code was obtained is clearly cited. For
all other uses, please contact the authors.
The software code is provided "as is" with ABSOLUTELY NO WARRANTY
expressed or implied. Use at your own risk.
This code provides an implementation of the method described in the
following publication:
Wenyu Han, Siyuan Xiang, Chenhui Liu, Ruoyu Wang, and Chen Feng,
"SPARE3D: A Dataset for SPAtial REasoning on Three-View Line Drawings,"
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June, 2020.

Task viewpoint settings

Three View to Isometric task

Isometric to Pose task

Pose to Isometric task

Results

SPARE3D benchmark results of Three View to Isometric, Isometric to Pose, and Pose to Isometric tasks

The top row is for SPARE3D-ABC, and the bottom row is for SPARE3D-CSG.
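
The abstract states that baseline performance is almost equal to random guessing. The sketch below shows one way to compare a model's accuracy with that chance level. It assumes multiple-choice questions with a fixed number of candidate answers. The names `chance_accuracy`, `accuracy`, `margin_over_chance`, and the parameter `num_choices` are hypothetical and are not part of the SPARE3D code.

```python
def chance_accuracy(num_choices: int) -> float:
    """Expected accuracy of uniform random guessing on a multiple-choice question.

    `num_choices` is an assumed parameter: the number of candidate answers.
    """
    if num_choices < 1:
        raise ValueError("num_choices must be at least 1")
    return 1.0 / num_choices


def accuracy(predictions, answers) -> float:
    """Fraction of questions where the predicted choice matches the ground truth."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if len(answers) == 0:
        raise ValueError("at least one question is required")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


def margin_over_chance(predictions, answers, num_choices: int) -> float:
    """Accuracy minus the random-guess baseline; values near 0 mean chance-level."""
    return accuracy(predictions, answers) - chance_accuracy(num_choices)
```

A margin close to zero corresponds to the chance-level behavior described in the abstract. A clearly positive margin would suggest the model has learned something beyond guessing.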

Isometric View Generation task testing samples

Point Cloud Generation task testing samples

Acknowledgment

Wenyu Han and Siyuan Xiang contributed equally to the coding, data preprocessing/generation, paper writing, and experiments in this project. Chenhui Liu contributed to the crowd-sourcing website and human performance data collection. Ruoyu Wang contributed to the experiments and paper writing. Chen Feng proposed the idea, initiated the project, and contributed to the coding and paper writing.

The research is supported by the NSF CPS program under CMMI-1932187. Siyuan Xiang gratefully thanks the IDC Foundation for its scholarship. The authors gratefully thank our human test participants, as well as Zhaorong Wang, Zhiding Yu, Srikumar Ramalingam, and the anonymous reviewers for their helpful comments.