This page contains the SYNTH3 dataset used for the experimental evaluation of the paper "Deep Learning for Confidence Information in Stereo and ToF Data Fusion" [1].
The dataset contains synthetic data for 55 3D scenes, simulating the acquisition of a trinocular setup composed of a Time-of-Flight sensor and a stereo camera pair.

Paper

The paper proposes a novel framework for the fusion of depth data produced by a Time-of-Flight (ToF) camera and a stereo vision system. The key problem of balancing the two sources of information is solved by extracting confidence maps for both sources using deep learning. We introduce a novel synthetic dataset that accurately represents the data acquired by the proposed setup and use it to train a Convolutional Neural Network architecture. The machine learning framework estimates the reliability of both data sources at each pixel location. The two depth fields are finally fused by enforcing the local consistency of depth data while taking into account the confidence information [2]. Experimental results show that the proposed approach increases the accuracy of the depth estimation.
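
To illustrate how per-pixel confidence maps can drive the fusion of two depth fields, the sketch below blends a ToF and a stereo depth map with a simple confidence-weighted average. This is only a simplified illustration, not the locally consistent fusion scheme used in the paper; the fuse_depth helper, the array shapes and the eps threshold are assumptions made for the example.

```python
# Minimal sketch (not the paper's method): per-pixel confidence-weighted
# average of ToF and stereo depth maps. The paper instead learns the
# confidence maps with a CNN and fuses the depths with a locally
# consistent scheme [2]; everything below is illustrative.
import numpy as np

def fuse_depth(depth_tof, depth_stereo, conf_tof, conf_stereo, eps=1e-6):
    """Blend two depth maps using per-pixel confidence weights in [0, 1]."""
    w_tof = conf_tof / (conf_tof + conf_stereo + eps)
    w_stereo = 1.0 - w_tof
    fused = w_tof * depth_tof + w_stereo * depth_stereo
    # Where both confidences are negligible, mark the pixel as invalid (0).
    fused[(conf_tof + conf_stereo) < eps] = 0.0
    return fused

# Example with random data at the stereo reference resolution (960x540).
h, w = 540, 960
fused = fuse_depth(np.random.rand(h, w), np.random.rand(h, w),
                   np.random.rand(h, w), np.random.rand(h, w))
```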

Dataset

We provide a new synthetic dataset called SYNTH3, specifically developed for machine learning based Stereo-ToF fusion applications. The dataset is split into two parts, a training set and a test set. The training set contains 40 scenes obtained by rendering 20 unique scenes from different viewpoints, while the test set is composed of 15 unique scenes. The scenes contain furniture and objects of various shapes in different environments, e.g., living rooms, kitchens or offices. Furthermore, some outdoor locations with non-regular structure are also included in the dataset. The scenes appear realistic and are suitable for the simulation of Stereo-ToF acquisition systems.

In each scene we have virtually placed a stereo system with characteristics resembling those of the ZED stereo camera and a ToF camera with characteristics similar to those of a Microsoft Kinect v2. For this purpose, we used a simulator realized by Sony EuTEC starting from the work of Meister et al. [3].

For each scene in the dataset, the following data are provided (NEW: the dataset has been updated and extended and now also contains reprojected data and ToF acquisitions at different frequencies):

The 512x424 ToF depth map.

The 960x540 ToF depth map projected onto the reference camera of the stereo system.