Implemented Visual Features

We currently focus on image labelling tasks such as image segmentation and classification.
Two types of image features are implemented; they are described in more
detail in the documentation of visual features.

Building

git clone --recursive https://github.com/deeplearningais/curfil.git # --recursive will also init the submodules
cd curfil
mkdir -p build
cd build
cmake -DCMAKE_BUILD_TYPE=Release .. # change to 'Debug' to build the debugging version
ccmake . # adjust paths to your system (cuda, thrust, ...)!
make -j
ctest # run tests to see if it went well
sudo make install

Refer to your local Unix expert if you do not know what to do with this instruction.

Dataset Format

Training and prediction require loading a set of images from a dataset. We
currently only support datasets that contain RGB-D images, as for example
captured by the Microsoft Kinect or the Asus Xtion PRO LIVE. RGB-D images have
three channels that encode the color information and one channel for the depth
of each pixel, where depth is the distance of the object to the camera. Note
that stereo cameras such as the Kinect cannot guarantee a valid depth
measurement for every pixel in the image: distance cannot be measured if the
object is occluded from one of the two cameras. Missing or invalid depth is
encoded either as a zero value or as the special floating point value NaN.
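
The two encodings of missing depth can be handled uniformly. The sketch below, a minimal pure-Python illustration (the helper name and toy values are made up, not curfil API), builds a validity mask that treats both 0 and NaN as missing:

```python
import math

def is_missing_depth(d):
    """A depth value is invalid if it is encoded as 0 or NaN (hypothetical helper)."""
    return d == 0 or (isinstance(d, float) and math.isnan(d))

# Toy 2x3 depth map in millimeters; 0 and NaN mark missing measurements.
depth = [
    [1200, 0, 845],
    [float("nan"), 1310, 990],
]

# True where the pixel carries a usable depth measurement.
valid = [[not is_missing_depth(d) for d in row] for row in depth]
```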

We expect to find the color image, depth information and the ground truth in three files in the same folder.
All images must have the same size. Datasets with varying image sizes must be padded manually.
You can skip the padding color when sampling the dataset by using the --ignoreColor parameter.
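
Manual padding amounts to extending each image to the common size with a constant fill value. A minimal sketch of that idea (list-of-lists pixels and the function name are illustrative; the fill value plays the role of the padding color later skipped via --ignoreColor):

```python
def pad_to(image, height, width, fill):
    """Pad a 2-D list-of-lists image to (height, width) with a constant
    fill value, extending to the right and the bottom (illustrative only)."""
    padded = [row + [fill] * (width - len(row)) for row in image]
    padded += [[fill] * width for _ in range(height - len(padded))]
    return padded

small = [[1, 2], [3, 4]]
big = pad_to(small, 3, 4, fill=0)
```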

The filename schema and format is

<name>_colors.png
A three-channel uint8 RGB image where pixels take on values between 0 and 255

<name>_depth.png
A single-channel uint16 depth image. Each pixel gives
the depth in millimeters, with 0 denoting missing depth. The depth image can be
read in MATLAB with the standard imread function, and in OpenCV by loading
it into an image of type IPL_DEPTH_16U.

<name>_ground_truth.png
A three-channel uint8 RGB image where pixels take on values between 0 and 255.
Each color represents a different class label. Black indicates "void" or
"background".

Usage

Training

Use the binary curfil_train.

The training process produces a random forest consisting of multiple decision trees
that are serialized to compressed JSON files, one file per tree.

Prediction

Use the binary curfil_predict.

The program reads the trees from the compressed JSON files and performs a dense
pixel-wise classification of the specified input images.
Prediction is accelerated on the GPU and runs at real-time speed even on mobile
GPUs such as the NVIDIA GeForce GTX 675M.
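
Conceptually, a dense pixel-wise classification combines the per-pixel predictions of all trees in the forest. The pure-Python sketch below shows one simple combination scheme, majority voting over hypothetical per-tree label maps; it is only a conceptual illustration of the output, not the GPU implementation:

```python
from collections import Counter

def predict_pixelwise(per_tree_labels):
    """Fuse per-pixel class votes from several trees by majority vote.
    per_tree_labels: list of 2-D label maps, one per tree (illustrative)."""
    height = len(per_tree_labels[0])
    width = len(per_tree_labels[0][0])
    result = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            votes = Counter(tree[y][x] for tree in per_tree_labels)
            result[y][x] = votes.most_common(1)[0][0]
    return result

# Three toy 2x2 label maps, as if produced by three trees.
tree_a = [[1, 2], [0, 0]]
tree_b = [[1, 1], [0, 2]]
tree_c = [[2, 1], [0, 0]]
fused = predict_pixelwise([tree_a, tree_b, tree_c])
```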

Hyperopt Parameter Search

Use the binary curfil_hyperopt.

This Hyperopt client is only built if MDBQ is installed.
The client fetches hyperopt trials (jobs) from a MongoDB database and performs 5-fold cross-validation to evaluate the loss.
You can run the hyperopt client in parallel on as many machines as desired.
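
The 5-fold cross-validation used to score a trial partitions the dataset into five folds, each serving once as the validation set. A small stdlib sketch of that index bookkeeping (the function name and interleaved fold assignment are illustrative, not curfil's exact split):

```python
def kfold_indices(n_samples, k=5):
    """Partition sample indices into k roughly equal folds; return
    (train_indices, validation_indices) pairs, one per fold."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    splits = []
    for i, val in enumerate(folds):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        splits.append((sorted(train), val))
    return splits

splits = kfold_indices(10, k=5)
```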

The trials need to be inserted into the database in advance.
We include sample Python scripts in scripts/.
Note that there is only one new trial in the database at any given point in time;
thus, the Python script needs to keep running during the entire parameter search.

The procedure:

Make sure the MongoDB database is up and running.

Run the Python script that inserts new trials. Example: scripts/NYU/hyperopt_search.py