Random forests are ensemble classifiers that have been popular in the computer vision community since Lepetit et al. (2005) published their work on keypoint recognition. Variants of random forests have been used to improve object detection and image segmentation (Lepetit and Fua, 2006; Shotton et al., 2011; Stückler et al., 2012; Rodrigues et al., 2012). One of the most prominent examples is the work of Shotton et al. (2011), who use random forests in Microsoft’s Kinect system to estimate human pose from single depth images.

Real-time applications such as those presented by Lepetit et al. (2005) and Shotton et al. (2011) require fast prediction within a few milliseconds per image. This is possible on parallel architectures such as GPUs, since every pixel can be processed independently. Random forest training, however, can be a time-consuming process if the available dataset is large. Sharp (2008) implements random forest training and prediction for the Kinect system, achieving a prediction speed-up of 100 times and a training speed-up of eight times on a GPU, compared to a CPU. This implementation is not publicly available and uses Direct3D, which is only supported on the Microsoft Windows platform. Shotton et al. (2011) use a distributed CPU implementation to reduce the training time, which is nevertheless one day for training three trees from one million images on a 1000-core CPU cluster. Their implementation is also not publicly available. Thus, training random forests even on medium-sized datasets is feasible only by investing in computation time on large clusters. Furthermore, changing the visual features or other hyper-parameters requires re-training the random forest, which impedes efficient scientific research.

Our software release delivers a fast implementation of random forests that is specialized for image labeling applications. We develop and test our implementation on RGB datasets with and without depth measurements.

The primary goal of the CURFIL project is to accelerate random forest training and prediction. Training time is dominated by the evaluation of the best split criterion, which we accelerate with the massively parallel computing power offered by GPUs.
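To illustrate why split evaluation dominates training and why it parallelizes well, the following is a minimal CPU sketch of scoring candidate split thresholds by information gain. This is not CURFIL's actual implementation (CURFIL is written in C++/CUDA); the function names and the toy data are illustrative. The key point is that each candidate threshold is scored independently of the others, which is exactly the structure a GPU exploits.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(values, labels, threshold):
    """Gain of splitting the samples at `threshold` on one feature."""
    left = [l for v, l in zip(values, labels) if v < threshold]
    right = [l for v, l in zip(values, labels) if v >= threshold]
    n = len(labels)
    child_entropy = (len(left) / n) * entropy(left) \
                  + (len(right) / n) * entropy(right)
    return entropy(labels) - child_entropy

def best_split(values, labels, thresholds):
    """Score every candidate threshold and return the best one.
    Each score is computed independently, so on a GPU all candidates
    (and all features) can be evaluated in parallel."""
    return max(thresholds, key=lambda t: information_gain(values, labels, t))
```

For example, with feature values `[0.1, 0.2, 0.8, 0.9]` and labels `['a', 'a', 'b', 'b']`, the threshold `0.5` separates the classes perfectly and achieves the maximal gain of 1 bit. In a real forest this scoring is repeated for thousands of sampled feature/threshold candidates at every node, which is why it dominates training time.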