The buffer k-d tree library is an OpenCL library that aims at accelerating nearest neighbor computations using both k-d trees and graphics processing cards (GPUs). The source code is published under the GNU General Public License (GPLv3).

Buffer k-d trees aim at scenarios, where you are given both a large reference (e.g., 100,000 points) and a huge query set (e.g., 1,000,000 or more points) with an input space of moderate dimensionality (e.g., from 4 to 20 dimensions). A description of the techniques used and an experimental evaluation of the implementation using massive astronomical data sets are provided in the corresponding paper.

News

21 October 2014: Added a note regarding the performances using different driver/CUDA settings.

12 June 2014: Initial release

Compatibility

The current version of the source code is experimental (use the code at your own risk!). So far, it has only been tested extensively on a system with an Intel(R) Core(TM) i7-3770 CPU at 3.40GHz (4 cores, 8 hardware threads), 16GB RAM, a GeForce GTX 770 GPU (with 4GB RAM) and Ubuntu 12.04 (64 Bit) as operating system.

The implementation is based on the efficient use of implicit hardware caches. Thus, to obtain good speed-ups, the GPU at hand has to support this feature! Current architectures such as Nvidia's Kepler architecture exhibit such caches, see, e.g., the Kepler GK110 Whitepaper.

Downloads

Installation

The code has been tested on Ubuntu 12.04 (64 Bit) only, see above. Given an appropriate Installation of OpenCL, compiling the source code on a Linux-based system should be doable by running the Makefile in the "src" directory. IMPORTANT: You might have to change the GPU_PLATFORM_NUMBER in src/bufferkdtree/Makefile before compiling the sources. Also, other modifications might need to be done (such as specifying another OpenCL path).

NOTE: The current implementation dynamically loads OpenCL kernels, which have to be compiled when invoked the first time (these kernels have to be recompiled whenever the problem parameters change, such as the number of nearest neighbors or the dimensionality of the input patterns).For this reason,the path to the OpenCL kernels is absolute (thus, the kernels subdirectory must not be moved/deleted).

This bash script invokes both the buffer k-d tree implementation as well as a parallel (CPU) k-d tree implementation. Similary, you can run the script astronomy_large, which will automatically download additional data (1GB) prior to running both implementations. An exemplary output obtained on the machine mentioned above can be found here; the data used corresponds to psf_model_mag in the paper mentioned below.

NOTE: The current implementation loads all test queries into the global memory of the GPU. For this reason, executing astronomy_large requires a GPU with a quite large global memory (4GB). In case not enough resources can be allocated, you might get the error "Error with errorcode: -4 in file gpu.c in line 399" One way to reduce this memory overhead is to process the test queries in smaller chunks (I am working on that, the next release will process such chunks automatically).

NOTE: The performance might depend on the particular OpenCL version (and driver). For instance, the results mentioned above were obtained on Ubuntu 12.04 (64 Bit) with kernel 3.8.0-44-generic, CUDA 5.5, and NVIDIA driver 319.23. The performance might be different on the same system using a different setup (e.g., using Ubuntu 14.04 (64 Bit) with kernel 3.13.0-36-generic, CUDA 6.5, and NVIDIA driver 340.29, the performance drops by about 30%).

Citations

If you wish to cite a paper that describes the techniques and the implementation for buffer k-d trees, please make use of the following work:

Disclaimer

The source code is published under the GNU General Public License (GPLv3). The software is free for non-commercial use only; other licenses can be obtained upon request. The author is not responsible for any implications that stem from the use of this software.