Abstract

Geospatial datasets can be very large and slow to process on standard workstations. Distributing the work across many parallel processors reduces the burden on the primary processor and can shorten processing times drastically. Esri's ArcGIS, while widely used in the military, does not natively support multi-core processing or the use of graphics processing units (GPUs). However, the ArcPy Python library included in ArcGIS 10 gives geospatial developers a flexible environment for processing geospatial data that can be linked with GPU application programming interfaces (APIs). This research extends a custom desktop geospatial model of spatial similarity for remote soil classification. The model takes advantage of both standard ArcPy/ArcGIS geoprocessing functions and custom GPU kernels, running on an NVIDIA Tesla S2050 with potential access to 1792 cores. The author will present results describing the hardware and software configurations, processing efficiency gains, and lessons learned.

Introduction

Spatial similarity analysis is a statistical method for comparing two locations through their proximity to ancillary features. The method has been used successfully to assist in threat prediction and crime analysis (Riese, 2006; Liu & Brown, 2003; Brown, Dalton, & Hoyle, 2004). Calculating spatial similarity is broken into two parts: data characterization and model computation. The data characterization procedure first measures pairwise distances between a source dataset and K features in each of M feature layers. The same measurements are then made between those feature layers and every grid cell of a raster covering the extent of the area of interest (AOI). The model computation step then uses a modified kernel density estimate (KDE) to iteratively calculate the spatial similarity between each grid cell and the source dataset.
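The model computation step can be illustrated with a minimal sketch. The modified KDE itself is not specified in this excerpt, so the code below substitutes a plain Gaussian product kernel with fixed bandwidths as a stand-in; it is not the authors' formulation. The function names, the bandwidth values, and the sample distance vectors are all hypothetical, and each vector is assumed to hold one distance per feature layer.

```python
import math


def gaussian_kernel(u):
    """Standard normal density evaluated at u."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde_similarity(cell_vector, source_vectors, bandwidths):
    """Average product-kernel density of one grid cell against all source points.

    Assumption: a plain Gaussian product kernel stands in for the
    modified KDE, which this excerpt does not define.
    """
    total = 0.0
    for source in source_vectors:
        product = 1.0
        # One kernel factor per feature layer (per feature-space dimension).
        for g, x, h in zip(cell_vector, source, bandwidths):
            product *= gaussian_kernel((g - x) / h) / h
        total += product
    return total / len(source_vectors)


# Hypothetical distance vectors: one entry per feature layer.
source_vectors = [[4.0, 3.0], [5.0, 2.0]]
bandwidths = [1.0, 1.0]
grid_vectors = [[4.5, 2.5], [20.0, 15.0]]

# Each cell is scored independently of every other cell.
scores = [kde_similarity(cell, source_vectors, bandwidths) for cell in grid_vectors]
```

Under these assumptions the first cell, whose distances resemble those of the source points, scores far higher than the second. Because each cell's score depends only on its own vector and the shared source data, the loop over cells is a natural candidate for parallel execution.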

Efforts to perform spatial similarity calculations over large areas at high resolution are hampered by memory limitations and limited computing efficiency, especially in model computation. As a first step in showing the benefit of GPU-based processing for geospatial analysis, we begin by optimizing model computation. The processing times of other standard geoprocessing functions are also expected to drop substantially when the computational burden is distributed across multiple processors and GPU-enabled parallel processing.

Model initialization, I/O, and data characterization were performed in Python using the ArcPy interface to ArcGIS geoprocessing functions in conjunction with standard Python libraries, such as NumPy. Model computation required communication with the NVIDIA GPU, for which NVIDIA CUDA (NVIDIA 2010a) was used. CUDA C is an extension of the C language designed specifically for NVIDIA hardware. A Python wrapper to CUDA, PyCUDA (Klockner, 2012), was used to simplify development and remain in the Python environment. Though PyCUDA provides a high-level interface to CUDA, the kernel that runs on each GPU core must still be written in CUDA C and adhere to specific limitations. Thus, the modified KDE was deconstructed and coded into a series of custom CUDA kernels optimized for the Tesla S2050 hardware. Optimization and efficiency results will be presented along with details regarding the development procedure and specific hardware (see Figure 1).

Figure 1. Depiction of the hardware and software configuration used in this research
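To make the per-core kernel idea from the introduction concrete, the following sketch emulates on the CPU how a one-dimensional CUDA launch assigns one grid cell to each thread. It is not the authors' kernel code and it does not call PyCUDA; it only mirrors the standard CUDA indexing convention (global index = block index × block size + thread index) and the bounds guard needed when the cell count is not a multiple of the block size. The function names and the block size of 256 threads are assumptions chosen for illustration.

```python
import math


def launch_config(n_cells, threads_per_block=256):
    """Return (blocks, threads_per_block) covering n_cells with one thread per cell."""
    blocks = math.ceil(n_cells / threads_per_block)
    return blocks, threads_per_block


def emulate_launch(kernel, n_cells, threads_per_block=256):
    """Serially emulate a 1D kernel launch; kernel(i) computes the value for cell i."""
    blocks, threads = launch_config(n_cells, threads_per_block)
    out = [0.0] * n_cells
    for block_idx in range(blocks):
        for thread_idx in range(threads):
            # Same formula as blockIdx.x * blockDim.x + threadIdx.x in CUDA C.
            i = block_idx * threads + thread_idx
            # Bounds guard: the last block may contain threads with no cell.
            if i < n_cells:
                out[i] = kernel(i)
    return out


# Hypothetical per-cell work: any function of the cell index alone.
result = emulate_launch(lambda i: i * 2.0, 1000)
config = launch_config(1000)
```

With 1000 cells and 256 threads per block, four blocks are launched and the final 24 threads do nothing. On a GPU the two loops disappear, since every (block, thread) pair runs concurrently, which is why each cell's computation must not depend on the result of any other cell.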

Model of Spatial Similarity

Spatial similarity is the degree to which two locations are alike with respect to other features or environmental variables, and it is calculated in feature space. Feature space defines a location not by its spatial coordinates, but by its position relative to a series of static features or elements in the same area of interest as that location. Defining a point's vector in feature space requires measuring the proximity between that point and the features in each feature class of interest. Any distance metric may be used, but Euclidean distance is the most common and the simplest to calculate.
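A feature-space vector of this kind can be sketched in a few lines. The example assumes that proximity to a feature class is summarized as the Euclidean distance to its nearest feature, which is one simple choice and may differ from the measurement used in the original model. The layer names, coordinates, and function names are hypothetical.

```python
import math


def nearest_distance(point, features):
    """Euclidean distance from a point to the closest feature in one layer."""
    return min(math.dist(point, feature) for feature in features)


def feature_space_vector(point, layers):
    """One coordinate per feature layer: distance to that layer's nearest feature."""
    return [nearest_distance(point, layer) for layer in layers]


# Hypothetical feature layers, each a list of (x, y) point features.
roads = [(0.0, 0.0), (10.0, 0.0)]
wells = [(3.0, 4.0)]
layers = [roads, wells]

vector = feature_space_vector((0.0, 4.0), layers)
print(vector)  # [4.0, 3.0]
```

The location (0, 4) is thus represented by the vector [4.0, 3.0], its distances to the nearest road and the nearest well, rather than by its map coordinates. Two locations with similar vectors are close in feature space even if they are far apart on the ground.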