Tips and Tricks using GIS BigData, ArcGIS APIs and other fun stuff :-)

Monday, September 24, 2012

Processing Big Data with Apache Hive and Esri ArcPy

Data scientists: if you are processing and analyzing spatial data with Python, then ArcPy should be included in your arsenal of tools, and ArcMap should be utilized for geospatial data visualization. Following the last post, where I extended Apache Hive with spatial User Defined Functions (UDFs), in this post I will demonstrate how to use the "extended" Hive from Python and how to save the output into a feature class for rendering in ArcMap or in any web client through ArcGIS Server.

Given a running Hadoop instance and assuming that you have installed Hive and have created a Hive table as described in the last post, start the Hive Thrift server as follows:

$ hive --service hiveserver

When ArcGIS for Desktop is installed on a host, Python is optionally installed along with it and comes enabled with geoprocessing capabilities. Install Hive on your desktop and set the HIVE_HOME environment variable to the location where Hive resides. To access the Hive Python libraries, export the PYTHONPATH environment variable with its value set to $HIVE_HOME/lib/py.
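The environment setup looks something like the following (the install path is an assumption, adjust it to wherever Hive actually lives on your machine):

```shell
# Point HIVE_HOME at your Hive install (path below is an example)
export HIVE_HOME=/usr/local/hive
# Put the bundled Hive Thrift Python client on the Python path
export PYTHONPATH=$HIVE_HOME/lib/py
```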

With the setup behind us, let's tackle a simple use case: given a polygon feature class on the desktop, and a set of points that are stored in the Hadoop Distributed File System and exposed through a Hive table, I want to perform a point-in-polygon operation on Hadoop and update the local polygon feature class attributes with the returned results.

The script imports the Thrift Hive client and the ArcPy library. It then connects to the Thrift Hive server on localhost and executes a set of setup operations. The first two add the countries shapefile geometry and spatial index files to the distributed cache. The next adds the jar file containing the spatial UDFs. The last defines the pip function with a reference to its class in the loaded jar. The select statement retrieves each country identifier and the number of cities in that country, based on a nested select that uses the pip function to determine which city point falls into which country polygon. An fid value of -1 is returned when no containing polygon is found, and those rows are excluded from the final group count. The fetchAll function returns a list of text items, where each item is an fid value followed by a tab and a count value. A dictionary is populated by tokenizing that list, with the fid as the key and the count as the value. Finally, an ArcPy update cursor is opened on the local countries feature class and a row iterator is executed. For each row, the FID value is retrieved and checked against the dictionary keys; if found, the row's HADOOP field is updated with the dictionary value.
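The steps above can be sketched roughly as follows. This is a minimal sketch, not the original script: the shapefile paths, the jar name, the UDF class name, and the cities table columns are all assumptions, so substitute the ones from your own setup. Only the parsing helper is pure Python; the Hive and ArcPy work happens inside main(), which you would call on a desktop with both stacks installed.

```python
def rows_to_counts(rows):
    """Tokenize the 'fid<TAB>count' text lines returned by fetchAll()
    into a {fid: count} dictionary."""
    counts = {}
    for line in rows:
        fid, count = line.split("\t")
        counts[int(fid)] = int(count)
    return counts


def main():
    # Imports are local so the helper above can be reused without a
    # Hive or ArcGIS install on the path.
    from thrift.transport import TSocket, TTransport
    from thrift.protocol import TBinaryProtocol
    from hive_service import ThriftHive
    import arcpy

    # Connect to the Thrift Hive server started with
    # 'hive --service hiveserver' (default port 10000).
    transport = TTransport.TBufferedTransport(TSocket.TSocket("localhost", 10000))
    client = ThriftHive.Client(TBinaryProtocol.TBinaryProtocol(transport))
    transport.open()

    # Setup: ship the shapefile geometry and spatial index files to the
    # distributed cache, load the UDF jar, and register the pip function.
    # File, jar, and class names below are hypothetical.
    client.execute("add file countries.shp")
    client.execute("add file countries.shx")
    client.execute("add jar spatial-udf.jar")
    client.execute("create temporary function pip as 'com.esri.udf.PointInPolygon'")

    # Count cities per country; pip() returns -1 when a point falls in
    # no polygon, and those rows are excluded before grouping.
    client.execute(
        "select t.fid, count(t.fid) from "
        "(select pip(x, y, './countries.shp') as fid from cities) t "
        "where t.fid > -1 group by t.fid")
    counts = rows_to_counts(client.fetchAll())
    transport.close()

    # Update the local feature class: write each matching count into
    # the HADOOP field using a classic ArcPy update cursor.
    rows = arcpy.UpdateCursor("C:/data/countries.shp")
    for row in rows:
        fid = row.getValue("FID")
        if fid in counts:
            row.setValue("HADOOP", counts[fid])
            rows.updateRow(row)
    del rows
```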

Upon successful execution (and remember, this might take a while, as Hive is a batch process), open ArcMap, load the feature class, and symbolize it with a class-breaks renderer based on the HADOOP field values.

Pretty cool, no? This is a very simple example of the marriage of a BigData tool and a GIS tool using Python. There is so much more that can be done with this combination of tools in the same thought process. Expect more posts in the same vein with more ArcPy usage; I just wanted to plant a small seed in your mind.

Update: Here is another example that calculates the average lat/lon values of cities per country in Hive; the result set is used to create a point feature class:
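A sketch of that second workflow follows, under the same assumptions as before (the cities table columns, output path, and field names are hypothetical). Hive computes avg(x) and avg(y) per country, and ArcPy inserts one point per returned row into a new feature class. Again, only the parsing helper is pure Python; the rest lives in a function you would call on a configured desktop.

```python
def rows_to_points(rows):
    """Parse 'fid<TAB>avg_lon<TAB>avg_lat' text lines returned by
    fetchAll() into (fid, lon, lat) tuples."""
    points = []
    for line in rows:
        fid, lon, lat = line.split("\t")
        points.append((int(fid), float(lon), float(lat)))
    return points


def build_point_fc():
    from thrift.transport import TSocket, TTransport
    from thrift.protocol import TBinaryProtocol
    from hive_service import ThriftHive
    import arcpy

    transport = TTransport.TBufferedTransport(TSocket.TSocket("localhost", 10000))
    client = ThriftHive.Client(TBinaryProtocol.TBinaryProtocol(transport))
    transport.open()

    # Average city location per country; column names are assumptions.
    client.execute(
        "select country_id, avg(x), avg(y) from cities group by country_id")
    points = rows_to_points(client.fetchAll())
    transport.close()

    # Create an empty point feature class and add a field to hold the
    # country identifier.
    fc = arcpy.CreateFeatureclass_management(
        "C:/data", "avgcities.shp", "POINT").getOutput(0)
    arcpy.AddField_management(fc, "CID", "LONG")

    # Insert one point per Hive result row.
    cursor = arcpy.InsertCursor(fc)
    for fid, lon, lat in points:
        row = cursor.newRow()
        row.setValue("CID", fid)
        row.shape = arcpy.Point(lon, lat)
        cursor.insertRow(row)
    del cursor
```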

1 comment:

This big data buzz is really picking up... now it truly harmonizes the blending of open source and Esri. There is one article I would like to share with you, http://www.aftenposten.no/digital/nyheter/Samler-data-om-hele-verden-6999454.html, although it is in Norwegian.

About Me

BigData Advocate - Senior Software Architect at Esri - In addition to being part of the dev team, I travel the globe assisting customers implement BigData solutions with ArcGIS server - Cloudera Certified Hadoop Developer and HBase Specialist - When not coding, you will find me road biking in the middle of winter - went out the other day and was 8F - stay warm :-)