Open BI is dedicated to data processing, data warehousing and open source business intelligence.

Wednesday, 21 April 2010

GeoCoding with Kettle : new plugin

Hi all,
I created a plugin for geocoding addresses in Kettle v3.5. The plugin uses the Google Maps API. You can learn more about this API HERE.

What is Geocoding ?

According to Wikipedia, geocoding is “the process of finding associated geographic coordinates (often expressed as latitude and longitude) from other geographic data, such as street addresses, or zip codes”. Normalization is the process of cleaning an input address and putting it into a standardized format. Reverse geocoding is the opposite : finding a complete address from GPS coordinates.

Raised relief map … a basic tool for geocoding.

The plugin

For the moment, this is a basic V1 release, but it is fully working. Many more features (advanced geocoding) are about to be added.
Here is the plugin screen in Kettle. As you can see, it is a basic screen. You need to enter the following :

GMapKey : your Google Maps key. The geocoding works without it (at least it did for me), but I recommend that you sign up for the API and use your own Google key.

Input Address Field : the address field, from the incoming rows, that you want to geocode.

Normalized address : the name of the column in which the normalized address will be stored.

City Field : the name of the column in which the city name will be stored.

GPS Coord Fields : the name of the column in which the GPS coordinates will be stored.
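
To make the role of these settings concrete, here is a minimal Python sketch of what a step configured this way does to each row. It is not the plugin's code (the plugin is a Kettle step packaged as a jar). The function names, the result shape and the canned data in `fake_geocode` are all hypothetical and only stand in for the real call to the geocoder.

```python
from typing import Callable, Dict, Iterable, Iterator, Optional, Tuple

# Hypothetical result shape: (normalized address, city, "lat / lng" string).
GeocodeResult = Tuple[str, str, str]


def add_geocoding_fields(
    rows: Iterable[Dict[str, str]],
    geocode: Callable[[str], Optional[GeocodeResult]],
    input_field: str,
    normalized_field: str,
    city_field: str,
    coord_field: str,
) -> Iterator[Dict[str, Optional[str]]]:
    """Copy each row and append the three output fields named in the dialog."""
    for row in rows:
        result = geocode(row[input_field])
        if result is None:
            # Address not resolved: the new fields are left empty.
            normalized, city, coords = None, None, None
        else:
            normalized, city, coords = result
        out: Dict[str, Optional[str]] = dict(row)  # original fields are kept
        out[normalized_field] = normalized
        out[city_field] = city
        out[coord_field] = coords
        yield out


def fake_geocode(address: str) -> Optional[GeocodeResult]:
    """Stand-in for the real geocoder call; returns made-up placeholder data."""
    known = {
        "1 example st exampleville": (
            "1 Example Street, Exampleville",
            "Exampleville",
            "0.0 / 0.0",
        )
    }
    return known.get(address.strip().lower())


if __name__ == "__main__":
    sample = [{"Code": "1", "Raw address": "1 example st exampleville", "Comment": "demo"}]
    for row in add_geocoding_fields(
        sample, fake_geocode, "Raw address", "Norm_address", "City", "GPS_Coord"
    ):
        print(row)
```

The geocoder is passed in as a plain function so that the row handling can be tried without any network access or API key.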

Here is the main Kettle screen with a transformation sample.

Let’s see how it works

For the example above, I used 4 row creation steps to create 4 types of addresses (French, USA, Asia, Africa). Here is the output : a code, a raw address (with typos and words out of order) and a comment.
Let’s imagine now that we want to normalize the Raw address field and retrieve the corresponding GPS coordinates for each address. To do so, we set up the plugin screen with the following information : your GMap key, the “Raw address” input field and the names for the normalized address, the city field and the GPS coordinates.
Now we can connect everything and start the transformation. The plugin sends a geocoding request to the Google Maps API for each address. The result set looks like this :
The original fields are still there (Code, Raw address and Comment), but the plugin added 3 more fields, using the names you set up previously (Norm_address, City and GPS_Coord). As you can see, the address is normalized and formatted, thanks to the Google Maps API. The GPS coordinates are given as lat / lng.
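
If you need the two numbers separately further down the flow, the GPS_Coord value can be split again. The sketch below is only an illustration in Python : it assumes the field holds the latitude first and the longitude second, separated by a slash, which is how the output is described above. The exact separator written by the plugin is an assumption, and `parse_gps_coord` is a hypothetical helper name.

```python
from typing import Tuple


def parse_gps_coord(value: str, separator: str = "/") -> Tuple[float, float]:
    """Split a 'lat / lng' string into two floats and check their ranges.

    The separator is an assumption; pass another one if the field differs.
    """
    parts = value.split(separator)
    if len(parts) != 2:
        raise ValueError(f"expected 'lat {separator} lng', got {value!r}")
    lat, lng = float(parts[0]), float(parts[1])  # float() ignores outer spaces
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lng <= 180.0:
        raise ValueError(f"longitude out of range: {lng}")
    return lat, lng


if __name__ == "__main__":
    print(parse_gps_coord("48.5 / 2.25"))
```

The range check is a cheap way to catch swapped or garbled values before they reach a map or a database.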

Limits

After some reading, I noticed that you can request geocoding up to 15,000 times per day. This is a limitation of the Google Maps API. I didn’t try to go above 15,000 geocoding requests. I’ll let you check this (create 15,001 lines in the row creation steps …).
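
If you prefer not to hit the limit at all, a simple guard can count the requests sent during the day and stop before the quota is reached. This is a minimal Python sketch, not something the plugin does today. The `DailyQuota` class is hypothetical, and it assumes the quota resets with the local calendar date, which may not match how the API actually counts a day.

```python
import datetime
from typing import Callable


class DailyQuota:
    """Counts requests per calendar day and refuses any beyond the limit."""

    def __init__(
        self,
        limit: int = 15000,
        today: Callable[[], datetime.date] = datetime.date.today,
    ) -> None:
        self.limit = limit
        self._today = today  # injectable clock, handy for testing
        self._day = today()
        self._used = 0

    def try_acquire(self) -> bool:
        """Return True and count one request if quota remains, else False."""
        current = self._today()
        if current != self._day:
            # Assumption: the counter restarts when the local date changes.
            self._day = current
            self._used = 0
        if self._used >= self.limit:
            return False
        self._used += 1
        return True


if __name__ == "__main__":
    quota = DailyQuota(limit=2)
    print([quota.try_acquire() for _ in range(3)])
```

Rows refused by `try_acquire` could be set aside and geocoded on the next run.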

I want it

No problem. You can download the plugin HERE (plugin, XML file and icon) and test it. As usual, everything is packed into a single jar using fatjar.

What’s next ?

This is a basic geocoding process. I’m currently working on something more powerful, with more features : using all the API attributes, letting the user choose which attributes to keep or drop, reverse geocoding, etc.
If this plugin is useful to you, please tell me more about your needs. I will be happy to upgrade it for your usage.

great stuff...but I think this is not compatible with the terms of use of Google's API.

Look at parts 9 and 10 of http://code.google.com/intl/fr/apis/maps/terms.html - this is under the "terms of use" link on the API page you linked to. Basically, these terms of use say you can only use the API to display maps on a publicly available free-of-charge website.

I think the geonames service (http://www.geonames.org/) provides geodata under a less restrictive license, but you should probably check that out in detail - I am not a lawyer.

Hi Roland, thanks for your message. You are right about the terms of use. I sent an email to Google to ask if I'm definitely out of bounds (still no reply). If so, I will switch to another geocoder. I'm currently in a big web marketing company using 100% .NET framework, and I noticed a lot of people are using the Google API that way (bouhhhhhh).

Thanks Sylvain ! I sent my message to Google Groups (the one dedicated to Google Maps devs) and I'm still waiting for an answer. Maybe I'm the first guy to use this API in conjunction with an ETL tool... Wait n see ...

Who am I ?

Datawarehousing & BI / Cloud Computing and Data processing.
I work for several clients from banking/insurance to call centers, entertainment and tourism.
Regularly CTO or CDO for startups or marketing companies dealing with data. Technical consulting for startups around Big Data, Analytics and Cloud Computing. Currently working for a public / government organization.