Hacking Maps

Editor's note: Schuyler Erle, one of Mapping Hacks' coauthors, will be participating in a panel discussion on sustainable businesses for data at O'Reilly's Where 2.0 Conference. If you're a developer of location-based services and apps, don't miss what is sure to be a lively debate among executives of organizations like Navteq, Microsoft, the Census Bureau, and others, as they discuss business models for service and data companies.

Geocode a U.S. Street Address

You know your friend's address, but that won't help you program your
GPS or aim your ICBM. For that, you need her latitude and longitude;
you want to "geocode" her address! Geocoding is the process of adding
geographic coordinates, such as latitude/longitude, to other
information. You can geocode street addresses, or any other
information that has a geographic component.

One Saturday we were sitting around thinking that we really ought to
go see the Power Tool Drag Races. We knew that they were put on by
Qbox (http://www.qbox.org/), and
we even knew their address, but where exactly is
that? Sure, we could use a commercial mapping service and have it
tell us to turn left here, and in circles there, but what I wanted
was to program my GPS and have it just sort of point the way. At one
level, this is much harder to follow than turn-by-turn directions,
except that directions only work as long as you follow them. Since I
have little confidence in my ability to follow directions in San
Francisco, I am very happy to have the safety net of the GPS pointer.

To cut to the chase, just enter this URL (Figure 7-1 shows what it should return):

We plugged (37.734085, -122.377589) into our GPS unit, and off we
went for a day of power-tool debauchery.

There are commercial services that
provide geocoding for U.S. addresses and for other parts of the
world. To find them, just do a Google search for "Geocode
Addresses."

A geocoder is also at the heart of all the online map services. When
you enter a street address into MapQuest, it is geocoded and the map
you get is generated from the returned coordinates. In the good old
days of the Web, pretty much all of the online map services returned
the lat/long for addresses as a
"freebie." Then they decided
that geocoding had added value, and one by one they pulled the plug.

There is a strong movement of people who believe in open data and
open data formats. Mapping sites' removal of free
geocoding led directly to the creation of the free
geocoder.us site. As William Gibson famously
noted, "The street finds its own uses for
things," and that use can transcend and exceed the
original vision of the tool.

The Birth of geocoder.us

Strangely enough, the removal of useful features from online map services
seemed to occur right before a surge of interest in free sources of
geodata among the free and open source software community.

Collecting geodata and keeping it up to date with
"ground truth squads," who go around
and verify that streets are where they are supposed to be and that
houses haven't up and run off, is quite expensive.

An alternative to the full expense of this data lies in the U.S.
Census Bureau. They have compiled TIGER (Topologically Integrated
Geographic Encoding and Referencing system) data. TIGER data is used
as part of the normal fulfillment of their duties to do an actual
enumeration of the people every 10 years. This data is imperfect, but
the regular tasks of census workers are similar to our own needs.
They wish to identify the location of a residence based on a street
address, just as we do when we geocode.

Again, it is important to stress that TIGER data is imperfect. However,
"imperfect but free" has its own
charm. TIGER data is also used as the basis for the free TIGER Map
Server offered by the Census Bureau at http://tiger.census.gov/cgi-bin/mapsurfer.

There is a lot of interesting information about geography and the
challenges of capturing complex and inconsistent information to be
found in the TIGER documentation. But for simple geocoding, all you
really need to know is that the TIGER data endeavors to include
information on every street segment in the U.S. For each block, the
TIGER data includes the street name, the latitude and longitude at
each end of the block, and the range of address numbers for the left
and the right side of the street.
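As a rough illustration of the per-block record described above, here is a minimal Python sketch. The class name, field names, and sample values are hypothetical and are not the actual TIGER/Line layout. The parity check (odd numbers on one side, even on the other) is an assumption that holds for many U.S. streets but not all of them.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class StreetSegment:
    """One block of a street; field names are illustrative, not TIGER's."""
    name: str
    start: Tuple[float, float]     # (lat, lon) at one end of the block
    end: Tuple[float, float]       # (lat, lon) at the other end
    left_range: Tuple[int, int]    # first and last house number, left side
    right_range: Tuple[int, int]   # first and last house number, right side

    def side_for(self, number: int) -> Optional[str]:
        """Return 'left' or 'right' if the number fits a range, else None."""
        for side, (first, last) in (("left", self.left_range),
                                    ("right", self.right_range)):
            low, high = min(first, last), max(first, last)
            # Assumption: numbers on one side share the parity of the range start.
            if low <= number <= high and number % 2 == first % 2:
                return side
        return None


# Hypothetical sample block, not taken from real TIGER data.
segment = StreetSegment(
    name="Example St",
    start=(38.0, -122.0),
    end=(38.001, -122.001),
    left_range=(101, 199),
    right_range=(100, 198),
)
print(segment.side_for(105))
```

A geocoder built this way would first find the segment whose name and ranges match the address, then use the two endpoints to estimate a position along the block.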

This street segment goes from (38.390313, -122.816102) to (38.389814,
-122.81515686); one side of the street includes addresses from 1001
through 1019, and the other covers addresses from 1000 to 1018. We
can interpolate that "1005" is
about a fifth of the way from 1001 to 1019 (4/18, or roughly 0.22)
and, assuming the street is straight, that it will be about a fifth
of the way between the ends of the block.
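The arithmetic can be sketched in a few lines of Python. This is a simplified linear interpolation, not necessarily how geocoder.us computes its answer. The function name is hypothetical, and the sketch assumes that address 1001 sits at the first coordinate pair and 1019 at the second, which the text does not state.

```python
def interpolate_address(number, first, last, start, end):
    """Estimate (lat, lon) for a house number on a straight block.

    first/last are the address range for one side of the street;
    start/end are the (lat, lon) pairs at the two ends of the block.
    """
    if first == last:
        fraction = 0.5  # single-address range: fall back to the midpoint
    else:
        fraction = (number - first) / (last - first)
    lat = start[0] + fraction * (end[0] - start[0])
    lon = start[1] + fraction * (end[1] - start[1])
    return lat, lon


# Segment endpoints and address range quoted in the text above.
start = (38.390313, -122.816102)
end = (38.389814, -122.81515686)

lat, lon = interpolate_address(1005, 1001, 1019, start, end)
print(f"{lat:.6f}, {lon:.6f}")
```

Treating latitude and longitude as plain numbers is reasonable over the length of a single block; the result is only as good as the straight-street assumption and the evenness of the house numbering.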

There is a lot of other information in this line, and in the other
files that make up the data set for a county. TIGER/Line comprises
some 24 gigabytes of data for the whole country, including
information on curves in the road that are not the ends of street
segments. But in the interests of compressing that 24 GB into
something searchable, we will simplify that extra information.

Fortunately for us, Schuyler Erle has stripped away all of that
complexity at http://geocoder.us/, a free geocoding web
site and web service for U.S. addresses based on the U.S. Census
TIGER/Line data.

You may use the web site to geocode individual addresses or use one
of three web service interfaces to geocode via code, as illustrated
in [Hack #80]. You can even
download the source code from CPAN (the Perl code repository) at
http://cpan.org, and the
TIGER/Line data from the census to create your own geocoding
service.

The site provides a text box for entry of an address or an
intersection. So entering "1005 Gravenstein Highway
North, Sebastopol, CA" will return the location of
O'Reilly Media. You can also enter an intersection,
like "Hollywood and Vine, Hollywood,
CA" or "Florence Ave and Wilton,
Sebastopol, CA 95472."

If your address is one of the majority of those that
geocoder.us successfully geocodes, it will
return the latitude and longitude. As a bonus, it will display a map,
created dynamically by the TIGER/Line Map Server, with your address
marked and centered.

The results with lat/long appear quickly, but it can take longer for
the map to be fetched from the TIGER/Line Map Server. The map will be
blank and the little circle on the right will be red until the map is
loaded.

In Seattle, Washington, you can indirectly use the geocoder at
Caffeinated and Unstrung to find the nearest location that offers coffee and free wireless
access, as illustrated in Figure 7-2.