Do Less, Get More!

Wouldn’t it be lovely if we could do a little less and, in doing so, improve things? We’re excited to report that we have made this paradox a reality! We’ve finally added what we call structured geocoding to our open source geocoding engine, Pelias, and by extension to Mapzen Search.

In the past we’ve asked our users to do additional work when geocoding tabular address data that was split into
constituent parts. Because we only supported a single string query parameter, our users were forced to concatenate
the address parts into a single string in order to geocode that address.

Not only did this make things harder for the user, it in fact made things harder for the geocoding engine as well.
The engine was consequently tasked with accurately breaking up the single query string back into the very parts
the user glued together in the first place. As you can imagine, this process leaves room for misunderstandings.
Result accuracy often suffered as a consequence of this unnecessary input processing.

Enter Structured Geocoding!

Structured geocoding is what we’re calling the ability to provide the geocoding engine
with the address broken up into its constituent parts. We’ve just released this functionality
into production on our hosted instance of Pelias, known as Mapzen Search.
If you’ve been following our release notes,
we’ve been brewing this in beta for a few weeks. At last, structured geocoding is now available
at http://search.mapzen.com/v1/search/structured.

Before you run off and start doing less, we invite you to do one last thing: read on to learn where, when, and how to
use this awesome new feature.

One Parameter Doesn’t Fit All

Until now, Mapzen has only supported geocoding and searching with a single text input that contained all
the search and location data. This isn’t always the best way to geocode, since your application may not
have its data in that format. Consider a CSV file full of addresses to be geocoded:

address,city,state,country
1600 Pennsylvania Ave,Washington,DC,US
10 Downing Street,London,,GB
55 Rue du Faubourg Saint-Honoré,Paris,,FR
Bulevardul Geniului 1,Bucharest,,Romania
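With structured geocoding, each CSV row can be turned into a request without gluing its cells together. The minimal sketch below assumes a hypothetical column mapping (city to locality, state to region, using the parameter names from the Parameters section later in this post) and the endpoint URL given above; it only builds URLs and does not send requests or handle authentication.

```python
import csv
import io
from urllib.parse import urlencode

# Endpoint as given in this post.
STRUCTURED_URL = "http://search.mapzen.com/v1/search/structured"

# Hypothetical mapping from our CSV column names to structured parameters.
COLUMN_TO_PARAM = {
    "address": "address",
    "city": "locality",
    "state": "region",
    "country": "country",
}

CSV_DATA = """address,city,state,country
1600 Pennsylvania Ave,Washington,DC,US
10 Downing Street,London,,GB
55 Rue du Faubourg Saint-Honoré,Paris,,FR
Bulevardul Geniului 1,Bucharest,,Romania
"""


def row_to_url(row):
    """Build a structured geocoding URL from one CSV row, skipping empty cells."""
    params = {
        COLUMN_TO_PARAM[column]: value
        for column, value in row.items()
        if value  # leave out blank cells, such as a missing state
    }
    return STRUCTURED_URL + "?" + urlencode(params)


urls = [row_to_url(row) for row in csv.DictReader(io.StringIO(CSV_DATA))]
for url in urls:
    print(url)
```

Each cell travels in its own parameter, so the geocoder never has to guess where the street ends and the city begins.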

Or, in another use case, say for some reason you recently decided to
move to Canada and your GPS device needs to geocode
your new home address. The ambiguity presented by a single text input isn’t ideal in this situation,
so you’ll most likely be presented with a multi-field prompt in which to enter your new address:

address: 9 Queen Elizabeth Way
city: Fort Erie
state: ON
country: CA

Without support for separate fields, the application has to concatenate the address parts into a single input to Pelias. While this may seem like a fairly pedestrian task, it is problematic for several reasons.

First, concatenation can introduce ambiguity. For example, 10 Park Place North Charleston South Carolina can legitimately be read by an address analyzer either as 10 Park Place North in Charleston or as 10 Park Place in North Charleston, a city name containing a directional (North Charleston really is a city in South Carolina).

Second, it’s not always clear how to concatenate the fields of an address. The United States places the zip code after the state (e.g. 801 Leroy Place, Socorro, NM 87801) whereas Germany formats addresses with the postal code between the street address and city (e.g. Otto-Dürr-Straße 1, 70435 Stuttgart, Germany).
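The second point is easy to see in code. The sketch below uses two hypothetical, heavily simplified templates (`TEMPLATES` and `concatenate` are illustrative names, not part of Pelias) to show that an application would need per-country formatting logic just to build a single string; with structured geocoding the same parts can be sent directly as parameters.

```python
from urllib.parse import urlencode

# Hypothetical, heavily simplified per-country templates. Real-world
# address formats have far more variants than these two.
TEMPLATES = {
    "US": "{address}, {locality}, {region} {postalcode}",
    "DE": "{address}, {postalcode} {locality}, {country}",
}


def concatenate(parts, country_code):
    """Glue address parts into one string, as a single-input geocoder requires."""
    return TEMPLATES[country_code].format(**parts)


us_parts = {"address": "801 Leroy Place", "locality": "Socorro",
            "region": "NM", "postalcode": "87801"}
de_parts = {"address": "Otto-Dürr-Straße 1", "postalcode": "70435",
            "locality": "Stuttgart", "country": "Germany"}

us_text = concatenate(us_parts, "US")  # "801 Leroy Place, Socorro, NM 87801"
de_text = concatenate(de_parts, "DE")  # "Otto-Dürr-Straße 1, 70435 Stuttgart, Germany"

# With structured geocoding the parts are sent as-is; no template needed.
de_query = urlencode(de_parts)
print(de_query)
```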

Third, the dizzying variety of address formats and edge cases means that address parsing is tricky business. We use libpostal to parse the text parameter at our /v1/search endpoint and, while it’s a fantastic address parser, a geocoder should not introduce ambiguity where the application data contains little to none.

In any case, forcing the user to concatenate multiple fields into one for single input geocoding puts an undue burden upon the application developer.

Parameters

As the name hopefully implies, structured geocoding means the requesting application has geographic data already split up into its constituent parts. Structured geocoding has been deployed to the /v1/search/structured endpoint and accepts one or more of the following parameters:

address

neighbourhood

borough

locality

county

region

postalcode

country

Using these parameters, you can construct requests that geocode full addresses or just a city and country, for example.

Along with the new parameters, structured geocoding supports all the other search parameters that you’ve grown to love, like boundary.country, sources, layers, and size.
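As a sketch of combining the new parameters with existing ones, the snippet below builds a request for the Fort Erie address from earlier, restricted with boundary.country and limited with size. The helper name `structured_search_url` is hypothetical; a dict is used because `boundary.country` is not a valid Python keyword argument, and fields set to `None` are dropped so unknown parts can simply be left out.

```python
from urllib.parse import urlencode

STRUCTURED_URL = "http://search.mapzen.com/v1/search/structured"


def structured_search_url(params):
    """Return a structured search URL, omitting parameters whose value is None."""
    query = {key: value for key, value in params.items() if value is not None}
    return STRUCTURED_URL + "?" + urlencode(query)


url = structured_search_url({
    "address": "9 Queen Elizabeth Way",
    "locality": "Fort Erie",
    "region": "ON",
    "country": "CA",
    "postalcode": None,        # unknown field, left out of the request
    "boundary.country": "CA",  # existing search parameter
    "size": 1,                 # existing search parameter
})
print(url)
```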

address

The address parameter can contain a full address including house number or just a street name. Pelias stores addresses as separate number and street fields (libpostal is used to parse the number and street values from the address field).

neighbourhood

Neighbourhoods are vernacular geographic entities that may not necessarily be official administrative divisions but are important nonetheless.

borough

Boroughs are a bit of an oddity in the realm of spatial data. For the most part they fit between neighbourhoods and localities, and the general public mostly knows them in the context of New York City, even though other cities such as Mexico City have them, too. In fact, New York City’s boroughs are commonly thought of as cities in their own right rather than as subdivisions of New York City.

We don’t expect our users to understand or appreciate the hierarchical distinction in our data between boroughs and localities, so for a structured geocoding request such as /v1/search/structured?locality=Manhattan&region=NY, Pelias searches boroughs along with localities.

county

Counties are not as commonly used in geocoding as localities, but they can be useful for disambiguating between localities. For instance, there are three places named Red Lion in Pennsylvania, each in a different county. Specifying a county narrows this list to a single result.

region

Regions are normally the first-level administrative divisions within countries, such as states in the United States and provinces in Canada; most other countries have regions as well.

Regions in the United States have common abbreviations, such as PA for Pennsylvania and NM for New Mexico. The region parameter can be a full name or an abbreviation, so /v1/search/structured?region=NM is functionally equivalent to /v1/search/structured?region=New%20Mexico.
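A full region name usually contains a space, which must be URL-encoded. The sketch below uses Python's standard `urllib.parse` to show the two common encodings; both decode to the same value, though which form a given server accepts is an assumption worth checking, and `%20` is the less ambiguous choice.

```python
from urllib.parse import parse_qs, quote, urlencode

PATH = "/v1/search/structured"

# A literal space is not allowed in a URL, so "New Mexico" must be encoded.
plus_form = PATH + "?" + urlencode({"region": "New Mexico"})
percent_form = PATH + "?" + urlencode({"region": "New Mexico"}, quote_via=quote)
abbreviated = PATH + "?" + urlencode({"region": "NM"})

print(plus_form)     # /v1/search/structured?region=New+Mexico
print(percent_form)  # /v1/search/structured?region=New%20Mexico
print(abbreviated)   # /v1/search/structured?region=NM
```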

postalcode

Postal codes help sort mail, and their format is dictated by an administrative division (almost always a country). Because postal codes are unique within a country, they’re useful in geocoding as shorthand for a fairly granular geographic location.

Pelias doesn’t currently import postal codes, though addresses from OpenAddresses and OpenStreetMap are sometimes annotated with postal codes and used for scoring.

country

Countries are the highest-level administrative divisions supported by Pelias. In addition to full names, countries have common 2- and 3-letter abbreviations which are also supported values for the country parameter.

Caveats

Any combination of the above parameters can be sent as a structured geocoding request except postalcode alone, because Pelias does not currently import postal codes as separate records, only as data augmenting addresses. For example, a request consisting only of /v1/search/structured?postalcode=87801 is not valid at this time and returns an error to the caller.

Fallback Behaviors

Structured geocoding, much like the single-input /v1/search endpoint, falls back to less granular geocodes if the exact input as specified returns no results. We covered this topic in The Next Chapter of Search, so here is just a quick recap.

A key concept of geocoding is not to return anything other than what the user asked for. If the geocode request is for 14 Horseshoe Pond Lane, Concord, New Hampshire and that address is neither a point in the data nor can be interpolated, a geocoder shouldn’t return something merely close, such as 16 Horseshoe Pond Lane, Concord, New Hampshire. Instead, the geocoder should fall back to the most granular level available: if a street result for Horseshoe Pond Lane, Concord, New Hampshire is in the data, return that; otherwise return Concord, New Hampshire, or even New Hampshire as a last resort.
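The fallback order can be sketched as a loop from most to least granular. This is a toy illustration, not Pelias's implementation: the in-memory `INDEX` and `geocode_with_fallback` are hypothetical, lookups are exact, and interpolation is not modeled.

```python
# Toy, hypothetical index of records that "exist" in our data. Each record is
# a (layer, fields) pair. Only exact lookups are done; no interpolation.
INDEX = {
    ("address", ("16", "Horseshoe Pond Lane", "Concord", "New Hampshire")),
    ("street", ("Horseshoe Pond Lane", "Concord", "New Hampshire")),
    ("locality", ("Concord", "New Hampshire")),
    ("region", ("New Hampshire",)),
}


def geocode_with_fallback(number, street, locality, region):
    """Return the most granular exact match, never a nearby different address."""
    candidates = [
        ("address", (number, street, locality, region)),
        ("street", (street, locality, region)),
        ("locality", (locality, region)),
        ("region", (region,)),
    ]
    for candidate in candidates:
        if candidate in INDEX:
            return candidate
    return None  # nothing matched at any level


result = geocode_with_fallback("14", "Horseshoe Pond Lane", "Concord", "New Hampshire")
print(result[0])  # street: number 16 exists, but we don't substitute it for 14
```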

Who’s On First Layer Mappings

This section is for people who are well-versed in the nuances of Who’s On First place types or have spent a bit of time looking at its data.

As stated previously, we don’t expect our users to understand the complexities of Who’s On First layer mappings. While there are very good reasons why our gazetteer supports both locality and localadmin, it would be cumbersome to expose both as parameters, so we have added some convenience mappings to make structured geocoding easier.

For example, Peach Bottom, Pennsylvania is only a localadmin place type and not a locality in Who’s On First, but we don’t expect the user to know the distinction, so if a structured geocoding request specifies locality=Peach Bottom&region=Pennsylvania, Pelias will look up Peach Bottom in both the locality and localadmin layers.
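The idea behind these convenience mappings can be sketched as a lookup from parameter to layers. The mapping below is partial and hypothetical: it is reconstructed only from the two examples in this post (Manhattan and Peach Bottom), and the default of searching a same-named layer for unlisted parameters is an assumption, not Pelias's documented behavior.

```python
# Partial, hypothetical mapping reconstructed only from this post's examples:
# locality also searches borough (Manhattan) and localadmin (Peach Bottom).
PARAM_TO_LAYERS = {
    "locality": ["locality", "borough", "localadmin"],
}


def layers_for(param):
    """Layers searched for a structured parameter.

    Assumption: a parameter not listed above searches only the layer
    with the same name.
    """
    return PARAM_TO_LAYERS.get(param, [param])


print(layers_for("locality"))  # ['locality', 'borough', 'localadmin']
print(layers_for("region"))    # ['region']
```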

Get in touch!

Check out the structured geocoding documentation, and if you have any questions, concerns, enhancement requests, or bug reports, please don’t hesitate to file an issue! Get more geocoding with less work!

Note: This post was updated on May 31, 2017 to update the API request links. In addition, postalcode searches have been supported since this post was written.