The Sweet Java Topology Suite – Part II

In a previous post, we described how we started using the Java Topology Suite (JTS) to manipulate postal/zip code polygons that we are viewing in an application built on MapQuest’s Flex API. Since then, we have added the ability to join multiple postal codes into territories. Sometimes over 1,000 postal code polygons will be combined to form a single territory.

We ran into two significant technical hurdles. First, MapQuest’s API doesn’t support polygons with inner holes, so a donut-shaped polygon would just look like a circle, with no hole in the middle. Second, some of the postal code polygons were so complex that the union process would fail.

Union of postal code polygons with a hole in the middle

Union of postal code polygons, missing the hole in the middle
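Joining postal codes into a territory is a polygon union. The original code isn’t shown here, so the following is only a minimal sketch, assuming a recent JTS release (package org.locationtech.jts; 2009-era releases used com.vividsolutions.jts). The TerritoryUnion name is hypothetical. UnaryUnionOp unions a whole collection in one call using a cascaded strategy, which is typically much faster than folding polygons together with repeated a.union(b).

```java
import java.util.List;
import org.locationtech.jts.geom.Geometry;
import org.locationtech.jts.operation.union.UnaryUnionOp;

/** Hypothetical helper (not from the original project): dissolves postal codes into a territory. */
final class TerritoryUnion {
    private TerritoryUnion() {}

    /**
     * Unions all postal code polygons of one territory in a single call.
     * For polygonal input, UnaryUnionOp uses a cascaded (tree-structured) union
     * instead of growing one large intermediate geometry polygon by polygon.
     */
    static Geometry unionAll(List<Geometry> postalCodePolygons) {
        return UnaryUnionOp.union(postalCodePolygons);
    }
}
```

When a territory surrounds postal codes that aren’t part of it, the result is a Polygon with interior rings (getNumInteriorRing() > 0), which is exactly the donut case MapQuest can’t draw.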

If you read the previous post, you saw that we had already used JTS to simplify polygons (by reducing the number of points that make up each polygon). However, we didn’t end up using those simplified polygons in production because the edges of neighboring polygons no longer lined up. The result looked like broken glass, because the simplification process had no regard for adjacent polygon edges.

Simplified polygons with edges that don't line up

So, we set out on an adventure to simplify the polygons so that the edges of the simplified postal codes matched up. We received very responsive and helpful guidance from Martin Davis, one of the principal developers of JTS. He also pointed us to the open source tool OpenJUMP, which he helped build. Source code from that tool was very helpful as we created our own automated simplification process.

Here’s the simplification process in a nutshell:

Convert the MapQuest postal code polygon data for the current patch (such as the lower 48 states) to Well-Known Text (WKT) and save each postal code polygon to its own file on the file system. For the lower 48 states, this resulted in more than 41,000 files. Here are the unsimplified versions of the few polygons we’ll simplify in this example:

Original, unsimplified postal code polygons
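A minimal sketch of the file-writing half of this step, assuming the MapQuest data has already been turned into JTS geometries keyed by postal code (the MapQuest conversion itself isn’t shown, and no MapQuest API calls are assumed). The WktExport name and the &lt;postalCode&gt;.wkt file naming are hypothetical; the sketch uses JTS’s WKTWriter (package org.locationtech.jts) and Java 11’s Files.writeString.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import org.locationtech.jts.geom.Geometry;
import org.locationtech.jts.io.WKTWriter;

/** Hypothetical exporter: writes one WKT file per postal code. */
final class WktExport {
    private WktExport() {}

    /** Writes each geometry to dir/<postalCode>.wkt (naming convention is an assumption). */
    static void writeAll(Map<String, Geometry> byPostalCode, Path dir) throws IOException {
        Files.createDirectories(dir);
        WKTWriter writer = new WKTWriter();
        for (Map.Entry<String, Geometry> entry : byPostalCode.entrySet()) {
            Path file = dir.resolve(entry.getKey() + ".wkt");
            Files.writeString(file, writer.write(entry.getValue()));
        }
    }
}
```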

Read all of the WKT files, one per postal code, and store them as JTS Geometry objects in a collection. To support step six (below), we store the postal code in the geometry object using the very handy Geometry.userData property. That way, each original/source geometry remembers what postal code it represents.
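A sketch of reading the files back, assuming the same hypothetical &lt;postalCode&gt;.wkt naming: each file is parsed with JTS’s WKTReader, and the postal code taken from the file name is attached with Geometry.setUserData so the geometry remembers what it represents. The WktImport name is illustrative only.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import org.locationtech.jts.geom.Geometry;
import org.locationtech.jts.io.ParseException;
import org.locationtech.jts.io.WKTReader;

/** Hypothetical importer: loads every *.wkt file and tags it with its postal code. */
final class WktImport {
    private WktImport() {}

    static List<Geometry> readAll(Path dir) throws IOException {
        WKTReader reader = new WKTReader();
        List<Geometry> result = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "*.wkt")) {
            for (Path file : files) {
                String name = file.getFileName().toString();
                String postalCode = name.substring(0, name.length() - ".wkt".length());
                Geometry geometry;
                try {
                    geometry = reader.read(Files.readString(file));
                } catch (ParseException e) {
                    throw new IllegalArgumentException("Invalid WKT in " + file, e);
                }
                geometry.setUserData(postalCode); // remember which postal code this is
                result.add(geometry);
            }
        }
        return result;
    }
}
```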

Use JTS to convert the polygons to merged LineString objects. This creates a collection of the outlines of every polygon, where an edge shared by two neighboring polygons becomes a single line.

Extracted border lines of original polygons
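One way to produce this linework with JTS, sketched here as an assumption rather than the exact code used: take each polygon’s boundary, union all of the boundaries (a unary union nodes the lines and dissolves duplicate segments, so a border shared by two postal codes survives only once), and then join the pieces end to end with LineMerger. LineMerger only joins lines at nodes where exactly two lines meet, so each merged line runs between points where three or more borders meet, or forms a closed loop. The BorderLines name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import org.locationtech.jts.geom.Geometry;
import org.locationtech.jts.geom.GeometryFactory;
import org.locationtech.jts.geom.LineString;
import org.locationtech.jts.operation.linemerge.LineMerger;

/** Hypothetical helper: turns polygons into shared, merged border lines. */
final class BorderLines {
    private BorderLines() {}

    static List<LineString> mergedBorders(Collection<Geometry> polygons, GeometryFactory factory) {
        List<Geometry> boundaries = new ArrayList<>();
        for (Geometry polygon : polygons) {
            boundaries.add(polygon.getBoundary()); // exterior and interior rings as lines
        }
        // Unary union nodes the linework and removes duplicated shared edges.
        Geometry noded = factory.buildGeometry(boundaries).union();

        // Join noded pieces into maximal lines between junction nodes.
        LineMerger merger = new LineMerger();
        merger.add(noded);
        List<LineString> merged = new ArrayList<>();
        for (Object line : merger.getMergedLineStrings()) {
            merged.add((LineString) line);
        }
        return merged;
    }
}
```

For two unit squares sharing one side, this yields three lines: the shared side plus the remaining outline of each square.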

Use JTS to simplify the merged LineStrings by reducing the number of coordinates that define each line. Our code iterates over every merged LineString and uses JTS’s DouglasPeuckerSimplifier with a simplification tolerance of 0.01.

Simplified polygon border lines
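A minimal sketch of this step using the static DouglasPeuckerSimplifier.simplify method. The 0.01 tolerance is the value given in this post and is expressed in the units of the data, which we assume are decimal degrees. Because every shared border is simplified only once, and Douglas-Peucker always keeps a line’s two endpoints (the junctions where borders meet), both neighbors end up with the identical simplified edge. The BorderSimplifier name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import org.locationtech.jts.geom.Geometry;
import org.locationtech.jts.geom.LineString;
import org.locationtech.jts.simplify.DouglasPeuckerSimplifier;

/** Hypothetical helper: simplifies each merged border line independently. */
final class BorderSimplifier {
    /** Tolerance from the post; units assumed to be decimal degrees. */
    static final double TOLERANCE = 0.01;

    private BorderSimplifier() {}

    static List<Geometry> simplifyAll(List<LineString> mergedLines, double tolerance) {
        List<Geometry> simplified = new ArrayList<>();
        for (LineString line : mergedLines) {
            // Endpoints are always kept, so junctions between borders do not move.
            Geometry result = DouglasPeuckerSimplifier.simplify(line, tolerance);
            if (!result.isEmpty()) {
                simplified.add(result);
            }
        }
        return simplified;
    }
}
```

Two caveats: simplifying lines independently can make neighboring lines cross, so the linework should be re-noded before it is polygonized, and very small closed loops (such as tiny islands) can collapse into shapes that no longer enclose any area.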

Use JTS to create polygons from the simplified LineStrings. The primary JTS class was the magic Polygonizer class, along with code from OpenJUMP that prepared the line data for the Polygonizer.

New polygons made from simplified lines
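The OpenJUMP preparation code isn’t reproduced here. As a stand-in (an assumption, not that code), this sketch re-nodes the simplified lines with a unary union, so any crossings introduced by simplification become shared vertices, and then feeds them to JTS’s Polygonizer, whose getPolygons() returns one polygon per enclosed face. The RebuildPolygons name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import org.locationtech.jts.geom.Geometry;
import org.locationtech.jts.geom.GeometryFactory;
import org.locationtech.jts.geom.Polygon;
import org.locationtech.jts.operation.polygonize.Polygonizer;

/** Hypothetical helper: rebuilds polygons from simplified border lines. */
final class RebuildPolygons {
    private RebuildPolygons() {}

    static List<Polygon> polygonize(Collection<Geometry> simplifiedLines, GeometryFactory factory) {
        // Re-node: lines that now cross get a vertex at each intersection.
        Geometry noded = factory.buildGeometry(simplifiedLines).union();

        Polygonizer polygonizer = new Polygonizer();
        polygonizer.add(noded);
        List<Polygon> polygons = new ArrayList<>();
        for (Object face : polygonizer.getPolygons()) {
            polygons.add((Polygon) face);
        }
        return polygons;
    }
}
```

Polygonizer also builds polygons for enclosed areas that don’t belong to any postal code, such as holes; these are presumably the non-matches that get thrown out during matching.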

Now the tough part. We have a collection of simplified polygons, but they aren’t linked to any postal codes, so we can’t look up a postal code’s polygon and use it in our application. We needed to match each simplified polygon with its original. Since this is the most involved step, I’ll describe it in a bit more detail:

Add each of the original polygons to an STRtree, a JTS implementation of the SpatialIndex interface. The STRtree provides a fast way to find the polygons whose envelopes intersect a given query envelope.

Iterate through each of the simplified polygons, and:

Query the STRtree to find all of the original polygons whose envelopes intersect the envelope (bounding rectangle) of the current simplified polygon.

Find the polygon in that set which has the smallest distance between its center point and the simplified polygon’s center point.

Once the best-matching original polygon is found, we copy the postal code from the original Geometry’s userData to the simplified polygon.

Some simplified polygons, such as those that fill holes, have no match in the original set, so those non-matches are thrown out in this process.
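A sketch of the matching step: originals go into a JTS STRtree keyed by their envelopes, each simplified polygon queries the tree with its own envelope, and the candidate with the nearest centroid wins. The covers test on an interior point is our assumption; this post doesn’t say how hole polygons were detected. Envelope and centroid distance alone would still match a polygon that fills a hole to the surrounding postal code, and requiring the candidate original to cover a point inside the simplified polygon rejects it. Very thin polygons can fail that check after simplification, so treat it as a starting point. The PostalCodeMatcher name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import org.locationtech.jts.geom.Geometry;
import org.locationtech.jts.geom.Point;
import org.locationtech.jts.index.strtree.STRtree;

/** Hypothetical matcher: copies postal codes from originals onto simplified polygons. */
final class PostalCodeMatcher {
    private PostalCodeMatcher() {}

    static List<Geometry> match(List<Geometry> originals, List<? extends Geometry> simplified) {
        STRtree index = new STRtree();
        for (Geometry original : originals) {
            index.insert(original.getEnvelopeInternal(), original);
        }

        List<Geometry> matched = new ArrayList<>();
        for (Geometry candidate : simplified) {
            Point center = candidate.getCentroid();
            Point inside = candidate.getInteriorPoint();
            Geometry best = null;
            double bestDistance = Double.POSITIVE_INFINITY;
            // Originals whose envelopes intersect this polygon's envelope.
            for (Object item : index.query(candidate.getEnvelopeInternal())) {
                Geometry original = (Geometry) item;
                // Assumption: skip originals that don't cover a point inside the candidate.
                if (!original.covers(inside)) {
                    continue;
                }
                double distance = original.getCentroid().distance(center);
                if (distance < bestDistance) {
                    bestDistance = distance;
                    best = original;
                }
            }
            if (best != null) {
                candidate.setUserData(best.getUserData()); // copy the postal code
                matched.add(candidate);
            }
            // No match (for example, a polygon filling a hole): the candidate is dropped.
        }
        return matched;
    }
}
```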

Now that each simplified polygon has been identified as matching a postal code, we write new WKT files for each postal code. Our code that writes these files automatically creates MultiPolygon objects for those postal codes that are made up of more than one polygon.

Simple polygons that remain after match with originals
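A minimal sketch of grouping the matched polygons by postal code before writing them out. GeometryFactory.buildGeometry returns the polygon itself when a postal code has a single part and a MultiPolygon when it has several; each result can then be written with WKTWriter. The PostalCodeAssembler name is hypothetical, and the sketch assumes each polygon’s userData holds its postal code as a String.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import org.locationtech.jts.geom.Geometry;
import org.locationtech.jts.geom.GeometryFactory;

/** Hypothetical helper: one geometry (Polygon or MultiPolygon) per postal code. */
final class PostalCodeAssembler {
    private PostalCodeAssembler() {}

    static Map<String, Geometry> assemble(List<Geometry> matched, GeometryFactory factory) {
        // Group polygons by the postal code stored in userData (assumed to be a String).
        Map<String, List<Geometry>> parts = new LinkedHashMap<>();
        for (Geometry polygon : matched) {
            String postalCode = (String) polygon.getUserData();
            parts.computeIfAbsent(postalCode, k -> new ArrayList<>()).add(polygon);
        }

        Map<String, Geometry> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<Geometry>> entry : parts.entrySet()) {
            // One part: the Polygon itself; several parts: a MultiPolygon.
            Geometry geometry = factory.buildGeometry(entry.getValue());
            geometry.setUserData(entry.getKey());
            result.put(entry.getKey(), geometry);
        }
        return result;
    }
}
```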

To run this process on the lower 48 United States, I had to allocate 7GB of my 8GB of RAM to the JVM so that all 41,000 polygons could be simplified at the same time. Fortunately, the result is worth the time it takes to build. Here is the number of coordinates needed to represent all of the polygons in each of the three areas, before and after simplification, along with the resulting reduction:

Coordinate count

Region                    Original     Simplified    Reduction
Lower 48 United States    6,276,000    544,000       12x smaller
Alaska                    262,000      15,000        17x smaller
Hawaii                    72,000       960           75x smaller
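A small sketch of how such counts and ratios can be computed, assuming a count is the total number of vertices (Geometry.getNumPoints, which includes each ring’s repeated closing point); this post doesn’t say exactly how the coordinates were counted. The reduction column is the original count divided by the simplified count, rounded: for the lower 48 states, 6,276,000 / 544,000 is about 11.5, shown as 12x. The CoordinateStats name is hypothetical.

```java
import java.util.Collection;
import org.locationtech.jts.geom.Geometry;

/** Hypothetical helper for the coordinate-count comparison. */
final class CoordinateStats {
    private CoordinateStats() {}

    /** Total vertex count, including each ring's closing coordinate. */
    static long countCoordinates(Collection<? extends Geometry> geometries) {
        long total = 0;
        for (Geometry geometry : geometries) {
            total += geometry.getNumPoints();
        }
        return total;
    }

    /** Original count divided by simplified count, rounded to the nearest whole number. */
    static long reductionFactor(long original, long simplified) {
        return Math.round((double) original / simplified);
    }
}
```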

Here’s a larger area of polygons, before and after simplification:

Original postal code polygon sample

Simplified postal code polygon sample

To display polygons with holes using MapQuest’s polygon API, we used JTS to cut a small slice between each inner ring (hole) and the exterior of the polygon. This leaves a line through the polygon, but that is more acceptable than no hole at all. Hopefully MapQuest will support polygons with inner holes in a later release. In fact, it would be really cool if MapQuest incorporated other structures and features from JTS, including native WKT support.

Simplified postal codes on map

Territory on map with hole enabled by slice
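The slicing code isn’t shown here, so this is a minimal sketch of the idea under stated assumptions: for each hole, find the closest pair of points between the hole and the exterior ring with DistanceOp.nearestPoints, buffer the segment between them into a thin strip, and subtract the strips from the polygon. The result is normally a single Polygon with no interior rings whose outline runs in through the slice, around the hole, and back out. The HoleSlicer name and the sliceWidth parameter (in the data’s units) are hypothetical; if a slice happens to cut the polygon apart, the result can be a MultiPolygon, so callers should check the type.

```java
import java.util.ArrayList;
import java.util.List;
import org.locationtech.jts.geom.Coordinate;
import org.locationtech.jts.geom.Geometry;
import org.locationtech.jts.geom.GeometryFactory;
import org.locationtech.jts.geom.Polygon;
import org.locationtech.jts.operation.distance.DistanceOp;

/** Hypothetical helper: connects each hole to the exterior with a thin cut. */
final class HoleSlicer {
    private HoleSlicer() {}

    static Geometry sliceHoles(Polygon polygon, double sliceWidth) {
        if (polygon.getNumInteriorRing() == 0) {
            return polygon; // nothing to slice
        }
        GeometryFactory factory = polygon.getFactory();
        List<Geometry> slices = new ArrayList<>();
        for (int i = 0; i < polygon.getNumInteriorRing(); i++) {
            // nearest[0] lies on the hole, nearest[1] on the exterior ring.
            Coordinate[] nearest = DistanceOp.nearestPoints(
                    polygon.getInteriorRingN(i), polygon.getExteriorRing());
            // Thicken the connecting segment into a strip sliceWidth wide.
            slices.add(factory.createLineString(nearest).buffer(sliceWidth / 2));
        }
        // Removing the strips opens every hole to the outside.
        return polygon.difference(factory.buildGeometry(slices).union());
    }
}
```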

We are very grateful for the Java Topology Suite and the polygon processing it allowed us to complete. The project we’re building for Dave’s Endorsed Local Provider program will be much more successful with these improvements.


This entry was posted on March 4, 2009 at 4:28 pm and is filed under Projects.

@leon: (nice to see your name) Yes, we store the polygons in the database, as gzipped XML representations of MapQuest FeatureCollection objects, so that the Flex client can read them. We also keep the WKT in the database as a reference.

@Steven: I created an ant script that invokes all the code to do the process. It takes about 8 minutes on my box to process all of the lower-48 USA postal codes.


Thanks for the post. I’m currently facing a similar challenge (simplified country boundaries), albeit complicated by the fact that country geometries are multi-polygons with lots of small, pesky islands. This presents a challenge since, in Step 6, the centroid for each island wouldn’t necessarily correlate with the original centroid for the overall country. Any ideas about how to address this?