The case for topology

Simple feature representations
of polygon geometries are ubiquitous due to their ease of use.
Thinking of spatial features as having a single, independent geometry is easy and fits most use cases.
But that ease of use disappears when we need to represent the topological
relationship between features.

In this article, I'll focus on one particular task with simple features data that
would benefit from topology - namely simplifying a polygon dataset by removing vertices.
Here's the original dataset, a 30+MB shapefile with very dense line work.

Geometries can be simplified under the Simple Features model but, since
each geometry is processed independently, the topological relationships between
features can be disrupted. For instance, using the Simplify Geometries tool in
QGIS, I can simplify the polygons dramatically but we see gaps between polygons
and other side effects.

The plan

Because we'll need to build topology before acting on it, the process for simplifying
simple features datasets involves converting the data to topological structure,
simplifying it, then converting it back to a simple features representation.

Many of the big GIS systems (ESRI's .e00, ArcInfo "coverages", and GRASS vectors)
have their own topological data structures. More recently, we've seen the rise of
Open Street Map (OSM) format and TopoJSON, both of which model topological relationships.

Of these options, I selected TopoJSON
because of it's robust command-line tool
which handles building topology and simplification in one step. Additionally, it
works with GeoJSON and Shapefile inputs, two of the most common
data formats for simple features.

The workflow goes something like this:

Convert data into a shapefile with the EPSG:4326 spatial reference (lonlat, wgs84)

Convert to topojson and simplify

Convert to geojson

Optionally, convert geojson to other formats supported by OGR

To follow along, you'll need to have the following software installed:

GDAL command line utilities (we'll use ogr2ogr at the command line)

apt-get install gdal-bin

The topojson command line utility

npm install -g topojson

Python with the shapely package installed.

pip install shapely

Step 1: Convert to WGS84 shapefile

If you're already working with an ESRI Shapefile or GeoJSON format and your data
is already in unprojected WGS84 coordinates (i.e. EPSG:4326), you can skip to step 2.

Use a proportion of points, 0 to 1, to retain with the --simplify-proportion flag

One quirk of the topojson implementation is that it uses a relatively low quantization factor by default.
Effectively, this snaps coordinates to a grid in order to save space and simplify geometries even further.
This yields nice small coordinates but can result in a "stair step" effect at higher
zoom levels. The default is -q 1E4 but I've found good results with -q 1E6 as
recommended in the topojson docs.

As an example, let's take our ecoregions_original.shp and convert it to topojson
with a tolerance of 1E-8 steridians. We want to make sure we explicitly mention
that the data is in spherical (unprojected) coordinates and to retain the properties
of the original attribute table:

Step 3: Convert to GeoJSON

This part was a bit trickier than I anticipated. Luckily Sean Gillies has written
some preliminary python functions
for converting topojson geometries to standard GeoJSON-like python dictionaries.

In order to make a higher-level conversion utility, I started working on topo2geojson.py which provides a command line
interface to perform TopoJSON to GeoJSON conversions.

python topo2geojson.py temp.topojson ecoregions_simple.geojson

There is some additional logic to ensure validity of polygons though it is very
basic and I'm sure there are ways to make the geometry conversions more robust.
Please note that I've only tested this script on this one dataset and it likely needs
additional work to be considered a full-fledged conversion tool; consider it more of a
starting point than an out-of-the box solution.

Optional Step 4: Convert to any OGR format

Once data is in GeoJSON format, we're free to do what we want with it, including
converting it back to a shapefile or any other OGR supported data format.

Case study: evaluating simplification tolerances

In the remainder of this article, I'll walk through a demonstration of these steps
in order to find an optimal simplification tolerance for my test data. The optimal
tolerance depends on your needs, what scales you will be using your data and how
aggressively you need to reduce file size. Ultimately, it's a tradeoff between
low geometry size and accurate line work.

We can easily script this solution in order to test multiple simplification tolerances.
As a bonus, we can fire off multiple iterations at once to leverage multiple cores.
Since I've got 4 cores on my laptop, I can run 4 processes in nearly the same time
it takes to run 1 using some simple shell tricks (Linux/OSX only; sorry Windows users but I don't know .bat files well enough to demonstrate)

OK, so with a simplification tolerance of 1E-10 steridians, we can get a 4.5M file.
If we reduce it to 1E-7, we can get 362k file - a 12.5x reduction. Is the reduction
in file size worth the reduction in geometric accuracy? The only way to find out is to
render maps of the resulting datasets and visually assess them.

Original

1E-7

1E-8

1E-9

First thing that we notice - all of the results have retained topology with no gaps or slivers introduced.
(the key benefit to this workflow).

Next, we notice that at this scale (roughly 1:500k on my monitor) we can barely
see a difference between the 1E-9 version and the original. And the 1E-7 version
looks a bit too simplified and chunky. So, in this case, we can say that a simplification
tolerance of around 1E-8 steridians is an optimal balance of file size and detail.

Of course other datasets, scales and uses may have completely different results so please try
it out and let me know how it goes. Just don't settle for simple features simplification next time you need to
reduce file sizes!