Geographic Data for D3 - from GeoJSON to TopoJSON

In this post, I start from the basics of GeoJSON and TopoJSON, compare the two formats and what TopoJSON improves on, and finally use simple examples to show how to reduce the size of a TopoJSON file by quantizing and simplifying without degrading the quality of the visualization.

1. What is GeoJSON?

GeoJSON, as specified by the IETF (Internet Engineering Task Force) in RFC 7946, is a JSON format for encoding data about geographic features. A GeoJSON object can represent a region of space (a Geometry), a spatially bounded entity (a Feature), or a list of Features (a FeatureCollection). GeoJSON supports the following geometry types: Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, and GeometryCollection. A Feature contains a Geometry object and additional properties, and a FeatureCollection is simply an array of Feature objects.

1.1 Geometry

A Geometry object consists of a type and a collection of coordinates that define the position of that geometry. The building blocks are three simple units: Point (zero-dimensional), LineString (one-dimensional), and Polygon (two-dimensional). The more complex geometry types in GeoJSON are all built from these three.

Point

A Point is a single position, given by its coordinates in the conventional order of longitude, then latitude.

{"type":"Point","coordinates":[0,0]}

LineString

A LineString is a line that runs from a starting point to an ending point, optionally passing through intermediate points.

{"type":"LineString","coordinates":[[0,0],[10,10]]}

Polygon

A Polygon is more complex than a Point or a LineString because it encloses an area. There are two kinds of Polygon: one without holes and one with holes.
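The two kinds of Polygon can be sketched as plain JavaScript objects. The coordinates are made up for illustration, and `isClosedRing` is a hypothetical helper, not part of any library. It relies on the RFC 7946 rule that a polygon's coordinates are an array of linear rings (the first is the exterior, any others are holes) and that each ring has at least four positions with the first repeated at the end.

```javascript
// Polygon without holes: a single exterior ring.
const square = {
  type: "Polygon",
  coordinates: [
    [[0, 0], [10, 0], [10, 10], [0, 10], [0, 0]]
  ]
};

// Polygon with one hole: the exterior ring followed by an interior ring.
const squareWithHole = {
  type: "Polygon",
  coordinates: [
    [[0, 0], [10, 0], [10, 10], [0, 10], [0, 0]],
    [[2, 2], [2, 8], [8, 8], [8, 2], [2, 2]]
  ]
};

// Hypothetical helper: a linear ring needs at least four positions,
// and its first and last positions must be identical.
function isClosedRing(ring) {
  const first = ring[0];
  const last = ring[ring.length - 1];
  return ring.length >= 4 && first[0] === last[0] && first[1] === last[1];
}

console.log(square.coordinates.every(isClosedRing));
console.log(squareWithHole.coordinates.length - 1); // number of holes
```

A check like this is handy before handing data to D3, since an unclosed ring is a common cause of polygons that render incorrectly.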

The names of all seven geometry types, Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, and GeometryCollection, are case-sensitive. By convention, coordinates follow the order longitude, latitude, elevation.

1.2 Feature

A Feature is an object that combines a geometry with additional properties, and both are required. Specifically, a Feature has a type member with the value Feature, a geometry member, and a properties member.
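As a minimal sketch of that structure, the snippet below wraps the Point and LineString examples from above in Features and groups them into a FeatureCollection. The helper name `toFeature` and the `name` property values are assumptions made for illustration only.

```javascript
// Hypothetical helper that wraps a geometry in a GeoJSON Feature.
// Both "geometry" and "properties" are always emitted, since a
// Feature requires both members.
function toFeature(geometry, properties = {}) {
  return { type: "Feature", geometry, properties };
}

const pointGeometry = { type: "Point", coordinates: [0, 0] };
const lineGeometry = { type: "LineString", coordinates: [[0, 0], [10, 10]] };

// A FeatureCollection is an object whose "features" member is an
// array of Feature objects.
const featureCollection = {
  type: "FeatureCollection",
  features: [
    toFeature(pointGeometry, { name: "origin" }),   // made-up property
    toFeature(lineGeometry, { name: "diagonal" })   // made-up property
  ]
};

console.log(JSON.stringify(featureCollection, null, 2));
```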

1.4 Bounding Box

A GeoJSON object may have a member called “bbox”, a bounding box that records the coordinate range of its geometries, features, or feature collections. It lists all minimum values first and then all maximum values, each in longitude-latitude-elevation order. In two dimensions this gives left, bottom, right, top (that is, west, south, east, north), going counter-clockwise, which defines the boundary of the underlying geographic data.
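To make the ordering concrete, here is a small sketch that derives a two-dimensional bbox from any nested coordinates array. `computeBbox` is a hypothetical helper written for this post. It ignores elevation and assumes the geometry does not cross the antimeridian.

```javascript
// Hypothetical helper: walk a nested coordinates array and collect
// the [west, south, east, north] extent of all positions in it.
function computeBbox(coordinates) {
  const bbox = [Infinity, Infinity, -Infinity, -Infinity];

  function visit(node) {
    if (typeof node[0] === "number") {
      // A position: [longitude, latitude, (elevation)].
      const lon = node[0];
      const lat = node[1];
      bbox[0] = Math.min(bbox[0], lon);
      bbox[1] = Math.min(bbox[1], lat);
      bbox[2] = Math.max(bbox[2], lon);
      bbox[3] = Math.max(bbox[3], lat);
    } else {
      // An array of positions, rings, or polygons: recurse.
      for (const child of node) visit(child);
    }
  }

  visit(coordinates);
  return bbox;
}

const diagonal = { type: "LineString", coordinates: [[0, 0], [10, 10]] };
diagonal.bbox = computeBbox(diagonal.coordinates);
console.log(diagonal.bbox); // [0, 0, 10, 10]
```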

2. TopoJSON

TopoJSON is an extension of GeoJSON that encodes topology. It eliminates redundancy, such as a border shared by two regions being stored twice, so that geometries can be stored more efficiently.

According to the TopoJSON Format Specification, a topology must contain a “type” member with the value “Topology”, an “objects” member, which is itself an object mapping names (here, “example”) to geometry objects, and a top-level “arcs” member. Point and MultiPoint geometry objects must have a “coordinates” member, while LineString, Polygon, MultiLineString and MultiPolygon must have an “arcs” member. Both “coordinates” and “arcs” are always arrays. “bbox” is optional, as is “transform”, which is used to construct a “quantized” topology. I use the simple examples from the GeoJSON section to illustrate TopoJSON.

As we can see, every TopoJSON counterpart has a “type” member with the value “Topology”, and every topology stores its geometry under the “example” object. The differences lie in the geometry type. Point and MultiPoint carry their positions in “coordinates”, so the topology's “arcs” array is present but empty, while LineString, Polygon, MultiLineString and MultiPolygon only have an “arcs” member.
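To show how “arcs” replace coordinates, here is a hand-written, unquantized topology for two made-up unit squares that share one edge, together with a sketch of how arc indexes are resolved back into coordinates. `decodeArcs` is a hypothetical helper. In practice the `feature` function from topojson-client does this job. The sketch relies on two rules from the format specification: a negative index `i` refers to arc `~i` traversed in reverse, and consecutive arcs in a line or ring share their junction point.

```javascript
// Two adjacent squares; the shared edge is stored once, as arc 0.
const topology = {
  type: "Topology",
  objects: {
    example: {
      type: "GeometryCollection",
      geometries: [
        { type: "Polygon", arcs: [[0, 1]] },  // left square
        { type: "Polygon", arcs: [[2, -1]] }  // right square, reuses arc 0 reversed
      ]
    }
  },
  arcs: [
    [[1, 0], [1, 1]],                  // arc 0: shared edge
    [[1, 1], [0, 1], [0, 0], [1, 0]],  // arc 1: rest of the left square
    [[1, 0], [2, 0], [2, 1], [1, 1]]   // arc 2: rest of the right square
  ]
};

// Hypothetical helper: turn a list of arc indexes into coordinates.
function decodeArcs(topo, arcIndexes) {
  const points = [];
  for (const index of arcIndexes) {
    // A negative index i means arc ~i, read backwards.
    const arc = index < 0
      ? topo.arcs[~index].slice().reverse()
      : topo.arcs[index];
    // After the first arc, skip the leading point because it repeats
    // the last point of the previous arc.
    const start = points.length === 0 ? 0 : 1;
    for (let i = start; i < arc.length; i++) points.push(arc[i]);
  }
  return points;
}

for (const geometry of topology.objects.example.geometries) {
  console.log(JSON.stringify(geometry.arcs.map((ring) => decodeArcs(topology, ring))));
}
```

The shared edge appears in both decoded rings but only once in the file, which is where the size saving over GeoJSON comes from.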

3. From Raw Data to TopoJSON

In practice, we need to create our own TopoJSON files for D3 to consume, starting from raw shapefiles. I will go through steps borrowed from Bostock’s series of blog posts 1, 2, 3 and 4, and another take by Ændrew Rininsland.

To start with, we need to install the packages used for data manipulation: shapefile, for converting shapefiles to GeoJSON, and topojson, for converting GeoJSON to TopoJSON.

For a quick check, two commands, shp2json from shapefile followed by geo2topo from topojson, are enough to convert raw shapefiles into a TopoJSON file. If you compare the file sizes, it is not hard to see that the TopoJSON file is only about 70% of the size of the original GeoJSON file.

However, this default conversion usually does not take full advantage of what TopoJSON can do to meet the particular needs of a D3 visualization. We will dive deeper and test a few ways of optimizing the file conversion.

First, we convert the raw data into newline-delimited features, one feature per line, which is easier for humans to read and lets us use the convenient ndjson-cli tools.
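For readers unfamiliar with the newline-delimited layout, the idea can be sketched in a few lines of JavaScript. This is not how ndjson-cli is implemented. `toNdjson` and `fromNdjson` are hypothetical helpers and the sample features are made up.

```javascript
// Hypothetical helper: write each feature of a FeatureCollection as
// one JSON document per line.
function toNdjson(featureCollection) {
  return featureCollection.features
    .map((feature) => JSON.stringify(feature))
    .join("\n") + "\n";
}

// Hypothetical helper: read the features back, ignoring blank lines.
function fromNdjson(text) {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
}

const sample = {
  type: "FeatureCollection",
  features: [
    { type: "Feature", geometry: { type: "Point", coordinates: [0, 0] }, properties: {} },
    { type: "Feature", geometry: { type: "LineString", coordinates: [[0, 0], [10, 10]] }, properties: {} }
  ]
};

const ndjson = toNdjson(sample);
console.log(ndjson);
console.log(fromNdjson(ndjson).length); // 2
```

Because every line is a complete JSON document, each feature can be filtered, mapped, or joined on its own without loading the whole collection.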

We then convert the newline-delimited file into TopoJSON to serve as a benchmark.

Benchmarking TopoJSON:

Then, we can shrink this benchmark TopoJSON file by quantizing and simplifying.

Quantizing essentially reduces coordinate precision. It is done with topoquantize, which takes a number as its argument. According to the TopoJSON API reference, this number is typically a power of ten. The larger the number, the more precise the coordinates.
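The effect of that number can be sketched as follows. This is a simplified illustration of the idea, not the topoquantize source: positions are snapped to an n-by-n integer grid over the bounding box and then delta-encoded, and the “transform” (scale and translate) is what lets a client recover approximate coordinates. The function names, the sample arc, and the use of Math.round are assumptions. The real tool also drops points that coincide after snapping, which is omitted here.

```javascript
// Snap an arc to an n x n grid over bbox and delta-encode the result.
function quantizeArc(arc, bbox, n) {
  const [x0, y0, x1, y1] = bbox;
  // Guard against a zero-width or zero-height extent.
  const kx = x1 > x0 ? (x1 - x0) / (n - 1) : 1;
  const ky = y1 > y0 ? (y1 - y0) / (n - 1) : 1;
  let px = 0;
  let py = 0;
  const deltas = arc.map(([x, y]) => {
    const qx = Math.round((x - x0) / kx);
    const qy = Math.round((y - y0) / ky);
    const delta = [qx - px, qy - py]; // difference from the previous grid point
    px = qx;
    py = qy;
    return delta;
  });
  return { transform: { scale: [kx, ky], translate: [x0, y0] }, deltas };
}

// Undo the delta-encoding and apply the transform.
function dequantizeArc(deltas, transform) {
  const [kx, ky] = transform.scale;
  const [dx, dy] = transform.translate;
  let x = 0;
  let y = 0;
  return deltas.map(([ddx, ddy]) => {
    x += ddx;
    y += ddy;
    return [x * kx + dx, y * ky + dy];
  });
}

const arc = [[0.12345, 0.98765], [3.33333, 7.77777], [10, 10]]; // made-up data
const coarse = quantizeArc(arc, [0, 0, 10, 10], 11);
console.log(JSON.stringify(coarse.deltas)); // [[0,1],[3,7],[7,2]]
console.log(JSON.stringify(dequantizeArc(coarse.deltas, coarse.transform)));

const fine = quantizeArc(arc, [0, 0, 10, 10], 1e4);
console.log(JSON.stringify(dequantizeArc(fine.deltas, fine.transform)));
```

With n = 11 the decoded points land on whole numbers, while with n = 1e4 they stay within roughly 0.0005 of the originals. Small integer deltas also serialize to fewer characters than long floating-point coordinates, which is why quantizing shrinks the file.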

Quantizing:

Simplifying essentially reduces the number of points used to represent arcs. It is done with toposimplify and its -p option, which sets a minimum-area threshold. Unlike with topoquantize, the smaller the value, the more detail is kept. (The related -P option instead takes a quantile between 0 and 1.) The -f option removes detached rings that are smaller than the simplification threshold after simplifying.
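To give a feel for what an area threshold means, here is a naive sketch of Visvalingam-style simplification, the method the topojson-simplify documentation names. It is not the library's implementation: the real tool works on a whole topology and uses a priority queue, whereas this hypothetical `simplifyLine` handles a single open line and rescans it on every pass. The sample line is made up.

```javascript
// Area of the triangle formed by three points.
function triangleArea(a, b, c) {
  return Math.abs(
    (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])
  ) / 2;
}

// Repeatedly drop the interior point whose triangle with its two
// neighbours is smallest, until every remaining triangle has an area
// of at least minArea. The endpoints are always kept.
function simplifyLine(points, minArea) {
  const kept = points.slice();
  while (kept.length > 2) {
    let smallest = Infinity;
    let smallestIndex = -1;
    for (let i = 1; i < kept.length - 1; i++) {
      const area = triangleArea(kept[i - 1], kept[i], kept[i + 1]);
      if (area < smallest) {
        smallest = area;
        smallestIndex = i;
      }
    }
    if (smallest >= minArea) break;
    kept.splice(smallestIndex, 1);
  }
  return kept;
}

const wiggle = [[0, 0], [1, 0.1], [2, 0], [3, 5], [4, 0]];
console.log(JSON.stringify(simplifyLine(wiggle, 1))); // drops the small bump at [1, 0.1]
console.log(JSON.stringify(simplifyLine(wiggle, 0))); // keeps every point
```

A threshold of 1 removes the barely visible bump but keeps the tall spike, which is the behaviour we want: points that contribute little area are the ones a viewer is least likely to miss.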