Geofile: Getting Started with pgRouting using Esri Shapefiles

We’re going to show you how to plan your travel around Seattle’s museums and sights, building on our introduction to pgRouting.

Today, we are going to look at how to upload an Esri shapefile of Seattle streets, layer on a set of locations throughout Seattle, and then find the shortest paths to those points using the street network.

In the previous article, getting started with pgRouting, we covered everything from adding the extension to Compose PostgreSQL, creating a data set of nodes, connecting them together, then finding the shortest path using the pgRouting routing functions pgr_dijkstra and pgr_ksp. Now, we'll build on that.

Importing an Esri shapefile

We talked about importing shapefiles into PostgreSQL in Converting and importing shapefiles for Compose PostgreSQL and MongoDB. That article gave a general overview on how to import shapefiles into PostgreSQL using the command line tool shp2pgsql, which comes with PostGIS. We'll use it again, so if you don't already have PostGIS installed and you're using macOS with homebrew, you can install it using the terminal with:

brew install postgis

Directions on how to install PostGIS on other platforms are found here.

The shapefile we'll be using is the City of Seattle street network. It's a large ZIP file that includes all the streets within the Seattle area. Once you've downloaded it, open the ZIP file, and in it, you'll have two folders named "StatePlain" and "WGS84". We'll use the shapefile included in "WGS84". Why? Because it's already set to SRID 4326, which is a frequently used SRID for geographic data.

Now, we'll load the shapefile into PostgreSQL using the shp2pgsql tool. The syntax we'll use is the following:

Let's go over the options we passed to shp2pgsql. The two most common options that are frequently used are the -I and -s switches. The -I switch automatically sets up the index for the geometry column, which is important for any geospatial data set, and the -s switch sets the SRID to 4326. Our shapefile already is set to SRID 4326, but we'll use the flag as a precaution.

The next two options focus on converting the geometry of the data set. The -S switch converts any multilinestrings to linestrings. pgRouting is known to have difficulties with multi-geometries and will only consider the first linestring in a multilinestring so it's best to just do the conversion to avoid any issues that might arise. We'll also use the -t switch to set the dimensions of the streets to "2D", or two dimensions. The reasoning behind this is that sometimes Z or M coordinates are added to indicate three-dimensional data for underpasses, intersections, etc. This can lead to data that's not noded correctly leading to problems when creating a routing topology. Therefore, by setting the shapefile dimension to "2D", we're solving this problem.

The last parts of the command are the shapefile and an optional name of the table that will be created that'll contain the shapefile data, we've called ours "streets". That name is optional. By default shp2pgsql will create a table using the shapefile file name.

For the psql command, you'll have to substitute it by choosing one of your Compose PostgreSQL portals. You can find them on your PostgreSQL deployment's Overview page in the Compose console.

Once we've executed shp2pgsql it will take a couple minutes for the shapefile to be converted and imported into PostgreSQL (depending on your internet connection). When it's finished, you'll have a table named "streets" in your database.

Now, let's start exploring the table and preparing the data for pgRouting ...

Preparing shapefiles for pgRouting

We have a table called "streets" inside our database so we'll now need to set up the table to use it with pgRouting. In the last article, we mentioned that tables containing routes, called edges, need to have "source", "target", and "cost" columns (or a column that will hold a cost value) in order to know where nodes begin and end and a meaningful cost to each edge.

Let's set up three columns "source", "target", and "length" (for cost). We can do that with these SQL commands:

Next, we'll populate the "source" and "target" columns using pgr_createTopology. In the last article, we populated these columns ourselves since we created the nodes and edges. With the imported street data, the edges and nodes have been created for us so we'll let pgRouting figure out the source and target nodes by running:

SELECT pgr_createTopology('streets',0.000001,'geom','gid');

This function takes the table name "streets", a tolerance number, the name of the geometry column, and the name id field. We're using the names geom and gid in the function because the default column names the function looks for are id and the_geom. As for the tolerance number, it should be as low as possible - the lower the better. The tolerance should be lower than the distance between nodes; otherwise, the topology function will think that some nodes are connected and they'll be snapped together and we don't want that otherwise some streets will be connected at the wrong intersections.

Running this function, the "source" and "target" columns will be populated with the source and target nodes for each edge. Additionally, a table called "streetsverticespgr" will be created which includes the id and geometry of each node in the "streets" table. This table serves as a lookup for each node that pgRouting goes to when finding the locations of the nodes.

The next column in the "streets" table that needs to be populated is the "length" column. This serves as the cost of each edge. For the length, we'll use the edge length between the source and target nodes using the geom column from streets. To do that, we'll let PostGIS determine the length of each street segment using ST_Length and casting the geom column to the geography data type to get the street segment length in meters:

UPDATE streets SET length = ST_Length(geom::geography);

Once, we've done that, all the lengths will be inserted into the database and we'll be set up to do the routing.

What about locations to route to?

Technically, we don't need point locations to make pgRouting work; all you have to know are the source and target nodes for the origin and destination edges. That means we'll have to figure out what the source and target nodes are for each street using, for instance, some SQL to find the street name from the "streets" table, or by using a geospatial tool like QGIS to find the source and target node ids through a UI.

Instead of trying to figure out the nodes of each street, let's overlay some locations and use those as source and target nodes. How will we do that?

First, let's download a CSV file with the locations from a community-driven project by the City of Seattle called My Neighborhood Map. The CSV file has a lot of different points of interest, some of which will not relevant for our use case, such as traffic cameras. Next, create another table called "places".

Now we'll update the place_id column by using the "streetsverticespgr" which stored the locations of each node from "streets". The place_id column is where we'll snap the locations to the street nodes using the by associating each street node with a location. Using some SQL, we can search for street nodes that are within a very minimal distance from each location using the PostGIS functionST_DWithin and specifying an optional distance parameter with a distance of 0.001 meters. This function will populate the place_id column with the ids of edges that are within 0.001 meters from a location.

You'll find that some of the places in the table do not have an id associated with them. That's because they are not within the specified distance to a street node. In this case, we can delete them or just ignore them. To delete them execute:

DELETE FROM places WHERE place_id IS NULL;

Routing between locations

We have streets and locations all set up for routing and now we'll use pgRouting to get the shortest routes between locations. If you've read the previous article Getting Started with pgRouting then you'll be familiar with how to set up the routes. We'll be using the same pgRouting routing functions here: pgrdijkstra and pgrkps.

Let's say we want to go to the Museum of Flight and then we want to go to a concert at Benaroya Hall. Using pgr_dijkstra, we'll set up the the query with:

In this query, we're selecting the source and target nodes from their place_id using the common_name column to select them by name. Some names are repeated because they have two or more category names in city_feature. Therefore, for this data set, you might have to specify the city_feature, which we've done for Benaroya Hall; otherwise, you'll get the following error when running the above query:

ERROR: more than one row returned by a subquery used as an expression

Executing this query in the shell will give us a long list of nodes with the edge geometries. However, to give this query a little more context, we'll execute this query in QGIS using our "streets" and "place" tables giving us the route:

Using pgr_ksp, the set up will be similar to that of pgr_dijkstra within the function, but now we'll add an extra parameter to indicate how many routes we want to be returned to the destination. In the following example, we'll determine the three shortest paths from the Pacific Science Center to the Museum of History and Industry. Set up the query like:

Again, giving this some context we've put the query in QGIS, and the route looks something like:

You'll notice that the majority of all three routes overlap. We've labeled all three routes in the image above. Number one is the fastest route, number two take another street but merges into number one, and number three takes a small side street off of number one, which is barely visible, but then merges back to the same route as number one.

To get an idea about the distances each route covers, we'll use psql. If we modify the query a little to get only the locations that each route goes through and the number of kilometers they have to travel to get from the Pacific Science Center to the Museum of History and Industry, we'd substitute everything in the query except what's within the pgr_ksp function:

SELECT
k.path_id AS path_id,
p.common_name AS route,
MAX(CASE WHEN k.edge = -1 THEN k.agg_cost/1000 ELSE NULL END) AS "distance/km"
FROM
pgr_ksp(
...
) as k
INNER JOIN places AS p ON k.node = p.place_id
GROUP BY k.path_id, p.common_name, k.seq
ORDER BY k.seq;

We'll be able to see each place that the route travels past and the total distance in kilometers:

Get routing!

In the two articles that have covered pgRouting, we've only scratched the surface of what's available and how the extension can be used in your GIS applications. pgRouting is particularly interesting when using it with OpenStreetMap data, which we'll cover in the near future. Nevertheless, we encourage you to explore more and do more with pgRouting on Compose.

If you have any feedback about this or any other Compose article, drop the Compose Articles team a line at articles@compose.com. We're happy to hear from you.

Abdullah Alger is a former University lecturer who likes to dig into code, show people how to use and abuse technology, talk about GIS, and fish when the conditions are right. Coffee is in his DNA. Love this article? Head over to Abdullah Alger’s author page to keep reading.