Trying to find useful things to do with emerging technologies in open education and data journalism

Trying Out Spatialite Support in Datasette

A recent datasette release included improved support for the SpatiaLite geo-extensions for SQLite, so to complement some other bits of geo-tinkering I’ve been doing lately – and which I still need to blog – I thought I’d give it a spin.

The write up is in a Jupyter notebook which you can find here and which is embedded in markdown export format below:

SpatiaLite Datasette Demo

This notebook provides a quick walkthrough of getting started with a SpatiaLite geo-database and using it with Datasette.

Get the Data

The SpatiaLite database can be used to store, index and perform various geo-related query operations on various geographical objects including points and shapefiles.

To help me get up to speed, I’m going to try to load in a shapefile delimiting various bits of the UK into a SpatiaLite database, publish it as a datasette, retrieve a boundary for a particular region from the datasette API and then plot the boundary on a map.

The shapefile I’m going to use is of UK administrative areas described by the Ordnance Survey Boundary-Line product.

Loading Shapefile Data into SQLite

The datasette docs currently suggest creating a SpatiaLite database using a command of the form spatialite DATABASE.db and then loading shapefiles into it using a SpatiaLite commandline command of the form: .loadshp SHAPEFILE TABLENAME CP1252 23032.

That bit of voodoo actually unpacks a bit further as:

.loadshp PATH_AND_SHAPEFILE_NAME TABLENAME FILE_ENCODING [PROJECTION]

The CP1252 file encoding is for the default Microsoft Windows Latin-1 encoding, so I’m guessing that if your shapefile use another encoding you need to change that. (Internally, SpatiaLite uses UTF-8. UTF-8 is also acceptable as a file encoding value in the above commandline command; a full list of acceptable values can be found by trying to load a shapefile using spatialite_gui.) The file encoding is a required argument, as are the path to the shapefile name and the table name. The projection is optional.

The projection relates to the projection used within the shapefile. Setting this correctly allows you to transform the data to other projections, such as the WGS84 (aka EPSG:4326, GPS, latitude / longitude) projection.

The projection is identified using its SRID (Spatial Reference System Identifier) code (lookup). If no code is provided, no projection is identified with the shapefile geodata in the database (it’s give a -1 code). (I had expected the projection to be identified from the .prj (projection) file which contains a WKT description of the projection used in each .shp shapefile.)

I didn’t meet with much success looking for a Python library to identify the required SRID from the Ordnance Survey shapefiles, but did find an online API that seemed to do the trick:

#Load in the shapefile projection details
with open('./Data/GB/district_borough_unitary_region.prj', 'r') as myfile:
prj=myfile.read()
prj

The Northings/Eastings projection used by the OS shapefile is not directly plottable using many simple interactive web maps. Instead, they need GPS style latitude/longitude co-ordinates. Given that we know the projection of the original shapefile, we can create a new set of transformed co-ordinates using the required lat/long (WGS84) projection.

Let’s create a simple text file containing a SQL script to handle that transformation for us:

Accessing the SpatiaLite database Using Datasette

Having created our database and generated a projection appropriate for plotting using an interactive web map, let’s publish it as a datasette and then see if we can access the data from the datasette API.

Here’s a trick I just learned via a tweet from @minrk for running a simple web service from within a Jupyter notebook code cell without blocking the notebook execution: