Making thematic maps has traditionally been the preserve of a ‘proper’ GIS, such as ArcGIS or QGIS. While these tools make it easy to work with shapefiles, and expose a range of common everyday GIS operations, they aren’t particularly well-suited to exploratory data analysis. In short, if you need to obtain, reshape, and otherwise wrangle data before you use it to make a map, it’s easier to use a data analysis tool (such as Pandas), and couple it to a plotting library. This tutorial demonstrates the use of:

Shapely, for constructing and manipulating planar geometric objects

Descartes, which turns said geometric objects into matplotlib “patches”

PySAL, a spatial analysis library

The approach I’m using here uses an interactive REPL (IPython Notebook) for data exploration and analysis, and the Descartes package to render individual polygons (in this case, wards in London) as matplotlib patches, before adding them to a matplotlib axes instance. I should stress that many of the plotting operations could be more quickly accomplished, but my aim here is to demonstrate how to precisely control certain operations, in order to achieve e.g. the precise line width, colour, alpha value or label position you want.

Package installation

This tutorial uses Python 2.7.x, and the following non-stdlib packages are required:

IPython

Pandas

Numpy

Matplotlib

Basemap

Shapely

Fiona

Descartes

PySAL

The installation of some of these packages can be onerous, and requires a great number of third-party dependencies (GDAL & OGR, and C & FORTRAN77 (yes, really) compilers). If you’re experienced with Python package installation and building software from source, feel free to install these dependencies (if you’re using OSX, Homebrew and/or Kyngchaos are helpful, particularly for GDAL & OGR), before installing the required packages in a virtualenv, and skipping the rest of this section.

For everyone else: Enthought’s Canopy (which is free for academic users) provides almost everything you need, with the exception of Descartes and PySAL. You can install them into the Canopy User Python quite easily, see this support article for details.

Running the Code

I find IPython Notebook best for this: code can be run in isolation within cells, making it easy to correct mistakes, and graphics are rendered inline, making it easy to fine-tune output. Opening a notebook is straightforward: run ipython notebook --pylab inline from the command line. This will open a new notebook in your browser. Canopy users: the Canopy Editor will do this for you.

PEP8

Some (16, to be exact) of my lines are over-long. I know. I’m sorry.

Obtaining a basemap

We’re going to be working with basemaps generated from Esri shapefiles, and we’re going to plot our data on a map of London. I’ve created a shapefile for this purpose; it’s available in .zip format here, under Crown Copyright. Download it, and extract the files into a directory named data, under your main project directory.

Obtaining some data

We’re going to make three maps, using the same data: blue plaque locations within London. In order to do this, we’re going to extract the longitude, latitude, and some other features from the master XML file which is available from Open Plaques. Get it here. This file contains data for every plaque Open Plaques knows about, but it’s incomplete in some cases, and will require cleanup before we can begin to extract a useful subset.

Extracting and cleaning the data

Let’s start by importing the packages we need. I’ll discuss the significance of certain libraries as needed.

from lxml import etree  # lxml rather than the stdlib ElementTree, since we use getparent() below

tree = etree.parse("data/london_20131229.xml")
root = tree.getroot()
output = dict()
output['raw'] = []
output['crs'] = []
output['lon'] = []
output['lat'] = []
for each in root.xpath('/openplaques/plaque/geo'):
    # check what we got back
    output['crs'].append(each.get('reference_system'))
    output['lon'].append(each.get('longitude'))
    output['lat'].append(each.get('latitude'))
    # now go back up to plaque
    r = each.getparent().xpath('inscription/raw')[0]
    if isinstance(r.text, str):
        output['raw'].append(r.text.lstrip().rstrip())
    else:
        output['raw'].append(None)

This will produce a dict containing the coordinate reference system, longitude, latitude, and description of each plaque record. Next, we’re going to create a Pandas DataFrame, drop all records which don’t contain a description, and convert the long and lat values from string to floating-point numbers.
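The code for this step isn’t shown above; a minimal sketch of it might look like the following (the small sample `output` dict stands in for the one built from the full XML file):

```python
import numpy as np
import pandas as pd

# stand-in for the dict built from the XML above
output = {
    'raw': ['John Smith lived here', None, 'Jane Doe worked here'],
    'crs': ['WGS84', 'WGS84', 'WGS84'],
    'lon': ['-0.1276', '-0.1420', '-0.0754'],
    'lat': ['51.5072', '51.5014', '51.5055'],
}

df = pd.DataFrame(output)
# drop records with no description (None counts as missing for dropna)
df = df.dropna(subset=['raw'])
# convert the coordinate strings to floating-point numbers
df[['lon', 'lat']] = df[['lon', 'lat']].astype(float)
```
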

I’ve chosen the transverse mercator projection, because it exhibits less distortion over areas with a small east-west extent. This projection requires us to specify a central longitude and latitude, which I’ve set as -2, 49.
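The Basemap instance m used below isn’t defined in the excerpts on this page; its configuration looks something along these lines (the London corner coordinates and the shapefile path here are illustrative guesses, not values from the original):

```python
from mpl_toolkits.basemap import Basemap

m = Basemap(
    projection='tmerc',                  # transverse mercator
    ellps='WGS84',
    lon_0=-2., lat_0=49.,                # central longitude and latitude, as above
    llcrnrlon=-0.55, llcrnrlat=51.25,    # lower-left corner (approximate Greater London extent)
    urcrnrlon=0.35, urcrnrlat=51.72,     # upper-right corner
    resolution='i',
    suppress_ticks=True)
# read the ward polygons; they become available as m.london / m.london_info
m.readshapefile('data/london_wards', 'london', color='none', zorder=2)
```
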

# set up a map dataframe
df_map = pd.DataFrame({
    'poly': [Polygon(xy) for xy in m.london],
    'ward_name': [ward['NAME'] for ward in m.london_info]})
df_map['area_m'] = df_map['poly'].map(lambda x: x.area)
df_map['area_km'] = df_map['area_m'] / 1000000.  # 1 km^2 = 1,000,000 m^2

# Create Point objects in map coordinates from dataframe lon and lat values
map_points = pd.Series(
    [Point(m(mapped_x, mapped_y)) for mapped_x, mapped_y in zip(df['lon'], df['lat'])])
plaque_points = MultiPoint(list(map_points.values))
wards_polygon = prep(MultiPolygon(list(df_map['poly'].values)))
# calculate points that fall within the London boundary
ldn_points = filter(wards_polygon.contains, plaque_points)

Note that the map_points series was created by passing longitude and latitude values to our Basemap instance, m. This converts the coordinates from long and lat degrees to map projection coordinates, in metres.
Our df_map dataframe now contains columns holding:

a polygon for each ward in the shapefile

its name

its area in square metres

its area in square kilometres

We’ve also created a prepared geometry object from the combined ward polygons, in order to speed up our membership-checking operation significantly. We perform the membership check by creating a MultiPoint from map_points, then filtering it using the contains() method, a binary predicate which retains only the points contained within wards_polygon. The result is a list of Point objects, ldn_points, which we’ll be using to make our maps.
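The same prepared-geometry pattern can be tried in isolation with toy shapes (the square and the points below are made up for illustration):

```python
from shapely.geometry import MultiPoint, Polygon
from shapely.prepared import prep

# a prepared geometry caches internal indexes, speeding up repeated predicate calls
square = prep(Polygon([(0, 0), (4, 0), (4, 4), (0, 4)]))
pts = MultiPoint([(1, 1), (5, 5), (2, 3)])
# contains() is the binary predicate; filter keeps only the points inside the square
inside = list(filter(square.contains, pts.geoms))
```
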

The two functions below make it easier to generate colour bars for our maps. Have a look at the docstrings for more detail; in essence, one of them discretises a colour ramp, and the other makes it easier to label colour bars.
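Those helper functions aren’t reproduced here; a simplified sketch of the discretiser (using ListedColormap, rather than the LinearSegmentedColormap construction the full version builds) looks like this:

```python
import numpy as np
import matplotlib.cm
from matplotlib.colors import ListedColormap

def cmap_discretize(cmap, N):
    """Return a ListedColormap with N colours sampled evenly from cmap."""
    colours = cmap(np.linspace(0., 1., N))
    return ListedColormap(colours, name='%s_%d' % (cmap.name, N))

# e.g. a five-class version of the Blues ramp
discrete_blues = cmap_discretize(matplotlib.cm.Blues, 5)
```
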

Let’s make a scatter plot

# draw ward patches from polygons
df_map['patches'] = df_map['poly'].map(lambda x: PolygonPatch(
    x, fc='#555555', ec='#787878', lw=.25, alpha=.9, zorder=4))

plt.clf()
fig = plt.figure()
ax = fig.add_subplot(111, axisbg='w', frame_on=False)

# we don't need to pass points to m() because we calculated using map_points and shapefile polygons
dev = m.scatter(
    [geom.x for geom in ldn_points],
    [geom.y for geom in ldn_points],
    5, marker='o', lw=.25,
    facecolor='#33ccff', edgecolor='w',
    alpha=0.9, antialiased=True,
    label='Blue Plaque Locations', zorder=3)

# plot boroughs by adding the PatchCollection to the axes instance
ax.add_collection(PatchCollection(df_map['patches'].values, match_original=True))

# copyright and source data info
smallprint = ax.text(
    1.03, 0,
    'Total points: %s\nContains Ordnance Survey data\n$\copyright$ Crown copyright and database right 2013\nPlaque data from http://openplaques.org' % len(ldn_points),
    ha='right', va='bottom',
    size=4, color='#555555', transform=ax.transAxes)

# Draw a map scale
m.drawmapscale(
    coords[0] + 0.08, coords[1] + 0.015,
    coords[0], coords[1], 10.,
    barstyle='fancy', labelstyle='simple',
    fillcolor1='w', fillcolor2='#555555',
    fontcolor='#555555', zorder=5)

plt.title("Blue Plaque Locations, London")
plt.tight_layout()
# this will set the image width to 722px at 100dpi
fig.set_size_inches(7.22, 5.25)
plt.savefig('data/london_plaques.png', dpi=100, alpha=True)
plt.show()

We’ve drawn a scatter plot on our map, containing points with a 50 metre diameter, corresponding to each point in our dataframe.

This is OK as a first step, but doesn’t really tell us anything interesting about the density per ward – merely that there are more plaques found in central London than in the outer wards.

Creating a Choropleth Map, Normalised by Ward Area

df_map['count'] = df_map['poly'].map(
    lambda x: int(len(filter(prep(x).contains, ldn_points))))
df_map['density_m'] = df_map['count'] / df_map['area_m']
df_map['density_km'] = df_map['count'] / df_map['area_km']
# it's easier to work with NaN values when classifying
df_map.replace(
    to_replace={'density_m': {0: np.nan}, 'density_km': {0: np.nan}},
    inplace=True)

We’ve now created some additional columns, containing the number of points in each ward, and the density per square metre and square kilometre, for each ward. Normalising like this allows us to compare wards.

We’re almost ready to make a choropleth map, but first, we have to divide our wards into classes, in order to easily distinguish them. We’re going to accomplish this using an iterative method called Jenks Natural Breaks.
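The tutorial relies on PySAL’s Natural Breaks classifier for this. To make the idea concrete, here’s a small self-contained dynamic-programming sketch of the same optimisation (minimising total within-class squared deviation); it’s fine for toy inputs, but far slower than PySAL’s implementation:

```python
def _ssd(x, i, j):
    """Sum of squared deviations of sorted values x[i..j] from their mean."""
    seg = x[i:j + 1]
    mean = sum(seg) / float(len(seg))
    return sum((v - mean) ** 2 for v in seg)

def jenks_breaks(values, k):
    """Return the k upper class bounds minimising total within-class SSD."""
    x = sorted(values)
    n = len(x)
    INF = float('inf')
    # cost[i][m]: minimal SSD splitting x[0..i] into m classes
    # back[i][m]: start index of the m-th class in that optimal split
    cost = [[INF] * (k + 1) for _ in range(n)]
    back = [[0] * (k + 1) for _ in range(n)]
    for i in range(n):
        cost[i][1] = _ssd(x, 0, i)
    for m in range(2, k + 1):
        for i in range(m - 1, n):
            for s in range(m - 1, i + 1):
                c = cost[s - 1][m - 1] + _ssd(x, s, i)
                if c < cost[i][m]:
                    cost[i][m], back[i][m] = c, s
    # walk back through the table to recover the class bounds
    bounds = [x[-1]]
    i, m = n - 1, k
    while m > 1:
        s = back[i][m]
        bounds.append(x[s - 1])
        i, m = s - 1, m - 1
    return sorted(bounds)
```

For example, jenks_breaks([1, 2, 3, 10, 11, 12, 100, 101, 102], 3) groups the three obvious clusters, returning the bounds [3, 12, 102].
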

We’ve calculated the classes (five, in this case) for all the wards containing one or more plaques (density_km is not Null), and created a new dataframe containing the class number (0 - 4), with the same index as the non-null density values. This makes it easy to join it to the existing dataframe. The final step involves assigning the bin class -1 to all non-valued rows (wards), in order to create a separate zero-density class.
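As a sketch of that join-and-fill step (the density values and break points below are invented, and np.digitize stands in for the bin labels that PySAL’s classifier returns):

```python
import numpy as np
import pandas as pd

# invented ward densities; NaN marks wards with no plaques
df_map = pd.DataFrame({'density_km': [np.nan, 12.5, 3.1, np.nan, 40.0, 7.8]})
breaks = [5.0, 10.0, 20.0, 50.0]   # invented class boundaries

valid = df_map['density_km'].dropna()
# class numbers for the non-null rows, keeping their original index for the join
jb = pd.DataFrame({'jenks_bins': np.digitize(valid, breaks)}, index=valid.index)
df_map = df_map.join(jb)
# assign class -1 to wards with no plaques, giving a separate zero-density class
df_map['jenks_bins'] = df_map['jenks_bins'].fillna(-1).astype(int)
```
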

Finally, we can create an alternative map using hex bins. These are a more informative alternative to point maps, as we shall see. The Basemap package provides a hex-binning method, and we require a few pieces of extra information in order to use it:

The longitude and latitude coordinates of the points must be provided as numpy arrays.

A grid size must be specified, in metres. You can experiment with this setting; I’ve chosen 125.

Setting the mincnt value to 1 means that no bins will be drawn in areas where there are no plaques within the grid.

A bin type can be specified. In this case, I’ve chosen log, which uses a logarithmic scale for the colour map; this more clearly emphasises minor differences between bin densities.

The code:

# draw ward patches from polygons
df_map['patches'] = df_map['poly'].map(lambda x: PolygonPatch(
    x, fc='#555555', ec='#787878', lw=.25, alpha=.9, zorder=0))

plt.clf()
fig = plt.figure()
ax = fig.add_subplot(111, axisbg='w', frame_on=False)

# plot boroughs by adding the PatchCollection to the axes instance
ax.add_collection(PatchCollection(df_map['patches'].values, match_original=True))

df_london = df[
    (df['lon'] >= ll[0]) & (df['lon'] <= ur[0]) &
    (df['lat'] >= ll[1]) & (df['lat'] <= ur[1])]
lon_ldn = df_london.lon.values
lat_ldn = df_london.lat.values

# the mincnt argument only shows cells with a value >= 1
# hexbin wants np arrays, not plain lists
hx = m.hexbin(
    np.array([geom.x for geom in ldn_points]),
    np.array([geom.y for geom in ldn_points]),
    gridsize=125, bins='log', mincnt=1,
    edgecolor='none', alpha=1., lw=0.2,
    cmap=plt.get_cmap('Blues'))

# copyright and source data info
smallprint = ax.text(
    1.03, 0,
    'Total points: %s\nContains Ordnance Survey data\n$\copyright$ Crown copyright and database right 2013\nPlaque data from http://openplaques.org' % len(ldn_points),
    ha='right', va='bottom',
    size=4, color='#555555', transform=ax.transAxes)

# Draw a map scale
m.drawmapscale(
    coords[0] + 0.08, coords[1] + 0.015,
    coords[0], coords[1], 10.,
    barstyle='fancy', labelstyle='simple',
    fillcolor1='w', fillcolor2='#555555',
    fontcolor='#555555', zorder=5)

plt.title("Blue Plaque Density, London")
plt.tight_layout()
# this will set the image width to 722px at 100dpi
fig.set_size_inches(7.22, 5.25)
# use a different filename from the scatter map, so we don't overwrite it
plt.savefig('data/london_plaques_density.png', dpi=100, alpha=True)
plt.show()
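If you want to experiment with the hexbin arguments without a Basemap instance and the shapefile data, matplotlib’s own Axes.hexbin (which Basemap’s method wraps) takes the same parameters; the coordinates below are random stand-ins for the projected plaque locations:

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')   # render off-screen
import matplotlib.pyplot as plt

rng = np.random.RandomState(42)
x = rng.normal(0., 5000., 1000)   # fake easting values, metres
y = rng.normal(0., 5000., 1000)   # fake northing values, metres

fig, ax = plt.subplots()
hx = ax.hexbin(
    x, y,
    gridsize=125, bins='log', mincnt=1,
    edgecolor='none', cmap='Blues')
```
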

You can view and download a working copy of the code using the IPython Notebook Viewer here. Note that you’ll need a subdirectory named data, containing both the XML data source and the shapefile, in order to run it.