London’s Cycle Hire scheme has been a roaring success and continues to grow, with new stations being added all the time. This tutorial will produce a visualisation of journey times from the central point (well, approximately) of the bike station network to all other stations.

This is made possible by the provision of an open-access instance of OSRM by the lovely people at Mapzen. I won’t spend too much time on what OSRM is or how it works; suffice to say that it’s an open-source routing engine that uses OpenStreetMap data, and that the Mapzen instance provides walking, cycling, and public transit routing data via HTTP. Hurrah!

Package installation

This tutorial uses Python 2.7.x, and the following non-stdlib packages are required:

IPython

Pandas

Numpy

Matplotlib

Basemap

Shapely

Fiona

Descartes

Requests

(The following is a cut-and-paste from a previous article – it all still applies)
The installation of some of these packages can be onerous, and requires a great number of third-party dependencies (GDAL & OGR, C & FORTRAN77 (yes, really) compilers). If you’re experienced with Python package installation and building software from source, feel free to install these dependencies (if you’re using OSX, Homebrew and/or Kyngchaos are helpful, particularly for GDAL & OGR) before installing the required packages in a virtualenv and skipping the rest of this section.

For everyone else: Enthought’s Canopy (which is free for academic users) provides almost everything you need, with the exception of Descartes and PySAL. You can install them into the Canopy User Python quite easily; see this support article for details.

Obtaining a basemap

We’re going to be working with basemaps from Esri Shapefiles, and we’re going to plot data on a map of London. I’ve created a shapefile for this, and it’s available in .zip format here, under Crown Copyright. Download it, and extract the files into a directory named data, under your main project directory. You’ll also need a basemap of the Thames. Get it here, also under Crown Copyright, and put it in the data folder.

Obtaining some data

We’re going to need point data relating to the bike rental station locations. This is available from TfL to registered developers, provided as XML. I’ve made the November dataset available here. Save it to the data folder.
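As a rough sketch of how the station XML might be read with the standard library: the element names (`station`, `name`, `lat`, `long`) and the sample values below are assumptions about the general shape of the feed, so check them against the file you actually downloaded.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-station sample; element names and values are assumed,
# not copied from the real dataset
SAMPLE = """<stations>
  <station><id>1</id><name>Example Street</name><lat>51.5291</lat><long>-0.1099</long></station>
  <station><id>2</id><name>Sample Road</name><lat>51.4996</lat><long>-0.1975</long></station>
</stations>"""


def parse_stations(xml_text):
    """Return a list of dicts holding each station's name, lon and lat."""
    root = ET.fromstring(xml_text)
    stations = []
    for node in root.iter('station'):
        stations.append({
            'name': node.findtext('name'),
            'lon': float(node.findtext('long')),
            'lat': float(node.findtext('lat')),
        })
    return stations


stations_list = parse_stations(SAMPLE)
```

A list of dicts like this can be passed straight to `pd.DataFrame` to get one row per station.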

Creating Basemap instances

We’re going to create two Basemap instances for plotting: one of London (using the GLA boundary), and one of the Thames. First, we’re going to open our London shapefile, and get some data out of it, in order to set up our main basemap:

I’ve chosen the Transverse Mercator projection, because it exhibits less distortion over areas with a small east-west extent. This projection requires us to specify a central longitude and latitude, which I’ve set as -2, 49. Note that I’ve also created a Thames basemap, because we’re going to need the polygons in it later.
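To make the set-up a little more concrete, here is a minimal sketch of how the map extent could be derived from a shapefile’s bounds and turned into Basemap keyword arguments. The helper name `basemap_kwargs`, the 1% padding, and the sample bounds are all assumptions for illustration; in practice the bounds would come from the shapefile itself (for example, the `bounds` attribute of a collection opened with Fiona).

```python
def basemap_kwargs(bounds, extra=0.01):
    """
    Build keyword arguments for a Transverse Mercator Basemap from a
    (min_lon, min_lat, max_lon, max_lat) bounds tuple, padding each
    side by a fraction `extra` of the width / height.
    """
    min_lon, min_lat, max_lon, max_lat = bounds
    w, h = max_lon - min_lon, max_lat - min_lat
    return {
        'projection': 'tmerc',
        'lon_0': -2.,   # central longitude, as in the text
        'lat_0': 49.,   # central latitude, as in the text
        'llcrnrlon': min_lon - extra * w,
        'llcrnrlat': min_lat - extra * h,
        'urcrnrlon': max_lon + extra * w,
        'urcrnrlat': max_lat + extra * h,
    }


# Rough, illustrative bounds for Greater London (not read from the shapefile)
kwargs = basemap_kwargs((-0.51, 51.28, 0.34, 51.70))

# With Basemap installed, the map itself would then be created with:
# from mpl_toolkits.basemap import Basemap
# m = Basemap(**kwargs)
```

Padding the corners slightly keeps the boundary polygons from touching the edge of the axes.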

Some Utility Functions

Next, we’re going to define a function which retrieves journey data from the OSRM instance.

    import numpy as np
    import requests


    def query_travel_time(start, end, method):
        """
        Get a travel time back from MapZen's OSRM
        start, end: lon, lat tuples
        method: foot, car, bicycle
        returns travel time, in seconds
        TODO: bounds checking for coords
        """
        allowed = ('foot', 'car', 'bicycle')
        if method not in allowed:
            raise Exception(
                "Unknown method. Must be one of %s. Christ." % ', '.join(allowed))
        endpoint = 'http://osrm.mapzen.com'
        method = '/{m}/viaroute'.format(m=method)
        # dict keys are unique, so pass both locations as a list:
        # requests encodes a list value as repeated keys (loc=...&loc=...)
        # reverse lon, lat because ugh
        params = {'loc': ['{1},{0}'.format(*start), '{1},{0}'.format(*end)]}
        req = requests.get(endpoint + method, params=params)
        try:
            req.raise_for_status()
        except requests.exceptions.HTTPError:
            return np.nan
        # a status of 207 is treated as "no usable route"
        if req.json()['status'] == 207:
            return np.nan
        return req.json()['route_summary']['total_time']

Instead of just falling over, this function will return numpy NaN values for missing data. This is a bit easier to work with in Pandas.
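A query string can carry the same key twice if it is built from a list of pairs rather than a dict. The standard-library sketch below shows what such a request URL would look like; the helper name `build_viaroute_url` is hypothetical, and the lat, lon ordering is an assumption carried over from the function above.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2.7


def build_viaroute_url(start, end, method,
                       endpoint='http://osrm.mapzen.com'):
    """
    start, end: lon, lat tuples
    Returns the full request URL, with one `loc` parameter per point.
    """
    # the service is assumed to want lat,lon order, so swap each pair
    pairs = [
        ('loc', '{1},{0}'.format(*start)),
        ('loc', '{1},{0}'.format(*end)),
    ]
    return '{e}/{m}/viaroute?{q}'.format(
        e=endpoint, m=method, q=urlencode(pairs))


url = build_viaroute_url((-0.12, 51.5), (-0.1, 51.52), 'bicycle')
```

Printing `url` is a cheap way to check what would be sent before making several hundred real requests.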

We now have enough data to perform some elementary calculations. For instance, we can calculate each station’s distance from our fake centroid:

    import math

    # calculate station distance from centroid using Pythagorean Theorem
    stations['centroid_distance'] = stations.apply(
        lambda x: math.sqrt(
            (abs(centroid[0] - x['projected_lon']) ** 2) +
            (abs(centroid[1] - x['projected_lat']) ** 2)),
        axis=1)
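The same calculation for a single pair of points can be written with `math.hypot`, which may be easier to test in isolation. This is only a sketch: the function name is made up, and it assumes both points are already in projected map coordinates rather than raw lon/lat, since the Pythagorean Theorem doesn’t hold for degrees.

```python
import math


def centroid_distance(centroid, point):
    """Straight-line distance between two (x, y) points in projected units."""
    return math.hypot(centroid[0] - point[0], centroid[1] - point[1])


# a 3-4-5 triangle makes for an easy check
d = centroid_distance((0., 0.), (3., 4.))
```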

Retrieving OSRM Data

Here, we’re defining a simple function which we’re going to apply to our DataFrame. It simply calls our OSRM-retrieval function (defined earlier) with the specified column value and travel mode for each row in our DataFrame. Because OSRM expects lon/lat values, we have to convert the centroid back from projected coordinates using the inverse=True flag. Finally, we divide the retrieved times by 60, because travel times in minutes are more easily understood.

    def travel_time(df, start):
        """ return travel times between a given centroid and all stations in the network """
        return query_travel_time(start, (df['lon'], df['lat']), 'bicycle')


    stations['travel_time'] = stations.apply(
        travel_time, args=(m(*centroid, inverse=True),), axis=1)
    # travel time in minutes is more useful
    stations['travel_time'] /= 60.

Bear in mind that Mapzen are providing this OSRM instance free of charge, and that this is quite a substantial operation, involving several hundred queries.

Thus, I’ve provided a CSV of retrieved data, which you can use instead. Save it to the data folder, and run this line of code instead of the previous cell:

    stations = pd.read_csv('data/stations_travel_time.csv', index_col=0)
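If you do decide to query the service yourself, pausing briefly between requests is a courteous touch. The sketch below shows one way to do that; `throttled_map` is a hypothetical helper, and the half-second default is an arbitrary choice rather than a documented limit.

```python
import time


def throttled_map(func, items, delay=0.5, sleep=time.sleep):
    """
    Call func(item) for each item, pausing `delay` seconds between calls.
    `sleep` is injectable so that the pause can be skipped when testing.
    """
    results = []
    for i, item in enumerate(items):
        if i:
            # no need to wait before the very first call
            sleep(delay)
        results.append(func(item))
    return results


# stand-in for a real query function, so nothing touches the network here
fake_times = throttled_map(lambda s: len(s) * 60., ['a', 'bb'], delay=0.)
```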

Let’s do a little post-processing:

    # remove any empty values
    stations = stations.dropna()
    # replace travel times of 1 minute or less with 1. minute
    stations.loc[stations['travel_time'] <= 1., 'travel_time'] = 1.
    # this station is closest to the centroid
    # (idxmin returns an index label, so look it up with .loc, not .iloc)
    stations.loc[stations['centroid_distance'].idxmin()]

Plotting the Data

There’s no point in pretending that precise plotting using Matplotlib is easy. Luckily, I’ve done it for you.

What we’re doing here is conceptually simple: we’re plotting a histogram of the journey times, and then imposing a colour map onto that histogram. The colour map is normalised by the journey time, and is a diverging one (meaning intermediate values have a neutral colour; low values are cool, high values are hot).
We then plot our London shapefile on an inset axis, before making a scatter plot of the bike station locations on this inset map. We then impose the same colour map defined earlier upon the scattered points.
Finally, we perform a set-theoretic intersection on the map with our river polygon, in order to make the Thames properly transparent.
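Normalising by journey time amounts to rescaling each value into the 0 to 1 range that a colour map expects. Matplotlib’s `Normalize` class handles this for you; the plain-Python sketch below only illustrates the arithmetic, using made-up times.

```python
def normalise(values):
    """Linearly rescale values so the minimum maps to 0. and the maximum to 1."""
    vmin, vmax = min(values), max(values)
    span = float(vmax - vmin)
    if span == 0:
        # all values equal: map everything to the low end
        return [0. for _ in values]
    return [(v - vmin) / span for v in values]


# illustrative journey times in minutes, not real data
scaled = normalise([1., 10., 19., 37.])
```

With a diverging colour map, values that land near 0.5 after scaling take the neutral colour, while those near 0 and 1 take the cool and hot extremes.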