Simple Things
27 August 2012

In my last post I put together a simple template server with Express running on Node for a current project. I only usually use Node for a few build tools and prefer Python on the back-end, so just for the sake of it, here’s a Python alternative.

Using Flask

Flask is a light web framework and very easy to get going with. It’s useful for putting together pages and URL routes with minimal set-up.
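A minimal sketch, with file and variable names of my own choosing; save it as app.py, with an index.html in a templates directory alongside:

from flask import Flask, render_template

app = Flask(__name__)

@app.route('/')
def index():
    # Send a simple view context to the template
    return render_template('index.html', message='Hello World')

if __name__ == '__main__':
    app.run(debug=True)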

Working with data

Flask’s render_template method sends a view context (dictionary) to the templates, for example the message above. Jinja isn’t logic-less like Mustache, so we can use this context for conditionals, loops and filters.
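For instance, sketched inline with render_template_string for brevity (a real build would keep the markup in the templates directory):

from flask import render_template_string

TEMPLATE = """
{% if message %}<h1>{{ message|upper }}</h1>{% endif %}
<ul>
{% for item in items %}<li>{{ item }}</li>{% endfor %}
</ul>
"""

@app.route('/demo')
def demo():
    # Conditionals, loops and filters all run server-side in Jinja
    return render_template_string(TEMPLATE, message='Hello', items=['one', 'two', 'three'])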

Refactoring our Javascript code, we can use regular Python to load some JSON:
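Something like this, assuming the dummy data lives in a data directory next to the app:

import json

@app.route('/profile')
def profile():
    # Read the dummy data file and pass the object into the view context
    with open('data/user.json') as f:
        user = json.load(f)
    return render_template('profile.html', user=user)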

I can’t say I’ve ever used this approach; for front-end (only) builds I’d just work with a few variables to switch in templates, and likely have a shared dictionary for ‘global’ data, e.g. placeholder user info.

On larger projects with back-end work I’d go for Django and a full relational database, though here we’re using static data files.

As for static media files, CSS et al, Flask serves from a static folder which sits alongside our app file and templates directory — this doesn’t have to be specified in the application logic, unlike Express.

Of course, Flask can be used for fully featured applications too.

Building
21 August 2012

I’m currently working on a front-end build for a site whilst a second (external) team simultaneously develop the back-end. Those guys will later integrate the templates once everything is ready. It’s a fairly common scenario.

We’re in a good position with this project specifically, because although the server-side is a work in progress the developers have provided a full API specification that details all the data that the application will provide, in its entirety. Our work isn’t held up by any architectural decisions yet to be made – the spec is finalised, only the platform to serve data and render our views doesn’t exist yet.

Working from spec, we’re able to create accurate (dummy) data objects and render pages with a templating system to build a limited, realistic, navigable version of the site.

Keeping our data objects in line with their schemas, plus agreeing on a templating system similar to their implementation, should minimise the integration period.

For templating we’re using Mustache, which has a number of server-side and client-side implementation options.

Working with LESS, Bootstrap or otherwise, I use the command-line compiler with a custom script that monitors file changes and automatically outputs the master CSS file as I go. This runs using Jake, a Javascript build tool for Node.js, which I also use to compile and minify Javascript files, plus a few other tasks.

We decided to build on the Node stack with a Javascript implementation of Mustache and a light web framework, Express, to serve our pages and provide URL routing.

Using Express

With Node installed, get Express with npm:

mkdir myapp
cd myapp
npm install express

Create a basic Hello World app in a text file, myapp.js:

var express = require('express'),
    app = express();

app.get('/', function(req, res) {
    res.send('Hello World');
});

app.listen(3000);
console.log('Listening on port 3000');

Run with the following and visit localhost:3000 in your browser:

node myapp.js

To serve a template with data you’ll need to install a templating engine, create the view and update the app response. The default is Jade:

npm install jade

Create the view file in the default views directory, called simple.jade, and print the data:
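A sketch, with variable names of my own (Jade is indentation-based and = evaluates an expression):

h1= title
p Rendered by #{engine}

Then swap res.send in the route for res.render('simple', { title: 'Hello World', engine: 'Jade' });.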

Node Toolbox

Restarting the app gets old fast. Install nodemon globally to monitor the script for changes and the reload will be automatic:

npm install nodemon -g
nodemon myapp.js

Express will look at the view name extension to load the templating engine. To switch to another, e.g. Mustache, install and configure the app to use a different module. Not all engines provide the same method for Express to render templates, so Consolidate.js maps a bunch of popular options automatically.

One Note
21 September 2011

For now, this only includes your globally-installed, system-wide libraries, not those you’ve installed within the virtualenv environment.

To create a new “PyDev Django project” however, you’ll need to have Django installed globally (or otherwise configured in the above Python Path settings) so PyDev can see it. Right now ours isn’t, so we have an extra step.

Instead, we’ll create a “New PyDev project” (non-Django), add our virtualenv location containing the libraries in our local site-packages directory, then convert that to a Django Project once PyDev is satisfied we have the goods.

This method means we don’t have to install Django globally just for the sake of using this IDE.

To do this, from the File menu select New PyDev Project; I un-tick ‘Create src folder and add it to the PYTHONPATH’, instead selecting ‘Don’t configure PYTHONPATH (to be done manually later on)’.

Right-click the project folder, go to Properties and PyDev – PYTHONPATH and add a Source Folder pointing to your virtualenv site-packages. In this instance:

dev/env/lib/python2.7/site-packages

Having found Django, PyDev now lets us convert this to a Django project. Right-click again and under the PyDev menu select Set as Django Project.

Now everything can be performed within Eclipse, rather than at the command line.

For example, to run the server we’ll add a Custom Command. Under the Django menu select Custom Command and add the following:

runserver --noreload

You may be asked to select which manage.py to run from; choose the one within your project, i.e. myproject/manage.py.

Note the --noreload option allows Eclipse to maintain control over the process, rather than Django spawning a new one for the auto-reloader, the feature that usually restarts the server automatically when changes are made to your code.

Here’s to Taking It Easy
15 September 2011

I mentioned in my previous post that I borked my system meddling with Python. Having reset my workspace, I’ve now set up a solid system that makes handling projects and multiple development environments super simple.

The new set-up easily handles multiple Python projects, without compatibility or version conflicts. The installation is equally straightforward.

Before switching to a desktop Linux, I used to sing the praises of VMware and developing with virtual machines when dealing with unique environments. By “unique”, I rather mean any odd project outside the ordinary LAMP set-up I usually work with, or something that requires a specific version of a piece of software.

Since then however, I’ve found no need. So long as you think before you leap.

Virtual boxes (as closed, single-piece software) are good and all: you can be as adventurous as you wish without risk of damaging your native system. Plus, if you screw one of these up you can restore a saved state in a few clicks. However, the VM safety net allows you to proceed without caution, perhaps recklessly, at the expense of fully comprehending the commands you’re executing and tasks you’re running.

In that sense, they’re great for beginners uncertain of how (or if) they should install software, e.g. Apache, PHP, Python etc — appliances and virtual stacks are helpful.

Otherwise they can convolute your workspace — and more often than not, won’t be configured exactly how you want or need them. Running software natively is simple and as intended; it also allows you to configure your entire environment without any assumptions made by distributors.

Virtualenv is quite the revelation. It facilitates multiple isolated Python environments on a single system, dynamically handling your Python Path so packages are installed within an enclosed local directory, rather than in amongst your top-level system packages.

This means you can create project-by-project virtual environments, avoiding compatibility and version conflicts. When an environment is created (and activated) libraries are thereafter installed within discrete directories that aren’t shared with other virtualenv environments.

This means nothing is installed “system-wide”, so libraries don’t accrue over time and there’s no balancing of versions. It also means you can work with different versions of Python simultaneously.

Python packages should be installed with a package manager, the latest of which is pip.

Prior to this, easy_install was the manager du jour (part of Setuptools, both now out-dated), but we’ll only be using that to install pip:

$ sudo easy_install pip

Pip is a direct replacement for easy_install, improving on a few things (a comparison can be found on the installer site). Packages that are available with easy_install should be pip-installable and the installation method is the same — the following installs virtualenv:

$ sudo pip install virtualenv

With virtualenv installed we can create an environment within your workspace; all it needs is the environment directory name, here ‘env’:

$ virtualenv env

There are a few options available with this command. In the following example, the --no-site-packages flag means that the new environment will not inherit any system-wide global site packages. The --distribute flag will install Distribute rather than Setuptools:

$ virtualenv --no-site-packages --distribute env

Distribute is to Setuptools as pip is to easy_install. Distribute and pip are the new hotness; Setuptools and easy_install are old and busted — for now.

Anyway, activate your environment:

$ source env/bin/activate

You’ll see from your shell prompt that the environment is activated, with the name prepended.

Then we’ll install something with pip. Yolk is a tool for querying the packages currently installed on your system, so we’ll install that and grab a list:

(env) $ pip install yolk
(env) $ yolk -l

The output lists everything the environment can see (this will depend on your global site packages and how you created the environment, as above).

Note that you don’t need to sudo whilst in the activated environment.

As a test, we’ll deactivate the environment and run the same command, which gets the following error (unless you have yolk installed globally):

(env) $ deactivate
$ yolk -l
yolk: command not found

If installed within an environment, a package is only available whilst it is activated. This is the means to install whatever you wish, without worrying about cross-project conflicts.

Home Security
8 September 2011

Since deciding to work exclusively in a Linux environment at the beginning of the year, I’ve been more than pleasantly surprised not to have needed to reset my system, despite the frequent changes of set-up and the numerous installations and removals of software I’ve performed for various projects.

The inevitable day, however, came a couple of weeks ago when I royally screwed my system messing around with Python (solution in another blog post. Update: here it is).

Once Ubuntu was reinstalled, I encountered a problem attempting to recreate my workspace, having opted to encrypt my home directory during user setup.

Running the normal LAMP-server setup, Apache is unable to access files within the encrypted home.

I was trying to duplicate my previous configuration, using individual VirtualHosts locating directories within my user home, for example:

/home/marc/sites/dev/
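Each one looked roughly like this, a sketch from memory in Apache 2.2 syntax with an illustrative ServerName:

<VirtualHost *:80>
    ServerName dev.local
    DocumentRoot /home/marc/sites/dev/
    <Directory /home/marc/sites/dev/>
        Order allow,deny
        Allow from all
    </Directory>
</VirtualHost>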

I’m pretty sure my home directory was encrypted last time too, but this problem was new for me — perhaps something from an update in between?

The permissions problem occurs because only my user, marc, has access to the home and Apache’s user, www-data, does not. This results in an HTTP 403 Forbidden when attempting to serve files.

As for alternatives, you could encrypt your whole drive rather than just the home directory. You shouldn’t see any problems then.

Or you could just ignore encryption altogether.

You could, of course, just work out of the traditional /var/www/ location, which is the Apache default. Simply create a directory there and chown to your user so you don’t have to always sudo changes.

sudo mkdir /var/www/dev/
sudo chown marc /var/www/dev/

If your directories are elsewhere on your system, for example in SVN repositories such as /srv/svn/ or /usr/local/svn/, then you’ll need to chown those to www-data so they’re readable, similar to our method of reading from within /home above.

Create a new user group, subversion, add the users marc and www-data to it and chown the repo to www-data:subversion, giving read/write access to the group (granting privileges to marc). Finally chmod with -s so that new files inherit that group ID, like so:
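Assuming a repository at /srv/svn/repo (the path is illustrative), that works out as:

sudo groupadd subversion
sudo usermod -a -G subversion marc
sudo usermod -a -G subversion www-data
sudo chown -R www-data:subversion /srv/svn/repo
sudo chmod -R g+rws /srv/svn/repo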

The -s flag means that all files created inside that directory will inherit the group of the directory; otherwise files take on the primary group of the user. New subdirectories will also inherit this.

As Long as You Follow
18 May 2011

The final part of Beats Per Mile worth mentioning is the Twitter integration.

Tweets were sent at landmarks and certain spectator spots, but also at course milestones — the first mile complete, halfway complete, one mile to go etc — as well as the start and finish.

The application monitored the elapsed distance and updated accordingly, grabbing the latest time and statistics from the RunKeeper data, as well as geolocating the tweet with the latest set of GPS coordinates.

Rather than handing over your username and password to applications and trusting (hoping) that they’re friendly — as used to be the only way — OAuth allows applications to request access on your behalf without users ever parting with precious login credentials. You simply give, or deny, permission.

Twitter offer a number of links to OAuth libraries for various languages to make this job a lot easier. There are many Twitter-specific OAuth libraries in particular, purposely tailored for the API.

This is achieved with a cycle of exchanging authenticating tokens between the application and Twitter to verify permission. TwitterOAuth, in particular, creates a session object in your application and rebuilds itself with each token exchange, remaining contained in a single class instance.

On successful authorisation Twitter will return the user to your callback URL (set above) with a verification token. TwitterOAuth now rebuilds for the first time with the OAuth tokens in our session and uses the new verification token to get a new access token which will grant us user account access:

// Test that everything is working
$connection->get('account/verify_credentials');

The dance is made a hell of a lot easier with a library such as this.
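Ours used the PHP TwitterOAuth library; purely for illustration, the same three-legged dance sketched in Python with requests_oauthlib (the consumer keys and callback URL are placeholders) runs along these lines:

from requests_oauthlib import OAuth1Session

# The consumer key and secret come with your registered Twitter application
twitter = OAuth1Session('CONSUMER_KEY', client_secret='CONSUMER_SECRET',
                        callback_uri='http://example.com/callback')

# 1. Exchange the consumer credentials for a temporary request token
twitter.fetch_request_token('https://api.twitter.com/oauth/request_token')

# 2. Send the user off to Twitter to grant (or deny) permission
print twitter.authorization_url('https://api.twitter.com/oauth/authorize')

# 3. Back at the callback, trade the verifier for the final access tokens
twitter.parse_authorization_response('http://example.com/callback?oauth_token=...&oauth_verifier=...')
tokens = twitter.fetch_access_token('https://api.twitter.com/oauth/access_token')

# tokens['oauth_token'] and tokens['oauth_token_secret'] are the two worth keeping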

Usually applications store these final two tokens, the user-generated oauth_token and oauth_token_secret, which saves the need to authorise the user again.

Storing these details (in a session or database) means that a username and password need not be saved. The tokens are good until the user revokes access; no sensitive information is ever released to the application, and all the user ever gives is their permission.

With access tokens stored, the connection to Twitter is a lot simpler — just create the TwitterOAuth object with those user-generated codes as in the very last step, without any of the redirecting to and from Twitter.com. Of course, those tokens could only have ever been obtained by carrying out the full process to begin with.

Beats Per Mile is a single-user application, so Gemma only had to authorise it once and then we hard-coded her tokens into the scripts.

With access granted the application was free to send out updates based on the run data we were collecting, posting it directly.

As mentioned, locations were translated into distances, and that’s when we tweeted.

At mile twenty, it looked back at the mile splits so far and reported which were the fastest.

Relying on the total reported distance alone was flawed. There was a slight hiccup when RunKeeper lost GPS coverage under Blackfriars tunnel and in an attempt to compensate found the nearest location to be the other side of the Thames.

This caused a problem by adding extra distance to her total, so some of the latter tweets (“one mile to go”, for example) posted a little prematurely.

It was more of a problem for Gemma when running: the app announced in her ear that she’d run further than she had, and it was disheartening to see mile markers on the course she thought she’d already passed.

This is also why the total distance on the site clocks up to 27.65 miles; the race was long enough as it is!

The final touch was to drop the tweets on the map, alongside the Instagram images.

Moving Pictures
16 May 2011

Beats Per Mile uses the Instagram API to find pictures taken around the marathon course.

We decided we’d need to find a way to put pictures on the map quite early on, knowing Gemma couldn’t be the one to stop and take them. Rather than try to pin a camera to her vest or strap one to a hat, the simplest solution was to find photos taken by spectators.

The Instagram API is fairly new and the app itself is getting extremely popular. Being mobile-based, we hoped it would be popular among spectators on the day, taking quick snaps and hopefully uploading a good amount of photos to dig around in.

The pictures are geo-tagged as well as captioned, so we could perform location queries and text-based searches (ultimately, a combination).

Firstly we agreed on places around which we’d search for pictures — busy spectator spots and London’s landmarks.

The idea was to look for pictures at these places as Gemma passed them. So we translated them into distances, i.e. determined the elapsed distance that would have been run when reaching each of these places.

Tower Bridge, for example, is at 12.5 miles. Big Ben is at 25 miles and so on — for about 10-15 hotspots.

The application monitored the total distance covered and at each of these key numbers hit the Instagram API for the most recent pictures around the location.

Setting up an Instagram application is instantaneous, though I waited a long time for my API key initially — I did apply when the announcement was first made however, so the turnaround may be a lot faster now.

There’s no moderation or application approval process; once you’re up and running you can start performing queries immediately.

The API is RESTful over HTTPS with a number of endpoints to query images, comments, users, locations, tags and so on. The developer docs are fairly comprehensive.

We’re interested in the media endpoint. Note that the following URLs require an access token or client id, which you will be given; they’re omitted here for brevity.

Get the current most popular photos:

https://api.instagram.com/v1/media/popular

Or to get information about a single image with a media id:

https://api.instagram.com/v1/media/72612696

The search method was our main tool. It takes up to five parameters: lat, lng, max_timestamp, min_timestamp and distance.

Note that only the latitude and longitude parameters are required; distance is in meters, defaulting to 1km with a maximum of 5km.

So at each of our key distances, the app took the latest latitude and longitude positions and grabbed the latest photos.

In an attempt to select only images of the marathon, this was backed up by inspecting the result set for images with a caption containing any one of a set of predefined keywords such as ‘marathon’ or ‘runners’. That way we could almost ensure that we wouldn’t pick up images unconcerned with the race — though we would still find the odd false positive.
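The query and filter together look something like this in Python; the response field names are as I recall the v1 API returning them, so treat those as assumptions:

import json
import urllib2

KEYWORDS = ('marathon', 'runners')

def marathon_photos(lat, lng, client_id, distance=1000):
    url = ('https://api.instagram.com/v1/media/search'
           '?lat=%s&lng=%s&distance=%s&client_id=%s'
           % (lat, lng, distance, client_id))
    media = json.load(urllib2.urlopen(url))['data']
    # Keep only images whose caption mentions one of our keywords
    return [m for m in media if m.get('caption') and
            any(k in m['caption']['text'].lower() for k in KEYWORDS)]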

Once we had the JSON data it was simply visualised on the map; Instagram host the images for us.

One thing lacking in the data, perhaps, is that the location information only offers latitude and longitude coordinates: no place or address names unless otherwise nominated by the user. For each image, then, I ran the coordinates through Google’s Geocoding service to get a street or area name, just for display purposes.
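That lookup is a single unauthenticated request; a sketch along the same lines, using the formatted_address field for display:

def street_name(lat, lng):
    url = ('http://maps.googleapis.com/maps/api/geocode/json'
           '?latlng=%s,%s&sensor=false' % (lat, lng))
    results = json.load(urllib2.urlopen(url))['results']
    # The first result is the most specific match
    return results[0]['formatted_address'] if results else None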

On the whole, it worked well. The API is as straightforward as any and realistically, the biggest worry for us was that most people seem to use Instagram to take pictures of food and little else; we thought we’d have pictures of everyone’s breakfast around various parts of London all day.

Having my doubts, I wrote a simple ‘refresh’ button to rerun any query in case anything untoward or particularly boring popped up when I logged in to check, but between the huge crowds and the caption matching, I only used it twice, both times very early on in the day.

Here’s a handful of the pictures:

The Show Must Go On
27 April 2011

One of the key features of Beats Per Mile was the ability to listen to a ‘stream’ of Gemma’s iPod playlist, enabling you to hear exactly what she was listening to whenever you logged on.

We didn’t actually have a stream broadcasting from her iPod of course, rather a stream playing from the site that was synchronised with her start time.

We planned on using the SoundCloud API to do this and it was one of the last things left to build before race day.

Part of the playlist was curated by friends, donated tracks with sentimental value or just old favourites for a personal touch and to provide an extra kick of motivation.

Asking a lot of people for contributions meant that the playlist wasn’t finalised and mixed until very late on — Saturday evening.

I created the player using the SoundCloud Player Widget in preparation for the tracks, hoping that the player would be ready before they were uploaded.

It’s a Javascript-enhanced Flash widget which uses Actionscript’s ExternalInterface to expose method handlers and control playback via an in-built API.

Unfortunately this meant it wouldn’t play on the iPhone. SoundCloud do offer an HTML5-based Custom Player (which falls back to Flash), but we didn’t have time to fully investigate wrangling together a player from scratch.

This would shortly be rendered irrelevant when we discovered that it is now nearly impossible to upload any copyrighted songs, or tracks containing any samples of copyrighted songs, to SoundCloud. We also discovered that they’re very, very clever in how they go about detecting them.

Starting in the last few weeks we’ve turned on an automatic content identification system, similar to those used on other major media sharing sites. The system is used primarily for identifying audio that rightsholders have requested to be taken off SoundCloud. This is good news because it makes it easier for artists, labels and other content owners to control how the content they’ve created is available. And when you upload your own audio to SoundCloud, we can find out more quickly if somebody is uploading a copy to their own page without your permission.

SoundCloud have always had the right to remove audio deemed in violation of rights as stipulated in their terms of use. They also host plenty of mixes and DJ sets, as many other similar sites do.

When we tried to upload ours though, nothing would work. None of our mixes were authorised and the refusals would come after spending considerable (precious) time attempting to upload them.

SoundCloud are essentially performing some kind of wave form analysis, comparing uploads to audio already in their databases to detect duplicates.

There are a few ropey ways to (possibly) slip the net, such as adding a layer of low-level noise to distort the wave form or applying an amount of time-stretching (which was happening anyway, as songs were mixed together).

Too much of either would ruin the music. We were running out of time and didn’t want to risk any hacked attempt being found later and removed, perhaps mid-marathon in a worst case scenario.

So I began an attempt to recreate the widget, from scratch after all.

I looked at the Yahoo! Media Player, which is actually suspiciously similar to SoundCloud’s Widget API. The methods are almost exactly the same, but I had trouble handling multiple files — it really wasn’t anywhere near as easy to implement.

After browsing for alternatives I eventually found jPlayer, a very simple and easily customisable jQuery plug-in. This would also mean we’d be iPhone-compatible.

It also meant that we’d need to host the files ourselves, on a normal server, rather than letting SoundCloud handle the load — another reason for our initial choice. We actually ended up serving 7.68GB of streamed audio; fortunately my host is very stable and it didn’t end up being a problem.

When visitors landed on the page the player would calculate how long the run had been in progress and therefore where you should be in the playlist. The idea was to give no controls other than mute; the playback would always be synchronised.

Rather than uploading all the songs individually, the playlist was divided into five 30-60ish minute tracks, which would be easier to navigate.
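The seek logic itself is only arithmetic. A sketch (in Python for brevity, the real player was Javascript; the durations are placeholders, in seconds):

# Lengths of the five mixed parts (placeholders)
TRACK_DURATIONS = [2700, 3000, 2880, 3240, 2520]

def playlist_position(elapsed):
    # Walk the playlist until the elapsed run time falls inside a track
    for index, duration in enumerate(TRACK_DURATIONS):
        if elapsed < duration:
            return index, elapsed  # track to load, offset to seek to
        elapsed -= duration
    return None  # the playlist (and hopefully the race) is over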

Once the player determined what track you should be on and where within that the playhead should be, it began to buffer. Annoyingly, you could have been waiting some time. If the tracks were on SoundCloud’s giant servers then the audio would be properly streamed, but not from mine.

The wait entirely depended on when you happened to arrive; for some it wasn’t a problem. Say you luckily arrived at the page needing only to jump five minutes into the current track: you’d have a small waiting time. If the jump was twenty-five minutes, then start twiddling your thumbs.

There wasn’t a wait when the player switched between tracks. If a track was paused, the player would record how long for and resume at a later point — where the playhead would otherwise be if you hadn’t paused, not where you left off.

This also took into account track changes if you paused toward the end of a track or paused for more than the duration of an entire part.

SoundCloud have an obligation to artists and labels and choose to be very strict in authorising uploads that aren’t your own. I’ve wondered how copyright works for these sites; Mixcloud, for example, host countless mixed songs and sets.

Mixcloud, in fact, is where you’ll now find the mixes, saved indefinitely and without complaint, on Gemma’s page.

If You Got the Money
26 April 2011

By way of one of their various SDKs, the JustGiving API is a doddle to use and is pretty well featured. Unfortunately the documentation isn’t particularly comprehensive.

You can expect to be able to fetch any information found on a typical page — event and charity details, lists of donations and comments, even the colour scheme — all without any need for authentication.

We’re using the PHP SDK to show a very simple totaliser, the current percentage raised of the target amount. This can be achieved in just a few lines of code.

Firstly you need to register your application, which will get you an API key and access to the staging sandbox. This access is only for development; you can query live pages but the information you’ll obtain is out-of-date.

Applications go through a two-day approval process but require nothing more than a decent application description; from that point on you’re free to access real pages and live data.

It’s as simple as querying the page, grabbing the current total and the target total, then determining the percentage.

We retrieve the page via its short name, which for us is “Gemma-Bardsley”; this is set in the JustGiving page admin area.

Note that the target amount has to be explicitly set there too (you’ll get a similar totaliser on your fundraising page anyway if that’s set correctly) and that this queries the live API over HTTPS, not the developer staging sandbox.

The third parameter sent to the client constructor specifies version one of the API.
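The shape of it, sketched in Python against the same REST endpoint (the PHP SDK wraps exactly this; the response field names are from memory, so treat them as assumptions):

import json
import urllib2

APP_ID = 'your-app-id'  # from the JustGiving application registration
url = 'https://api.justgiving.com/%s/v1/fundraising/pages/Gemma-Bardsley' % APP_ID

request = urllib2.Request(url, headers={'Accept': 'application/json'})
page = json.load(urllib2.urlopen(request))

raised = float(page['grandTotalRaisedExcludingGiftAid'])  # assumed field name
target = float(page['fundraisingTarget'])                 # assumed field name
print 'Raised %.0f%% of target' % (raised / target * 100)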

Keep On Running
13 April 2011

Central to the Beats Per Mile application is the run data: the live information reporting Gemma’s current location, elapsed time, covered distance, average pace and speed.

Working out how to capture this information was our first challenge. Determining whether it was even possible was pivotal to whether we could build the site at all.

Before deciding on the eventual solution of using RunKeeper, we looked at a few different routes we could take, of varying levels of complexity.

Firstly we considered building our own hardware, the basic idea being a GSM/GPS module for about £50 along with a SIM card with unlimited text messages. The module is then programmed to send its location to Twitter via SMS every minute or so.

There are a lot of posts around with extensive details on how to build this kind of thing yourself; here’s one doing exactly that. He’s also considering selling pre-built units, which would save us needing to get our soldering irons out.

With this option, whilst the geo-positioning data would be accurate, you can run pretty far in a minute, so the route would be a little raw; it wouldn’t accurately reflect the true path and wouldn’t look great on a map.

One step further would be to use something like Open GPS Tracker, which is similar but rather than using a combined GSM/GPS module just uses a GPS module plugged directly into a mobile phone.

This particular unit comes pre-built and with firmware, so might be limited in what we could capture, but it’s otherwise ready to roll straight out of the box and only requires an old (pre-paid) handset. It’s also small in size, easy enough to strap onto a runner or put in a pocket. This comes in around £50ish for parts again, plus $35.91 for the software.

There are commercially manufactured units available too, trackers for mountain climbers, skiers, pilots, sailors and such that aren’t products of home brew electronics.

The SPOT Personal Tracker is one option, a satellite GPS tracker with built-in location services including sending your current location to Google Maps in near real-time, specifically for sharing with others at home. Very reliable and robust, tiny and light, its location services lend themselves to exactly what we want to do, but it’s well over-spec. It’s able to work beyond cellular coverage and above 15,000ft — not particularly necessary for us — and comes in at €149 for the unit plus a service charge of €99 for a year’s use.

It does however come with a particularly impressive distress button, notifying the GEOS International Emergency Response Center in case of emergencies should one press it. Presumably this calls in an airlift or team of S.A.R. St. Bernards, which could come in handy when she hits the wall.

Satellite GPS is more immediately available though, in the form of an iPhone.

Initially we considered developing an app from scratch, but that’s not really anything any of us have done before. We could make an online application running in Safari with the Geolocation API, but this would mean having a page open for the entire run and hoping that it doesn’t time out and that nothing gets pressed on-screen as arms start swinging.

We searched the App Store and found InstaMapper, a lo-fi, free app which sends location updates to the InstaMapper site, where they’re available via a public API. It works on iPhone, Android and any Java-capable phone, and it’s well supported and tested — a far more viable solution than anything we could knock up in the limited time we had. It also works with the phone locked, conserving battery life. The only cost is the data plan, which Gemma already has.

Then came the revelation of RunKeeper Elite, our eventual winner. At the time, Gemma had only just started using RunKeeper Pro — the Pro and Elite labels refer to your subscription status; Elite is a premium service for $4.99 — and we hadn’t really looked into what was available with the upgrade.

RunKeeper does all the things you’d expect from a standard GPS-enabled running app (or Garmin device) — track time, distance, pace, calories burned etc, over a geolocated route visualised on a map, stored with all your previous runs.

The Elite service offers a few training programs, alerts and reports, and as of March last year added a ‘Live’ service that pushes your running data to their website as you run, rather than only uploading a report once it’s complete (as the Pro version does). It’s exactly the same data, but made publicly available in real time.

Googling around, there are a few sites that have worked out ways to grab the data. For example, Firehole has developed an interface to capture and save activity; there’s a ton of info in the right-hand column of his page. There are plenty of ways to hack and scrape, a bit dirty and not particularly relevant to us, but it shows the data retrieval is possible nonetheless.

These seem to concern the syndication of complete activities, publishing a list of recent runs as one would their latest tweets, or calculating the total number of miles run per month, say. But we’re only interested in the data for one run, a single activity.

Rather than regularly monitoring a profile for updates, we only have to load the data that populates a single RunKeeper page.

This is handy, because rather than scraping and traversing a profile to find what we’re looking for (from rendered HTML) we can load the single data file directly, a JSON file hosted on RunKeeper, then work with the data in exactly the same way as they do.

Now, this is all a bit naughty. According to the RunKeeper Terms of Service, we are strictly not allowed to load the data from their site in this way, even with the permission of the account holder. Any form of scraping at all is prohibited.

This is a big problem. So what do we do?

Well Len Hardy of Firehole openly links to his API interface in the forum thread mentioned before, pointing out that you can scrape the page data, that he does so and that he’ll freely share the code. RunKeeper haven’t responded to his comment specifically or reminded him of their TOS, but have commented later on the same thread.

I think it’s highly unlikely that RunKeeper could be unaware of these goings on, these scrapers are easily found if you search for them and have been around for a while without being shut down.

I’m rather hoping that they actually don’t mind too much, right now. They must be aware of the demand; they have an API in development. Perhaps when that’s published they’ll chase up these sites and direct them that way. I’m sure (read: hope) that the developers would happily migrate to doing things the correct way.

So what’s our solution? Well, I asked anyway, but no response so far. We will go ahead and load the data, but in my opinion do so more conscientiously than some of the other attempts. For one, we’re not loading and filtering complete HTML pages as they are; that’s what screen scraping is, attempting to extract data from a rendered output. We’ll be loading the data directly, which is publicly accessible and, being a straight-up JSON file, hopefully carries less overhead.

We’re also not loading anything regularly or for a long duration of time. We’re only concerned with one activity; the site will only load the data once the race has begun, and when it’s finished we’ll take a copy of the file and serve it from our own servers as a static file.

We also intend to cache the data intelligently on our servers whilst the marathon is in progress. This way we can limit the number of requests to RunKeeper and take on some of the load ourselves. It’s easily done and will actually improve the performance of the application anyway.

Really, I would like RunKeeper to see what we’re doing and think it’s cool. They have an awesome product and platform; people want to play with it more.

Once the race starts, all the data we need will be continually pushed from Gemma’s iPhone to the RunKeeper site.

Beats Per Mile will handle page requests and retrieve this data from RunKeeper in the form of a JSON file, which the server will then cache. We can cache the data knowing that the route already run will not change; we’re only interested in the new stuff.

RunKeeper updates their page every 25 seconds, so our cache will be about this length too. When a request is made, if the cache is still fresh we’ll serve that data; if not, we’ll retrieve more from RunKeeper.
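Sketched in Python (whatever the real stack, the freshness check is just a timestamp comparison):

import json
import os
import time
import urllib2

CACHE_FILE = '/tmp/runkeeper.json'       # illustrative path
CACHE_LIFE = 25                          # seconds, matching RunKeeper's cycle
SOURCE_URL = 'http://runkeeper.com/...'  # the activity's JSON file

def run_data():
    # Serve the cached copy while it's fresh, otherwise fetch and re-cache
    if os.path.exists(CACHE_FILE) and time.time() - os.path.getmtime(CACHE_FILE) < CACHE_LIFE:
        with open(CACHE_FILE) as f:
            return json.load(f)
    data = urllib2.urlopen(SOURCE_URL).read()
    with open(CACHE_FILE, 'w') as f:
        f.write(data)
    return json.loads(data)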

From then on, the page polls for the latest dataset and we serve the new information, appending data to the client-side stored JSON. The application works out what data each individual client has and serves the difference.

Once the run is complete, a flag will be raised by RunKeeper in the JSON file and we’ll update our page to reflect that. From this point on the application no longer needs to contact RunKeeper, so it will serve static JSON from our own side.

It shapes up a little like this, where points 2 and 8 are conditional on the cache being in date:

As well as reducing the number of calls to RunKeeper, caching the data means that visitors get the latest JSON nice and quickly.

I built a prototype in time for the Silverstone Half Marathon, which Gemma ran last month, and it all went surprisingly smoothly. Here’s an image of the application as it was then:

The data is visualised on a Google map. The GPS coordinates are used to draw the red polyline; the activity statistics are calculated by the iPhone app and simply refreshed here with every update.

Eventually our map will also have some more informative overlays: mile markers for example, and when and where the application sent tweets or found pictures taken nearby.

Initially we were going to show pace, speed and elevation graphs (you can see a very early attempt if you click through to the image link) but we’ve run out of time for those. Maybe version two.