Q&A: Tony Hirst, data blogger and lecturer, Open University

In the latest of our Q&As we hear from Tony Hirst, lecturer at the Open University. He blogs at OUseful.Info, where he explores the potential application of web technologies and applications to learning. He has been behind some of the most interesting uses of the Guardian’s Open Platform.

At news:rewired Tony will give examples of how journalists can make better use of data for telling stories and illustrating reports. You can follow Tony on Twitter: @psychemedia.

You can listen to a little preview of Tony’s session in this Boo – made while walking his dog:

So what exactly do you do at the Open University?
I’m a lecturer in the Department of Communication and Systems in the Faculty of Maths, Computing and Technology, which is to say that I help write distance education course materials for the OU. Recent courses I’ve worked on include introductory courses on digital worlds (game design and appreciation), information skills and robotics. I also wrote a short block on visualisation for a new course due to start in February 2010. Aside from that, I track technologies that might be relevant to online education and try to get a feel for how we might make use of them, blogging quick demos and use-case examples at OUseful.info. I also chat with the library folk from time to time about how we might better integrate their services into our course materials.

Why did you decide to start experimenting with the Guardian’s Open Platform?
One of the ideas I’ve been exploring for some time is how we might design course materials that pull in live data feeds from trusted sources. OU course materials are written to last for several years, so it can be a challenge keeping them current: what interests me is the extent to which we might be able to populate charts, graphs and other visualisations that are embedded in a set of course materials when they are first produced, with data that is pulled from an up-to-date source throughout the life of the course.

The Guardian Data Store looked as if it might be a reasonable source of such data, at least for a trial that might last one or two years.

A couple of other factors motivated me in this direction too. Firstly, I’ve been looking for ways in which we might use the OU’s openly licensed OpenLearn materials to wrap news stories, providing readers with deeper background to a particular news story and helping them explore the issues in a more academic way. Secondly, the DataStore content is published via Google Spreadsheets, which has an API. At the time the DataStore was announced, I was looking for ‘authentic’ data published as a Google spreadsheet to play with, and to ground my experiments with the API. So it was timely 🙂

What have you learnt about news collaborations online?
One of the things I’ve believed in for a long time is the potential for the OU to play a role as a ‘public service educator’, cf. the public service broadcasting remit of the BBC. Ever since the OU was founded just over 40 years ago, we have had a relationship with the BBC and co-produced television and radio programmes with them (for example, one of the ‘other’ things I do at the OU is to work with the Open Broadcasting Unit as an academic liaison on co-produced episodes of Radio 4’s Click On (Series 1 and 2), the BBC World Service’s Digital Planet, and BBC2’s James May’s Big Ideas, in part helping develop some of the online materials that supported those OU/BBC broadcasts).

As part of this, I firmly believe that there is a role for the education sector in general, and the OU in particular, in providing an academic or educational take on news events. In the same way that many organisations make use of analysis sections or background features to explore a particular topic, so too can educational organisations provide a place for the interested to learn more about a topic in a more academic (i.e. non-partisan) way.

Everyone knows that the media landscape is changing, that the go-to places for content that folk are consuming are changing (I’m thinking of intermediary publishers such as YouTube, or ‘new media channels’ that people tune in to such as their Facebook page). Whilst I know that we in the OU are privileged to have such a great relationship with the BBC, I also wonder whether we can’t start working with other content publishers too, either formally or informally, to enrich their content by linking it to ours, and to enrich our content by linking it with theirs.

At the time when the MPs’ expenses story began to break, I was looking for a reason to learn how to get different markers plotted on a map. The MPs’ travel expenses provided an authentic reason for doing that, and also meant that my ‘how to’ blog post was likely to get some traffic from MPs’ expenses related searches… That the Guardian picked it up too was a bonus! Part of the lesson there was that in order to get the attention of news organisations, you have to have something to offer in a timely fashion.

As well as online contact, I think snatched face-to-face conversations can also have a huge payback. For example, I like to think that the Guardian ‘Rosetta Stone’ spreadsheet owes something of its origins to an ad hoc chat I had with the Guardian Datastore’s Simon Rogers when I bumped into him whilst at an event at Kings Place and expressed concern at not being able to link data from two different datastore spreadsheets together.

What’s the most exciting bit of data mashing you’ve seen done by an online news publication?
The realtime Twitter snow map always amuses me, but I guess that’s not really a news publication? The New York Times do some wonderful interactive pieces, as do the BBC, both news related (e.g. the 2005 Election maps) and not (e.g. BBC Music). There are plenty of data related apps outside the news of course – for example, the recently launched Where Does My Money Go [Disclaimer: I was one of several people who provided feedback on earlier versions of this application.]

What I will say is that I think we need to distinguish between the use of data as a tool for testing hypotheses/lead generation (e.g. MPs’ expenses map), its use for providing a live view over a breaking or rolling news story (election map, Twitter snow map), and its use as the basis for an interactive to illustrate a story. In the latter case, I think a good test of whether an interactive will be used will be whether it allows the reader/user to explore an issue raised in a news story in a way that is relevant or local to them. So for example, if there is a news story about the correlation between this sort of crime and that sort of demographic in a particular area, an interactive map that allows the reader to check out the correlation in their neighbourhood, their parents’ neighbourhood, their friends’ neighbourhood etc. would provide an opportunity to engage the reader in an exploration of the story in a way that is meaningful to them. I read somewhere once that journalism is used to craft stories where the local can illustrate the general (so a tale of woe or happiness about this cancer patient and the cost of that particular treatment resonates with everyone who has witnessed a related medical situation). A well crafted data driven interactive might be able to build on this and allow the general reader to explore the story in their particular context, maybe generating an anomalous result and a letter to the editor along the way…

Do you think journalists miss a lot of stories because they don’t pay attention to data sets?
I couldn’t possibly say… 😉

What advice would you give to journalists wishing to do more with data?
Don’t be afraid to start exploring, but take care. If you can get the data into a tool that lets you explore it visually, do so, and play around with the visualisations to let you test out various hypotheses (such as trends, or clusters in the data), or look for anomalies in the data (outliers, things that stand out). If something strikes you as odd, dig a little deeper into the data and check that it’s correct. Then you’ve got a lead for a story (but not necessarily its confirmation…)
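To make the "look for anomalies" step concrete, here is a minimal Python sketch of one common rule of thumb: flag values lying more than 1.5 times the interquartile range outside the quartiles. The function name `flag_outliers` and the claim amounts are invented for illustration and do not come from any real data set; anything flagged is only a prompt to check the underlying records, not a finding in itself.

```python
from statistics import quantiles


def flag_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)  # lower quartile, median, upper quartile
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


# Made-up claim amounts, purely for illustration.
claims = [120, 135, 128, 140, 131, 125, 990]
print(flag_outliers(claims))  # prints [990]
```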

With the opening up of public data, e.g. through the data.gov.uk initiative in the UK, there are going to be more and more opportunities for data led stories. The first phase of the data.gov.uk initiative is to support discovery of data sets, both Excel spreadsheets and the more esoteric (and more powerful) Linked Data. But that’s only the start of making this stuff useful and useable. The next step is understanding what the datasets contain in general terms, and then considering what sort of questions you might be able to ask of them. Then you can start thinking up questions to ask of the data. For the moment at least, if you want to interrogate the Linked Data datastores, you might need some technical help in actually writing what is essentially a database query onto a data set; but the step you CAN take is working out what you want the query to ask.

As an example, I’ve just posted a demo that combines queries onto two data.gov.uk Linked Data data sets: the first one is to the education datastore and it asks for the locations of the two most recently opened schools. The second query takes the geographical co-ordinates (latitude and longitude) of the schools, and finds the traffic monitoring points within a couple of miles of each of those schools from the transport dataset. The next step, which is still on my to do list, is to plot traffic count charts for those monitoring points.
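The logic of that two-step lookup can be sketched in a few lines of Python. This is not the demo itself, which queries the Linked Data stores directly; it is an illustration that assumes the records have already been fetched into plain lists. The field names, the helper functions and every record below are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8


def miles_between(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles, using the haversine formula."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


def newest_schools(schools, count=2):
    """Step 1: the most recently opened schools (ISO dates sort as text)."""
    return sorted(schools, key=lambda s: s["opened"], reverse=True)[:count]


def nearby_points(school, points, max_miles=2.0):
    """Step 2: monitoring points within max_miles of a school."""
    return [p for p in points
            if miles_between(school["lat"], school["lon"],
                             p["lat"], p["lon"]) <= max_miles]


# Invented records standing in for the two data sets.
schools = [
    {"name": "School A", "opened": "2008-09-01", "lat": 52.00, "lon": -1.00},
    {"name": "School B", "opened": "2009-09-01", "lat": 53.00, "lon": -2.00},
    {"name": "School C", "opened": "2001-09-01", "lat": 51.00, "lon": 0.00},
]
points = [
    {"id": "P1", "lat": 52.01, "lon": -1.00},
    {"id": "P2", "lat": 53.50, "lon": -2.00},
    {"id": "P3", "lat": 53.00, "lon": -2.02},
]

for school in newest_schools(schools):
    ids = [p["id"] for p in nearby_points(school, points)]
    print(school["name"], ids)
```

The output of the first step feeds the second, which is the general pattern for joining two data sets that share nothing but geography.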

A couple of things to notice here: firstly, that you can start to combine data from different sources in order to look for possible stories; secondly, it will become possible to monitor databases with which you are familiar and essentially subscribe to alerts from particular queries to a database that will pull out results from recently added records.
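The "subscribe to alerts" idea amounts to re-running a saved query and reporting only the records not seen before. A minimal Python sketch, assuming each record carries a unique `id` field; the `run_query` callable here is a stand-in for whatever actually fetches the results, and all names are hypothetical.

```python
def new_results(run_query, seen_ids):
    """Run a saved query and return only the records not seen before."""
    fresh = [r for r in run_query() if r["id"] not in seen_ids]
    seen_ids.update(r["id"] for r in fresh)  # remember them for next time
    return fresh


# Stand-in for a real data store: a list that grows between checks.
records = [{"id": 1, "name": "School A"}]
seen = set()

print(new_results(lambda: records, seen))  # first check: everything is new
records.append({"id": 2, "name": "School B"})
print(new_results(lambda: records, seen))  # second check: only the addition
```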