To use open government data, first there has to be some data there. For reasons of scope and space, the dramatic opening up of government datasets over the last few years is likely to be little more than a few lines at the start of my dissertation before I turn to look at use of data. But how government data shifts from closed to open is vitally important: not only to understand how open government data initiatives may be replicated, but also to see how the processes of unlocking access to data could affect the uses it is then put to.

A recent report by Becky Hogge, funded by the Open Society Institute’s Transparency and Accountability Initiative offers an insight into the process of unlocking UK government data through data.gov.uk, and US data at data.gov. Based on interviews with figures involved in, or closely observing, the opening of government data, the report suggests that a combination of Civil Society (in the form of civic hackers); Middle Management (in the civil service); and Top-Level support (through both Ministerial/top-level leadership, and the star-power of figures like Sir Tim Berners-Lee) helped to ensure the emergence of strong open government data environments in the UK and US.

The report goes on to explore how this three-tier model might apply in other contexts across the globe, particularly in opening up data in middle-income and developing countries, although the input from interviewees suggests that a simple application of the UK or US models is not likely to work out in a straightforward way. There are strong echoes of Duncan Green's thesis on poverty reduction through Active Citizens & Effective States (active citizens able to use data to hold governments to account, and effective states able to collect and manage data) to be found in quotes from international informants.

For a 10-day research project the report gives an impressive overview of two specific open government data processes, but I did find two issues worth raising in a spirit of constructive critique:

The analysis and comparison of the Data.gov.uk and Data.gov platforms and datasets themselves on pages 4 – 6 misses the mark on a number of small, but potentially significant, details:

Counting datasets: The comparison between 3,241 datasets on data.gov.uk and 1,284 datasets on data.gov appears to assume that all data.gov.uk datasets are ‘raw data’. This is not the case: at least some links point to aggregate figures only available in PDFs or poorly structured spreadsheets. Counting datasets is, in any case, not an easy task. If you have one dataset listing all the National Indicator scores for each Local Authority, this could be (and to an extent is on data.gov.uk) split into 200+ individual files, one for each indicator – yielding more datasets, but no more data (play with the ‘National Indicator’ facet on my Data.gov.uk Exhibit to explore this case in more detail). This highlights the need for something other than ‘the number of datasets’ as a measure of success or a metric to be celebrated.
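To make the counting problem concrete, here is a minimal sketch (with hypothetical catalogue entry names – the real data.gov.uk identifiers differ) of how the same logical dataset can yield wildly different headline counts depending on how it is catalogued:

```python
# Hypothetical illustration: one logical dataset (National Indicator scores
# per Local Authority) published either as a single combined entry, or as
# one catalogue entry per indicator.
indicators = [f"NI {n:03d}" for n in range(1, 201)]  # 200 hypothetical indicators

# Catalogue A: the data listed as a single combined dataset.
catalogue_a = ["national-indicator-scores-all"]

# Catalogue B: one entry per indicator - the same underlying data.
catalogue_b = [f"national-indicator-scores-{ni.replace(' ', '-').lower()}"
               for ni in indicators]

print(len(catalogue_a))  # 1 "dataset"
print(len(catalogue_b))  # 200 "datasets" - but no more data
```

Any comparison of raw dataset counts across catalogues silently assumes both use the same granularity of cataloguing, which is exactly what cannot be assumed here.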

Application hosting (minor point): The report mistakenly suggests data.gov.uk is hosting applications, rather than simply listing them. (I’m not sure this point is significant to any of the further analysis in the report; I’m only pointing it out as a matter of clarification.)

Overstating causal claims: The report claims that, as a result of the release of Ordnance Survey data, “the UK is now witnessing a flourishing of postcode-related campaigning and political engagement sites”. It’s not clear that there is any evidence to back up this statement. Postcode-related campaigning has long been established – and campaigners have either bought datasets, or made do with low-granularity lookups from free services. It may be that campaigners now have less to pay, and so more resources for other activities, or that new postcode-related activities do emerge – but both of these hypotheses would need to be explored and evidenced. Whilst the ‘Further Research’ section of the report notes the need to look more at “the social impact of existing data catalogues like data.gov and data.gov.uk” (p. 42), it would be good to see the social-impact claims already being made supported by empirical or even anecdotal references. One of the things I fear about writing a dissertation on uses of open data right now is that the evidence I gather will necessarily refute some of the over-grand claims currently being made about the impacts of open government data – and this could be taken as fuel in arguments against open data.

The predominant focus on US and UK central government initiatives misses the chance to learn from a number of other approaches being adopted across the UK, EU and the rest of the world. For example:

The Open Three (Open Data, Open Standards, Open Source) approach in Vancouver, where council resolutions, as opposed to individual politicians’ executive leadership, appear to have been significant.

The dataDotGov.ca initiative, where a team is crowd-sourcing a directory of open data without government support.

The different approaches adopted in the UK at a local level by OpenlyLocal (crawling data, but encouraging use of certain standards to facilitate this), Lichfield (arguably driven forward by a single web developer), Kent (led from within the Innovation team, and adopting a non-techie-focussed engagement strategy for re-use) and Warwickshire (with multiple web team members working on the project, and adopting developer-competition approaches to engagement).

N.b. My bracketed analysis of the approaches above is based on secondary sources only – and has not been checked for accuracy.

Whilst I’m not suggesting the report should have covered all these contexts (that would be a far bigger project), it would be valuable for anyone exploring different models of opening data to learn from a wider set of projects than the two central government examples given: two examples that have very much referred to each other in developing their own models.

Most of all though, I’d encourage anyone reading Becky’s report to look carefully at Dan McQuillan’s observations and to take seriously the critique offered. There is a risk that open data initiatives can verge on the ‘emperor’s new clothes’ (p. 23), and it’s only by focussing explicitly on the use of data, and on models of social change, that we can be sure to avoid that.