You are browsing the archive for 2011 December.

Ramine Tinati at the University of Southampton did a long interview with me last month for a study which looks at open data, amongst other things.

He just sent me the (long, long) transcript. Here it is.

R: Can you tell me your involvement with open government data, your role and how you got involved with it?

J: My personal interest in open data arose as a result of wanting to visually represent diverse sources of information on complex topics to make them easier for the public to understand. I later discovered the work of Otto Neurath, a Viennese intellectual who ran something called the Isotype Institute in the 1930s which was dedicated to visually representing public information. He was quite ahead of his time and was interested in developing a universal picture language. He used this to represent things like labour statistics, demography, industrial output, and even to compare the political systems, the intricacies of how democracies function in different countries. For instance he did this book on the US and the UK looking at different justice systems. His main idea was to make it easier for the public to understand society, institutions and politics using information and graphics.

I wanted to build on that work using interactive technologies and computers to understand complex processes. I became acquainted with Rufus Pollock, who is one of the co-founders of the Open Knowledge Foundation, at an event in Cambridge. I wanted to start a separate organisation or institute (along the lines of the Isotype Institute) dedicated to visually representing public information, but he suggested that I join forces with the Open Knowledge Foundation.

My role at the Open Knowledge Foundation is Community Coordinator. This means that I am proactively engaging stakeholders, who may not necessarily have a prior interest in open data, allowing them to understand opportunities around how they can use data to improve what they are doing in some way – be that better reportage, working alongside investigative journalists, such as traditional media (e.g. The Guardian, The New York Times or Die Zeit); or civic society organisations, advocacy organisations or community groups. And there is also working with web developers, designers such as David McCandless from Information is Beautiful. There is loads of interesting work going on.

A big part of my role is to work with those stakeholders interested in open data and alongside public bodies who publish government data. You could broadly break this down into working with the data supply side on the one hand (e.g. public sector bodies looking to publish data), and the data demand side on the other (e.g. developers, journalists, NGOs and others who wish to use the data to do something useful or interesting).

R: So you are engaging with these new stakeholders, existing and emerging, how do you actually find these people, and bring them on board?

J: The question is: why should people or organisations bother to learn about open data? Ultimately our goal is to expand and strengthen the community of people who use and publish data to provide some form of value to society. We talk to people who are in possession of data or in the position to affect policy related to that data, so government people, like civil servants, politicians, legal and policy experts. We want to show them the value that can be generated if they open up public data.

For example we might work with a country that doesn’t have an open data initiative, and try to introduce them to people that have pioneered an open data initiative in another country. When the French government decided that they were interested in doing an open data initiative, one of the first things we tried to do was put them in touch with people involved in setting up the UK’s open data initiative. Our community team at the Open Knowledge Foundation aspires to run an informal introduction service for people interested in open data in different countries.

It’s also about building trust and lasting relationships between key stakeholders. Community building is not just about pursuing some objective. In many ways it can be an end in itself. Having stronger connections between citizens, civic society organisations, the media, and public sector workers can be valuable in itself. For example journalists may feel more able to talk to people in the public sector, asking “do you have this” or “what does this data mean”, which means the quality of the reportage is going to increase and the interpretation of the data will improve as well. This is a really important aspect of what we do.

R: Have you seen any difficulties for non-public sector workers trying to engage with public sector workers?

J: Yes. I think many journalists and NGOs still feel apprehensive or don’t really know how to approach people who work in the public sector to ask for data. One of the things which is really exciting in the open data community is the broadening of this discourse.

Previously this area was predominantly about government transparency, and the discourse in this area implied certain expectations and assumptions, for example about how the public sector is backwards, inefficient or even corrupt. The whole ‘Sunlight is the best disinfectant’ mantra of the US transparency movement implies that the public sector needs to be cleaned up, and transparency helps with this (but of course even where there are problems – transparency, knowledge about what is happening, is only the first step to changing or fixing things, and by no means enough in itself!). When the current UK government first came to power they played up this discourse – and were more heavy on traditional transparency, wanting to increase efficiency, to reduce waste, and so on. It fit very well with their broader objectives: having a smaller state and all that.

But what I was saying was that it is exciting to see this traditional transparency discourse augmented with more evidence, arguments and anecdotes about engagement, participation, citizens getting involved in creating new kinds of digital services (such as mySociety and the civic hacker community) and so on. This shift in discourse comes with a shift in attitude. People are less hostile towards the public sector, and public sector is less hostile towards people who want to use their data. Citizens don’t just assume that all public bodies are corrupt and inefficient, and public bodies don’t assume that all citizens and data users are out to make them look bad.

R: So since the change in government, there has been…

J: A shift in discourse, absolutely. The Labour government placed more of an emphasis on public service reform and augmenting official digital services with innovation from outside government. When the Conservatives started talking about open data, they placed a greater emphasis on increasing efficiency and reducing waste. While the discourse has shifted between governments, the initiative has remained very similar in its aims and approach. There has been significant commitment from both the current and the previous government which is good. There is cross-party consensus that open data is a good thing.

R: Has the community reflected that as well? Not only by broadening, but also their agenda – perhaps towards a financial goal?

J: The economic potential of information has always been a major part of policy in this area. Just to give some background, open government data is certainly not a new thing. I think in principle, the US government has been practicing open government data effectively since the 1976 Copyright Act, which meant that Federal government information was exempt from copyright. This obviously wasn’t called open government data back then, but it was open in the sense that it was both legally and technically freely available for people to reuse. This has been enshrined in policy, law and practice in the US for decades. It has been something that the Obama administration has been able to build on. They have had an important precedent.

Similarly in the UK, we’ve had quite a centralised system dealing with copyright and policy related to the reuse of public sector information (PSI). We’ve had the Office of Public Sector Information (which has since been amalgamated into the National Archives), which helps provide rules and guidance for how public bodies should legally and technically publish official documents and datasets.

This is very different to countries like Germany or Italy, where you have a much more decentralised governance (Germany’s Länder and Italy’s regioni), and there is no single centralised body with a remit to create overarching policies that everyone has to adhere to. Hence you have more people, things take longer, and you really need a bottom up approach – with policies and best practices bubbling up from cities and states who are ahead of the curve.

In the history of policy in this area, money and markets have always been a big motivating factor. The economic value of public sector information. If you look at the background to the European Public Sector Information Directive, there was a lot of interest in economic activity around public data in the US, particularly around geospatial data products and services. This remains the case today, and if you look at speeches and official communications from the US, the EU, or from numerous other countries, they often refer to the economy and potential job creation, as well as transparency, accountability, new digital services for citizens and so on. The discourse in the open data community is similarly diverse – and advocates often draw on a broad base of evidence.

R: So are you suggesting that the early drivers for data.gov.uk came from previous work within the public sector information policies back in the early 1970s and 80s?

J: The best way to understand the success of the open government data movement is to consider the number of different factors that have led to it having the shape and momentum that it currently has. And also to recognise the accident and contingency of many of these factors. It didn’t have to turn out the way things are, but it just has. These policies have played an important role in enabling politicians and decision makers to act quickly and effectively, and have provided them with a strong basis for opening up more data.

While these previous policies have played an important role (and you can’t understand current activity in this area without understanding a bit about them) I think the real catalyst has been developers taking data and producing useful applications for the public. That really turned the heads of the public, politicians, and the media, and that was the point at which a lot more people said “this is a really good thing”. Information reuse policies, which were previously the preserve of a handful of experts, researchers and consultants, suddenly started to get more mainstream.

R: So are there any key events from your experiences that you feel were particularly important or acted as milestones in open government data’s development?

J: As I’ve said, I think open data has a long historical shadow. Way before the big portals were released and reported on, you can see various bits and pieces which have really influenced official policy. For example, almost exactly a year before Obama issued the Open Government Directive, a group of open government activists including Carl Malamud, Tim O’Reilly, Tom Steinberg, folks from the Sunlight Foundation and many others, wrote a set of principles which is echoed in official policies around the world.

Similarly before the launch of data.gov.uk, the Open Knowledge Foundation ran a workshop with civil servants, developers, journalists, researchers and other people, and we spoke about the value of having a central point of reference for open data which was available across the UK. We mapped common datasets that people wanted and created records for them in CKAN, which in fact is the software which runs data.gov.uk now. This was before data.gov or any other official open data portal existed, but many workshop participants wanted to see an official open data portal for the UK.

I think there have been lots of little things which have added up to shape the current state of official open data initiatives – not just single key events. And lots of these little things have been brought about by the relentless hard work of different individuals and bodies both inside and outside government. NGOs like mySociety in the UK, the Sunlight Foundation in the US, and the Open Knowledge Foundation in the UK and various other countries around the world have all played a big role. Specific projects such as mySociety’s TheyWorkForYou.com and various projects from the Sunlight Foundation have helped to make a policy area which used to be intangible and difficult to understand very tangible, and made the benefits of more open data policies more obvious to the public. There were also several important economic studies on the potential value of public sector information to society. And of course things like Sir Tim Berners-Lee taking up Rufus Pollock’s call for “Raw Data Now” in his TED talk and then him having breakfast with Gordon Brown. All of these things have been important, but there isn’t a recipe and it’s hard to abstract from this what the most important factors were. One has to talk to lots of people, consider things from lots of different angles, pay close attention to developments, and make the most of opportunities as they arise.

R: You talk about the ties between the UK and the US, do you think there was a lot of influence from the US on the UK during the formation and even to present regarding the open government data community?

J: There was a two way influence. It’s not so well known, but one of the first open government data competitions (called “Show Us A Better Way”) was in fact run by the UK government. Then Vivek Kundra did his Apps for Democracy competition, which became much better known internationally. I remember several UK public servants jokingly grumbling about how the US had copied them.

Also a few months before our first Open Government Data Camp in London, I had a phonecall with Vivek Kundra about him being a keynote speaker and possible US participation in the event. He said he’d be there modulo a possible trip to India with the president. Then there was radio silence. A few weeks later we discovered they had organised another International Open Government Data Conference in Washington the day before our event, and invited many of our speakers to their event.

R: So it’s a bit of a competition?

J: Yes, but healthy competition, which I think has helped things move forward more quickly in both countries.

R: Do you feel that there are any other key influences, specifically within the UK open government data community, that have helped it develop?

J: As I was alluding to earlier, discourse around open data is a really rich mix of evidence and arguments. The idea of open data itself is very simple. Even if you don’t know what it means, most people understand roughly what it is about, or at least feel as if they do. Open government data sounds better than closed government data. Things which are ‘open’ or ‘free’ generally sound better than things which are locked up or cost tens of thousands of pounds. So even if you don’t really understand the complex legal, social and technical aspects of reuse it sounds like a nice idea.

And as you start to move out from the basic idea to unpacking its value – whether this is greater accountability or job creation or public service reform – there is almost something for everyone. Opening up public data can enable all of these different wonderful things to happen. The discourse around open data often does not form one coherent argument. There are lots of things that appeal to different people for different reasons. I think this richness has given open data broad appeal across the political spectrum, and the rich mix of good things that it makes possible has made it very appealing for politicians and public bodies to talk about.

R: What kind of barriers have you come across delivering open government data?

J: There are several different clusters of barriers. The first cluster of barriers is related to the law. I guess the most obvious thing is that by default datasets and databases are covered by a complex set of rights – copyright, related rights, the European database directive and so on. The default position of the law is that you cannot reuse a given document or dataset in any way you want. You have to get permission. And having to ask permission can have a chilling effect on reuse. It isn’t clear what you can and can’t do.

And then you have contracts, click wrap agreements, terms and conditions on websites and so on. You might have conflicting messages – such as a website saying that things can be reused, but then somewhere else it might say ‘all rights reserved’ and ‘you must ask permission’. There are also straightforward legal restrictions, licences that don’t permit certain types of reuse – or don’t permit reuse apart from for personal research purposes (which basically means you can download it, but not share it with anyone). That’s some of the legal stuff.

Then there’s technical stuff. For example when we first asked for the COINS data (which is the UK’s most detailed database of government spending) – we couldn’t get a copy of the data itself so we asked for the schema, which would tell us what the database contained and how it was structured. We finally received a huge slab of paper in the post, which is obviously not very useful when it comes to exploring the data on a computer.

R: So going back to the legal issues, what is needed to overcome these barriers? Is it a legal framework? Do the people involved need educating?

J: I think one has to be pragmatic. People often want to start with a general legal framework about how to systematically approach how public data can be used. While that is highly desirable to have in the long term, I don’t think it’s the best way to start out. Because in order to adopt a general legal or licensing framework you need to have consensus from all relevant stakeholders, which can be very time consuming and may bring things to a halt. Public servants obviously want to be cautious and to make the right decision, and people will often argue that they should be considered a special case, an exception to the rule. It takes time to get consensus.

The best case scenario for the end user of public data is probably something like the French model, which is effectively a decree right from the top saying that by default all public information has to be open, and exceptions require authorisation from an executive body on a case by case basis. The French initiative is just getting started and I look forward to seeing how this works out. From a legal point of view it’s a highly desirable model. The US has something a bit like this, but the reality is that it isn’t enough just to say ‘you have to do this’. You may also need to have people chasing this up, making lots of phone calls, and making sure you have incentives, and disincentives, the carrots and sticks, to make sure that public bodies actually implement the policy. Like so many things, this needs hard work for it to happen.

R: So it’s the social communication that helps to overcome the barriers.

J: I think a lot of this is about communicating things, and also about building trust. I suspect that in the longer term one of the biggest achievements of the open data movement will be in starting conversations between different stakeholders around public information, and effecting cultural change within the public sector. This isn’t necessarily hard legislation, more like soft policy: providing incentives, showing public servants the benefits that opening up data can bring about, and using social carrots and sticks rather than hard and fast law. This has worked really well for data.gov.uk. Rather than saying you have to open up all of your data, they said “we’re creating this amazing project called data.gov.uk, and we’d be delighted if you would contribute data, and if you want to contribute data, you must do so under these terms”.

R: So giving some kind of goal, but also stipulating some guidelines for them?

J: Yes. But I understand that it is also tough. The UK government has sometimes found it very difficult to get the very granular spending data that they wanted from local authorities, some of whom just say no, and some say they are under resourced.

R: Do you think open government data is something which will last, more than a passing phase?

J: I think open data has got real momentum, especially internationally. I think it’s clear that the open model of public sector information and the provision of data is highly desirable in most circumstances, for most public sector bodies. Adoption is just going to take time and hard work. But I’m quite confident that it will become the default. But it’s not just about the public sector, it’s also about society. There are cases where public bodies have opened up data, but have found that civic society isn’t necessarily using it as much as they’d hoped. For example in Moldova, they have a data portal with datasets on it, and they are currently looking at how they can increase uptake and reuse.

R: So it’s educating the whole of society rather than just the few that are involved with the data?

J: Yes. At the Open Knowledge Foundation we’re really keen to try to encourage more people to learn how to understand how to use data. But increasing data literacy is not something for which the open data movement is single-handedly responsible, just as public library advocates in the 19th century were not single-handedly responsible for increasing literacy in society. There are lots of different factors involved: the education system, the media, professional training, intuitive tools, and so on. It will also take a while!

R: So what do you see as the next steps for the community to develop? Specifically the UK community?

J: We need a lot more proactive engagement with stakeholders who might not realise that open data might help them do something, or understand something. This includes journalists, students, researchers, NGOs, community groups, companies and many others. We need more people to realise how open data can provide value to them, and we need more people to start using data to provide value to society – whether through increasing understanding of complex issues, providing better services for citizens, or whatever else.

R: Engagement of citizens?

J: Citizens, civic society organisations, media outlets, companies. One big thing is to make people realise that they could use public data in their own (internal) information management systems. We need to encourage more people to learn how to use data, and how to request it. Going back to the idea of the ‘supply’ and ‘demand’ sides – we need a lot more practical, hands on engagement on the demand side. This is a real priority.

R: Are there any people in this community that you think may be interesting to talk to?

J: Stefano Bertolo at the European Commission would be able to speak about the importance of how the technology is maturing. He’s very interested in the potential of Linked Data. And Linked Data is one solution to a big challenge: we have all this data which is to some extent, raw material, and we need to do work to clean it up and combine it with other datasets. There are all sorts of debates around what technologies are best to do this. Not everyone agrees that Linked Data technologies are the only way or the best way to make progress on this front – but Stefano will be able to tell you more about what is happening in this area and what the implications are.

R: Just expanding on what you said about Linked Data and the community, what’s your take on the semantic web and linked data, how is it involved in the open data community?

J: I think they have evolved as separate communities. Obviously there is some strong overlap, but you can really see how distinct they are in the UK. For example you can go back and look at the UK government list and see that there have been heated debates about the value of Linked Data. From my understanding, conceptually the value of linked data is very clear: one wants to be able to link together lots of different datasets from different sources. But when it comes to how you do this, how this is implemented, people have strong opinions which will often diverge. I’m not an expert, and I don’t have strong views on whether triple store technologies are inherently superior to MySQL or NoSQL. I think the rule for most end users is ‘whatever works’, and to some extent, healthy competition between technologies is good. Linked Data is a very important technological development. I think its value is more obvious when you start to combine data from lots of different sources, and want to be able to mix different types of data together. But the broader open data community is pretty agnostic about what tools and technologies are best. Its overarching goal is to get more data out there in a form which legally and technologically enables reuse, and to encourage more people to use it in useful and interesting ways.

This week I am at Digital Curation Conference in Bristol on Monday / Tuesday, then Open Research Reports hackathon on Tuesday evening / Wednesday, then catching up with some other project people on Thursday, then on Friday will be back at Edinburgh dealing with some stuff at the University.

The main aim at the moment is to demonstrate uses of the bibserver code, and get it up to a versionable point by Christmas, after which we can start tackling using it as a service.

I have also been working with Naomi, who started on Thursday last week, and should catch up more with Etienne, who is doing programming with us too now.