if writing is a muscle, this is my gym

Tag Archives: opendata

So, with much help from various community members (who reminded me that we need to get this rolling – looking at you Heather Leson), I am pleased to say we are starting to gear up for Open Data Day 2014 on February 22nd, 2014.

From its humble beginnings as a conversation between a few friends who were interested in promoting and playing with open data, last year Open Data Day had locally organized events take place in over 100 cities around the world. Check out this video of Open Data Day in Kathmandu last year.

What makes Open Data Day work? Mostly you. It is a global excuse for people in communities like yours to come together and organize an event that meets their needs. Whether that is a hackathon, a showcase and fair, lectures, workshops for local NGOs and businesses, training on data, or meetings with local politicians – people are free to organize around whatever they think their community needs. You can read more about how Open Data Day works on our website.

Want to join in on the fun? I thought you’d never ask. Listed below are some different ways you can help make Open Data Day 2014 a success in your community!

C) Forget about participating, I want to coordinate an Open Data Day event in my city.

Read the Open Data Day website. Basically, pick up on our vibe: we want Open Data Day to work for everyone, from novices who know little about data to experts like Kaggle participants and uber geeks like Bruce Schneier. These events have always been welcoming and encouraging – it is part of the design challenge.

Okay, now add your city to the list, let people know where it will be taking place (or that you are working on securing space), let them know a rough agenda, what to expect, and how they can contribute.

Add yourself to the 2014 Open Data Day map. (Hint: Wikipedia lists Lat/Long in the information side bar for each city’s wiki page: “Coordinates: 43°42′N 79°24′W”)

Join the Open Data Day mailing list. Organizers tend to share best practices and tips here. It’s not serious, really just a help and support group.

Check out resources like this and this about how to organize a successful event.

Start spreading the news!
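If the map entry needs decimal degrees rather than the degrees-and-minutes form Wikipedia shows, the conversion is simple arithmetic. Here is a minimal sketch in Python. It assumes the map wants signed decimal values (south and west negative), which the instructions above do not actually say, and the function names are made up for illustration.

```python
import re


def dms_to_decimal(text):
    """Convert a coordinate such as '43°42′N' to signed decimal degrees."""
    pattern = r"\s*(\d+)°(\d+)′(?:(\d+(?:\.\d+)?)″)?\s*([NSEW])\s*"
    match = re.fullmatch(pattern, text)
    if match is None:
        raise ValueError(f"unrecognised coordinate: {text!r}")
    degrees, minutes, seconds, hemisphere = match.groups()
    # Minutes are sixtieths of a degree, seconds are sixtieths of a minute.
    value = int(degrees) + int(minutes) / 60 + float(seconds or 0) / 3600
    # Assumption: south and west are expressed as negative numbers.
    return -value if hemisphere in "SW" else value


def parse_wikipedia_coordinates(text):
    """Split 'LAT LON' as shown in a Wikipedia sidebar into two decimals."""
    lat_text, lon_text = text.split()
    return dms_to_decimal(lat_text), dms_to_decimal(lon_text)


lat, lon = parse_wikipedia_coordinates("43°42′N 79°24′W")
print(round(lat, 4), round(lon, 4))
```

For the Toronto example in the hint this gives roughly 43.7 and -79.4.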

D) I want to help more! How can Open Data Day work more smoothly everywhere?

Okay, for the truly hardcore: you’re right, we need help. Open Data Day has grown. This means we’ve outgrown a whole bunch of our infrastructure… like our webpage! Everyone involved in this is a volunteer so… we have some extra heavy lifting we need help with. This includes:

a. Website template update: The current Open Data Day template was generously donated by Mark Dunkley (thank you!!!). We’d love to have it scale a little better and refresh the content. You can see the code on github here. Email me if you are interested. Skills required: css, design

b. Translation: Can you help translate the ODD site into your language? You can submit the requests on github or send a document to heather.leson at okfn dot org with the content. She’ll do the github stuff if that’s beyond you.

c. Map: Leaflet and layers helpers wanted! We’d like a map geek to help correct geolocation and keep the 2014 map fresh with accurate geo for all the locations. Github repo is here and the event list is here.

Over at the Programmable City website Rob Kitchin has a thoughtful blog post on open data critiques. It is very much worth reading and deserves wider discussion. Specifically, there are two things worth noting. First, it is important for the open data community – and advocates in particular – to acknowledge the responsibility we have in debates about open data. Second, I’d like to examine some of the critiques raised and discuss those I think misfire and those that deserve deeper dives.

For years we have been on the outside, yelling that open data matters. But now we are being invited inside.

Two years later the transition is more than complete. If you have any doubts, consider this picture: Once you have these people talking about things like a G8 Open Data Charter you are no longer on the fringes. Not even remotely.

It also means understanding the challenges around open data has never been more important. We – open data advocates – are now complicit in what many of the above (mostly) men decide to do around open data. Hence the importance of Rob’s post. Previously those with power were dismissive of open data – you had to scream to get their attention. Today, those same actors want to act now and go far. Point them (or the institutions they represent) in the wrong direction and/or frame an issue incorrectly and you could have a serious problem on your hands. Consequently, the responsibility of advocates has never been greater. This is even more the case as open data has spread. Local variations matter. What works in Vancouver may not always be appropriate in Nairobi or London.

I shouldn’t have to say this but I will, because it matters so much: Read the critiques. They matter. They will make you better, smarter, and above all, more responsible.

The Four Critiques – a breakdown

Reading the critiques and agreeing with them is, of course, not the same thing. Rob cites four critiques of open data: funding and sustainability, politics of the benign and empowering the empowered, utility and usability, and neoliberalisation and marketisation of public services. Some of these I think miss the real concerns and risks around open data, others represent genuine concerns that everyone should have at the forefront of their thinking. Let me briefly touch on each one.

Funding and sustainability

This one strikes me as the least effective criticism. Outside the World Bank I’ve not heard of many examples where governments effectively sell their data to make money. I would be very interested in examples to the contrary – it would make for a great list and would enlighten the discussion – although not, I suspect, in ways that would make either side of the discussion happy.

The little research that has been done into this subject has suggested that charging for government data almost never yields much money, and often actually serves as a loss-creating mechanism. Indeed a 2001 KPMG study of Canadian geospatial data found government almost never made money from data sales if purchases by other levels of government were not included. Again in Canada, Statistics Canada argued for years that it couldn’t “afford” to make its data open (free) as it needed the revenue. However, it turned out that the annual sum generated by these sales was around $2M. This is hardly a major contributor to its bottom line. And of course, this does not count the money that had to go towards salaries and systems for tracking buyers and users, chasing down invoices, etc…

The disappointing line in the critique however was this:

de Vries et al. (2011) reported that the average apps developer made only $3,000 per year from apps sales, with 80 percent of paid Android apps being downloaded fewer than 100 times. In addition, they noted that even successful apps, such as MyCityWay which had been downloaded 40 million times, were not yet generating profits.

Ugh. First, apps are not what is going to make open data interesting or sexy. I suspect they will make up maybe 5% of the ecosystem. The real value is going to be in analysis and enhancing other services. It may also be in the costs it eliminates (and thus the capital and time it frees up), not in the companies it creates, something I outlined in Don’t Measure the Growth, Measure the Destruction.

Moreover, this is the internet. The average doesn’t mean anything. The average webpage probably gets 2 page views per day. That hardly means there aren’t lots of very successful webpages. The distribution is not a bell curve, it’s a long tail, so it is hard to see what the average tells us other than the cost of experimentation is very, very low. It tells us very little about whether there are, or will be, successful uses of open data.
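To make the long tail point concrete, here is a small sketch in Python. The page view numbers are invented purely for illustration (they are not real traffic data); they are chosen so the average works out to 2 views per day, matching the guess above.

```python
from statistics import mean, median

# Invented daily page views for 10,000 hypothetical webpages (not real data):
# a handful of hits and a very long tail of pages nobody visits.
hits = [12_000, 5_000, 2_000, 500, 300, 200]
page_views = hits + [0] * (10_000 - len(hits))

average = mean(page_views)        # dragged up by the few hits
typical = median(page_views)      # what the "middle" page actually gets
top_share = max(page_views) / sum(page_views)

print(f"average views per page: {average}")
print(f"median views per page:  {typical}")
print(f"share of all views going to the top page: {top_share:.0%}")
```

In this made-up sample the average is 2 views per day and the median is 0, yet one page gets 12,000 views. The average alone hides both facts.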

Politics of the benign and empowering the empowered

This is the most important critique and it needs to be engaged. There are definitely cases where data can serve to further marginalize at-risk communities. In addition, there are data sets that, for reasons of security and privacy, should not be made open. I’m not interested in publishing the locations of women’s shelters or worse, the list of families taking refuge in them. Nor do I believe that open data will always serve to challenge the status quo or create greater equality. Even at its most reductionist – if one believes that information is power, then greater ability to access and make use of information makes one more powerful – this means that winners and losers will be created by the creation of new information.

There are, however, two things that give me some hope in this space. The first is that, when it comes to open data, the axis of competition among providers usually centers around accessibility. For example, the Socrata platform (a provider of open data portals to government) invests heavily in creating tools that make government data accessible and usable to the broadest possible audience. This is not a claim that all communities are being engaged (far from it) and that a great deal more work cannot be done, but there is a desire to show greater use which drives some data providers to try to find ways to engage new communities.

The second is that if we want to create a data-literate society – and I think we do, for reasons of good citizenship, social justice and economic competitiveness – you need the data first for people to learn and play with. One of my most popular blog posts is Learning from Libraries: The Literacy Challenge of Open Data in which I point out that one of the best ways to help people become data literate is to give them more interesting data to play with. My point is that we didn’t build libraries after everyone knew how to read, we built them beforehand with the goal of having them as a place that could facilitate learning and education. Of course libraries also often have strong teaching components to them, and we definitely need more of this. Figuring out who to engage, and how it can be done most effectively is something I’m deeply interested in.

There are also things that often depress me. I struggle to think of technologies that did not empower the empowered – at least initially. From the cell phone to the car to the printing press to open source software, all these inventions have helped billions of people, but they did not distribute themselves evenly, especially at first. So the question cannot be reduced to – will open data empower the empowered, but to what degree, and where and with whom. I’ve seen plenty of evidence where data has enabled small groups of people to protect their communities or make more transparent the impact (or lack thereof) of a government regulation. Open data expands the number of people who can use government information for their own ends – this, I believe is a good thing – but that does not mean we shouldn’t be constantly looking for ways to ensure that it does not reinforce structural inequity. Achieving perfect distribution of the benefits of a new technology, or even public policy, is almost impossible. So we cannot make perfect the enemy of the good. However, that does not hide the fact that there are real risks – and responsibilities as advocates – that need to be considered here. This is an issue that will need to be constantly engaged.

Utility and Usability

Some of the issues around usability I’ve addressed above in the accessibility piece – for some portals (that genuinely want users) the axis of evolution is pointed in the right direction with governments and companies (like Socrata) trying to embed more tools on the website to make the data more usable.

I also agree with the central concern (not a critique) of this section, which is that rather than creating a virtuous circle, poorly thought out and launched open data portals will create negative “doom loops” in which poor quality data begets little interest which begets less data. However, the concern, in my mind, focuses on too narrow a problem.

One of the big reasons I’ve been an advocate of open data was a desire not just to help citizens, non-profits and companies gain access to information that could help them with their missions, but to change the way government deals with its data so that it can share it internally more effectively. I often cite a public servant I know who had a summer intern spend 3 weeks surfing the national statistical agency website to find data they knew existed but could not find because of terrible design and search. A poor open data site is not just a sign that the public can’t access or effectively use government data, it usually suggests that the government’s employees can’t access or effectively use their own data. This is often deeply frustrating to many public servants.

Thus, the most important outcome created by the open data movement may have been making governments realize that data represents an asset class of which they have had little understanding (outside, sadly, the intelligence sector, which has been all too aware of this) and little policy and governance (outside, say, the GIS space and some personal records categories). Getting governments to think about data as a platform (yes, I’m a fan of government as a platform for external use, but above all for internal use) is, in my mind, one way we can both enable public servants to get better access to information while simultaneously attacking the huge vendors (like SAP and Oracle) whose $100 million implementations often silo off data, rarely produce the results promised and are so obnoxiously expensive it boggles the mind (Clay Johnson has some wonderful examples of the roughly 50% of large IT projects that fail).

The key to all this is that open data can’t be something you slap on top of a big IT stack. I try to explain this in It’s the Icing Not the Cake, another popular blog post about why Washington DC was able to effectively launch an open data program so quickly (which was, apparently, so effective at bringing transparency to procurement data the subsequent mayor rolled it back). The point is that governments need to start thinking in terms of platforms if – over the long term – open data is going to work. And they need to start thinking of themselves as the primary consumers of the data that is being served on that platform. Steve Yegge’s brilliant and sharp witted rant on how Google doesn’t get platforms is an absolute must read in this regard for any government official – the good news is you are not alone in not finding this easy. Google struggles with it as well.

My main point. Let’s not play at the edges and merely define this challenge as one of usability. It is a much, much bigger problem than that. It is a big, deep, culture-changing BHAG problem that needs tackling. If we get it wrong, then the big government vendors and the inertia of bureaucracy win. We get it right and we potentially could save taxpayers millions while enabling a more nimble, effective and responsive government.

Neoliberalisation and Marketisation of Government

If you have not read Jo Bates’s article “Co-optation and contestation in the shaping of the UK’s Open Government Data Initiative” I highly recommend it. There are a number of arguments in the article I’m not sure I agree with (and feel are softened by her conclusion – so do read it all first). For example, the notion that open data has been co-opted into an “ideologically framed mould that champions the superiority of markets over social provision” strikes me as lacking nuance. One of the things open data can do is create a public recognition of a publicly held data set and the need to protect these against being privatized. Of course, what I suspect is that both things could be true simultaneously – there can be increased recognition of the importance of a public asset while also recognizing the increased social goods and market potential in leveraging said asset.

However, there is one thing Bates is absolutely correct about. Open data does not come into an empty playing field. It will be used by actors – on both the left and right – to advance their cause. So I too am uncomfortable with those that believe open data is going to somehow depoliticize government or politics – indeed I made a similar argument in a piece in Slate on the politics of data. As I try to point out you can only create a perverse, gerrymandered electoral district that looks like this…

… if you’ve got pretty good demographic data about target communities you want to engage (or avoid). Data – and even open data – doesn’t magically make things better. There are instances where open data can, I believe, create positive outcomes by shifting incentives in appropriate ways… but similarly, it can help all sorts of actors find ways to satisfy their own goals, which may not be aligned with your – or even society at large’s – goals.

This makes voices like Bates’s deeply important since they will challenge those of us interested in open data to be constantly evaluating the language we use, the coalitions we form and the priorities that get made, in ways that I think are profoundly important. Indeed, if you get to the end of Bates’s article there is a list of recommendations that I don’t think anyone I work with around open data would find objectionable; quite the opposite, they would agree they are completely critical.

Summary

I’m so grateful to Rob for posting this piece. It has helped me put into words some thoughts I’ve had, both about the open data criticisms as well as the important role the critiques play. I try hard to be a critical advocate of open data – one who engages the risks and challenges posed by open data. I’m not perfect, and balancing these two goals – advocacy with a critical view – is not easy, but I hope this opens a window into the ways I’m trying to balance it and possibly helps others do more of it as well.

The task force will look at best practices around the world as well as engage a number of stakeholders and conduct a series of public consultations across Ontario to make a number of recommendations around opening up the Ontario government.

I have an opinion piece in the Toronto Star today titled The Promise and Challenges of Open Government where I try (in a few words) to outline some of the challenges the task force faces as well as some of the opportunities I hope it can capitalize on.

The promise and challenges of open government

Last week, Premier Kathleen Wynne announced the launch of Ontario’s Open Government initiative, including an engagement task force (upon which I sit).

The premier’s announcement comes on the heels of a number of “open government” initiatives launched in recent years. President Barack Obama’s first act in 2009 was to sign the Memorandum on Transparency and Open Government. Since then numerous city, state and provincial governments across North America are finding new ways to share information. Internationally, 60 countries belong to the Open Government Partnership, a coalition of states and non-profits that seeks to improve accountability, transparency, technology and innovation and citizen participation.

Some of this is, to be blunt, mere fad. But there is a real sense among many politicians and the public that governments need to find new ways to be more responsive to a growing and more diverse set of citizen needs, while improving accountability.

Technology has certainly been – in part – a driver, if only because it shifts expectations. Today a Google search takes about 30 milliseconds, with many users searching for mere minutes before locating what they are looking for. In contrast, access to information requests can take weeks, or months to complete. In an age of computers, government processes often seem more informed by the photocopier – clinging to complex systems for sorting, copying and sharing information – than using computer systems that make it easy to share information by design.

There is also growing recognition that government data and information can empower people both inside and outside government. In British Columbia, the province’s open data portal is widely used by students – many of whom previously used U.S. data as it was the only free source. Now the province benefits from an emerging workforce that uses local data while studying everything from the environment to demography to education. Meanwhile the largest users of B.C.’s open data portal are public servants, who are able to research and create policy while drawing on better information, all without endless meetings to ask for permission to use other departments’ data. The savings from fewer meetings alone are likely significant.

The benefits of better leveraging government data can affect us all. Take the relatively mundane but important issue of transit. Every day hundreds of thousands of Ontarians check Google Maps or locally developed applications for transit information. The accumulated minutes not spent waiting for transit has likely saved citizens millions of hours. Few probably realize however that it is because local governments “opened” transit data that it has become so accessible on our computers and phones.

Finally, there are a number of new ways to think about how to “talk” to Ontarians. It is possible that traditional public consultations could be improved. But there is also an opportunity to think more broadly about how the government interacts with citizens. Projects like Wikipedia demonstrate how many small contributions can create powerful resources and public assets. Could such a model apply to government?

All of these opportunities are exciting – and the province is right to explore them. But important policy questions remain. For example: how do we safeguard the data government collects to minimize political interference? The country lost a critical resource when the federal government destroyed the reliability of the long form census by making it voluntary. If crowdsourcing and other new forms of public engagement can be adopted for government, how do we manage privacy concerns and preserve equality of opportunity? And how will such changes affect public representation? Canada’s political system has been marked by increasing centralization of power over the past several decades – will new technologies and approaches further this trend? Or could they be shaped to arrest it? These are not simple questions.

It is also easy to dismiss these efforts. This will neither be the first nor the last time people talk about open government. Indeed, there is a wonderfully cynical episode of Yes, Minister from 1980 titled “Open Government.” More recently, various revelations about surveillance and national governments’ desire to snoop in on our every email and phone call reveal much about what is both opaque and to be feared about our governments. Such cynicism is both healthy and necessary. It is also a reason why we should demand more.

Open government is not something we will ever fully achieve. But I do hope that it can serve as an objective and a constantly critical lens for thinking about what we should demand. I can’t speak for the other panelists of the task force, but that will be how I approach my work.

David Eaves is a public policy entrepreneur, open government activist and negotiation expert. He is a member of the Ontario government’s new Engagement Task Force.

My core argument was that decisions about what information gets made accessible are no longer best managed at the end of a policy development or program delivery process but rather should be embedded in it. This means monkeying around and ensuring there is capacity to export government information and data from the tools (e.g. software) government uses every day. Logically, this means monkeying around in procurement policy (see slide below) since that is where the specs for the tools public servants use get set. Trying to bake “access” into processes after the software has been chosen is, well, often an expensive nightmare.

Privately, one participant from a police force came up to me afterward and said that I was simply guiding people to another problem – procurement. He is right. I am. Almost everyone I talk to in government feels like procurement is broken. I’ve said as much myself in the past. Clay Johnson is someone who has thought about this more than others; here he is below at the Code for America Summit with a great slide (and talk) about how the current government procurement regime rewards all the wrong behaviours and often, all the wrong players.

So yes, I’m pushing the RTI and open data community to think about procurement on purpose. Procurement is borked. Badly. Not just from a wasted tax dollars perspective, or even just from a service delivery perspective, but also because it doesn’t serve the goals of transparency well. Quite the opposite. More importantly, it isn’t going to get fixed until more people start pointing out that it is broken and start contributing to solving this major bottleneck of a problem.

All of this becomes more important if the White House (and other governments at all levels) has any hope of executing on its digital strategies (image below). There is going to be a giant effort to digitize much of what governments do and a huge number of opportunities for finding efficiencies and improving services are going to come from this. However, if all of this depends on multi-million (or worse 10 or 100 million) dollar systems and websites we are, to put it frankly, screwed. The future of government isn’t to be (continue to be?) taken over by some massive SAP implementation that is so rigid and controlled it gives governments almost no opportunity to innovate. And this is the future our procurement policies steer us toward. A future with only a tiny handful of possible vendors, a high risk of project failure and highly rigid and frail systems that are expensive to adapt.

Worse, there is no easy path here. I don’t see anyone doing procurement right. So we are going to have to dive into a thorny, tough problem. However, the more governments that try to tackle it in radical ways, the faster we can learn some new and interesting lessons.

On October 11th I was invited by Elizabeth Denham, the Access to Information and Privacy Commissioner for British Columbia to give a keynote at the Privacy and Access 20/20 Conference in Vancouver to an audience that included the various provincial and federal Information Commissioners.

Below is my keynote; I’ve tried to sync the slides up as well as possible. For those who want to skip to juicier parts:

7:08 – thoughts about the technology dependence of RTI legislation

12:16 – the problematic approach to RTI implementation that results from these unsaid assumptions

28:25 – the need and opportunity to bring open data and RTI advocates together

So first things first – the competition is now live. Indeed, there are already 19 teams and 56 submissions that have been made. Fortunately, time is on your side, there are 56 days to go.

As I mentioned in my previous post on the subject, I have real hopes that this competition can help test a hypothesis I have about the possibility of an algorithmic open commons:

There is, however, for me, a potentially bigger goal. To date, as far as I know, predictive algorithms of 311 data have only ever been attempted within a city, not across cities. At a minimum it has not been attempted in a way in which the results are public and become a public asset.

So while the specific problem this contest addresses is relatively humble, I see it as creating a larger opportunity for academics, researchers, data scientists, and curious participants to figure out if we can develop predictive algorithms that work for multiple cities. Because if we can, then these algorithms could be a shared common asset. Each algorithm would become a tool for not just one housing non-profit, or city program but a tool for all sufficiently similar non-profits or city programs.

Of course I’m also discovering there are other benefits that arise out of these competitions.

This last weekend there was a mini-sub competition/hackathon involving a subset of the data. It was amazing to watch from afar. First, I was floored by how much cooperation there was, even between competitors and especially after the competition closed. Take a look at the forums, they probably make one of the more compelling cases that open data can help foster more people to want to learn how to manipulate and engage with data. Here are contestants sharing their approaches and ideas with one another – just like you’d want them to. I’d known that Kaggle had an interesting community and that learning played an important role in it, but “riding along” in a mini competition has caused me to look again at the competitions through a purely educational lens. It is amazing how much people both wanted to learn and share.

As in the current competition, the team at the hackathon also ran a competition around visualizing the data. And there were some great visualizations of the data that came out of it, as well as another example of where people were trying to learn and share. Here are two of my favourites:

I love this visualization by Christoph Molnar because it reveals the difference in request locations in each city. In some they are really dense, whereas in others they are much (more) evenly distributed. Super interesting to me.

I also love the simplicity of this image created by miswift. There might have been other things I’d have done, like colour coding similar problems to make them easier to compare across cities. But I still love it.

I’m pleased to share that, in conjunction with SeeClickFix and Kaggle I’ll be sponsoring a predictive data competition using 311 data from four different cities. My hope is that – if we can demonstrate that there are some predictive and socially valuable insights to be gained from this data – we might be able to persuade cities to try to work together to share data insights and help everyone become more efficient, address social inequities and address other city problems 311 data might enable us to explore.

Here’s the backstory and some details in anticipation of the formal launch:

The Story

Several months back Anthony Goldbloom, the founder and CEO of Kaggle – a predictive data competition firm – approached me asking if I could think of something interesting that could be done in the municipal space around open data. Anthony generously offered to waive all of Kaggle’s normal fees if I could come up with a compelling contest.

After playing around with some ideas I reached out to Ben Berkowitz, co-founder of SeeClickFix (one of the world’s largest implementers of the Open311 standard) and asked him if we could persuade some of the cities they work for to share their data for a competition.

Thanks to the hard work of Will Cukierski at Kaggle as well as the team at SeeClickFix we were ultimately able to generate a consistent data set with 300,000 lines of data involving 311 issues spanning 4 cities across the United States.

In addition, while we hoped many of those who might choose to participate in a municipal open data challenge would do so out of curiosity or a desire to better understand how cities work, SeeClickFix and I agreed to collectively put up $5000 in prize money to help raise awareness about the competition and hopefully stoke some media (as well as broader participant) interest.

The Goal

The goal of the competition will be to predict the number of votes, comments and views an issue is likely to generate. To be clear, this is not a prediction that is going to radically alter how cities work, but it could be genuinely useful to communications departments, helping them predict problems that are particularly thorny or worth proactively communicating to residents about. In addition – and this remains unclear – my own hope is that it could help us understand discrepancies in how different socio-economic or other groups use online 311 and so enable city officials to more effectively respond to complaints from marginalized communities.

In addition there will be a smaller competition around visualizing the data.

The Bigger Goal

There is, however, for me, a potentially bigger goal. To date, as far as I know, predictive algorithms of 311 data have only ever been attempted within a city, not across cities. At a minimum it has not been attempted in a way in which the results are public and become a public asset.

So while the specific problem this contest addresses is relatively humble, I see it as creating a larger opportunity for academics, researchers, data scientists, and curious participants to figure out if we can develop predictive algorithms that work for multiple cities. Because if we can, then these algorithms could be a shared common asset. Each algorithm would become a tool for not just one housing non-profit, or city program but a tool for all sufficiently similar non-profits or city programs. This could be exceptionally promising – as well as potentially reveal new behavioral or incentive risks that would need to be thought about.

Of course, discovering that every city is unique and that work is not easily transferable, or that predictive models cluster by city size, or by weather, or by some other variable is also valuable, as this would help us understand what types of investments can be made in civic analytics and what the limits of a potential commons might be.
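One way to test whether a model transfers across cities is to train it on all cities but one and score it on the city it has never seen. Below is a rough sketch of that idea in Python. Everything in it is an assumption for illustration: the records are invented (not the contest data), the "model" is just an average per issue category, and the error measure is one I picked, not necessarily what the contest will use.

```python
import math
from collections import defaultdict

# Invented records: (city, issue category, number of votes). Not real 311 data.
issues = [
    ("city_a", "pothole", 3), ("city_a", "graffiti", 1), ("city_a", "pothole", 5),
    ("city_b", "pothole", 4), ("city_b", "graffiti", 1), ("city_b", "streetlight", 2),
    ("city_c", "pothole", 2), ("city_c", "streetlight", 2), ("city_c", "graffiti", 0),
    ("city_d", "pothole", 6), ("city_d", "graffiti", 1), ("city_d", "streetlight", 1),
]


def fit_category_means(rows):
    """Learn the mean of log(1 + votes) per category, plus an overall fallback."""
    by_category = defaultdict(list)
    for _, category, votes in rows:
        by_category[category].append(math.log1p(votes))
    overall = sum(v for values in by_category.values() for v in values) / len(rows)
    means = {c: sum(values) / len(values) for c, values in by_category.items()}
    return means, overall


def rmse_log(rows, means, overall):
    """Root mean squared error on the log(1 + votes) scale."""
    errors = [(means.get(c, overall) - math.log1p(v)) ** 2 for _, c, v in rows]
    return math.sqrt(sum(errors) / len(errors))


def leave_one_city_out(rows):
    """Train on every city but one, score on the held-out city, repeat."""
    scores = {}
    for held_out in sorted({city for city, _, _ in rows}):
        train = [r for r in rows if r[0] != held_out]
        test = [r for r in rows if r[0] == held_out]
        means, overall = fit_category_means(train)
        scores[held_out] = rmse_log(test, means, overall)
    return scores


scores = leave_one_city_out(issues)
for city, score in scores.items():
    print(f"{city}: held-out error {score:.3f}")
```

If the held-out errors are about as low as the errors within a city, that hints at a shareable model. If one city scores much worse than the rest, that is a sign its 311 patterns are its own.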

So be sure to keep an eye on the Kaggle page (I’ll link to it) as this contest will be launching soon.