Announcing the 311 Data Challenge, soon to be launched on Kaggle

I’m pleased to share that, in conjunction with SeeClickFix and Kaggle I’ll be sponsoring a predictive data competition using 311 data from four different cities. My hope is that – if we can demonstrate that there are some predictive and socially valuable insights to be gained from this data – we might be able to persuade cities to try to work together to share data insights and help everyone become more efficient, address social inequities and address other city problems 311 data might enable us to explore.

Here’s the backstory and some details in anticipation of the formal launch:

The Story

Several months back Anthony Goldbloom, the founder and CEO of Kaggle – a predictive data competition firm – approached me asking if I could think of something interesting that could be done in the municipal space around open data. Anthony generously offered to waive all of Kaggle’s normal fees if I could come up with a compelling contest.

After playing around with some ideas I reached out to Ben Berkowitz, co-founder of SeeClickFix (one of the world’s largest implementers of the Open311 standard) and asked him if we could persuade some of the cities they work for to share their data for a competition.

Thanks to the hard work of Will Cukierski at Kaggle as well as the team at SeeClickFix we were ultimately able to generate a consistent data set with 300,000 lines of data involving 311 issues spanning 4 cities across the United States.

In addition, while we hoped many of who might choose to participate in a municipal open data challenge would do so out curiosity or desire to better understand how cities work, both myself and SeeClickFix agreed to collectively put up $5000 in prize money to help raise awareness about the competition and hopefully stoke some media (as well as broader participant) interest.

The Goal

The goal of the competition will be to predict the number of votes, comments and views an issue is likely to generate. To be clear, this is not a prediction that is going to radically alter how cities work, but it could be a genuinely useful to communications departments, helping them predict problems that are particularly thorny or worthy proactively communicating to residents about. In addition – and this remains unclear – my own hope is that it could help us understand discrepancies in how different socio-economic or other groups use online 311 and so enable city officials to more effectively respond to complaints from marginalized communities.

In addition there will be a smaller competition around visualization the data.

The Bigger Goal

There is, however, for me, a potentially bigger goal. To date, as far as I know, predictive algorithms of 311 data have only ever been attempted within a city, not across cities. At a minimum it has not been attempted in a way in which the results are public and become a public asset.

So while the specific problem this contest addresses is relatively humble, I’d see it as a creating a larger opportunity for academics, researchers, data scientists, and curious participants to figure out if can we develop predictive algorithms that work for multiple cities. Because if we can, then these algorithms could be a shared common asset. Each algorithm would become a tool for not just one housing non-profit, or city program but a tool for all sufficiently similar non-profits or city programs. This could be exceptionally promising – as well as potentially reveal new behavioral or incentive risks that would need to be thought about.

Of course, discovering that every city is unique and that work is not easily transferable, or that predictive models cluster by city size, or by weather, or by some other variable is also valuable, as this would help us understand what types of investments can be made in civic analytics and what the limits of a potential commons might be.

So be sure to keep an eye on the Kaggle page (I’ll link to it) as this contest will be launching soon.