On January 17, Sunlight Foundation, together with Public Sector Credit Solutions, organized a PDF Liberation hackathon to help free valuable information trapped in PDF (Portable Document Format) documents. Last week, our very own Kathy Kiely wrote a summary of the DC-based event, explaining why PDF Liberation matters and how you can help. Her suggestions included checking out events happening in your area during Open Data Day, which is coming up on February 22-23.

More than 20 attendees gathered at Rally.org, where Marc Joffe of Public Sector Credit Solutions kicked off the hack with a brief overview of the event. Mike Rosengarten of OpenGov.com followed with remarks on why technology that extracts data from PDFs is valuable.

Mike stressed:

“It’s not painfully clear how valuable cutting-edge engineering efforts can be to our government. What may seem like a small challenge at a big tech company could be a huge challenge for a government (i.e. setting up a full-text search database and ingesting a million documents).”

He added that one of the things he loves about hackathons is that, whether or not you’re writing code, you’re learning.

After a productive two-day hack session, two winners emerged from the San Francisco PDF Liberation hackathon. Barb Singleton worked on a project that parsed House financial disclosures (parsing means analyzing a string of symbols to recover its structure), while Karl Nicholas took on the daunting task of digitizing text-filled PDFs of Amnesty International’s torture data and was later able to create an Amnesty International Alerts Database. Karl’s project won the cash prize for utilizing the most technology.

Participants generally felt that more civic innovation is needed, not just from outside government but from within it. Instead of solving government problems at hackathons, some thought should go into preventing these problems in the first place.

In New York:

A Saturday morning snowstorm couldn’t keep some 30 hackers away; they showed up at the NYU-Poly Varick Street Incubator ready to hack away at PDF problems. Joel Natividad from Ontodia used NYCpedia as an example of a platform that scrapes data from NYPD CompStat PDFs and NYPD Crash & Collision PDFs. NYCpedia recently addressed the issue of traffic fatalities in New York after the city’s mayor, Bill de Blasio, announced his VisionZero initiative earlier this month. The team at NYCpedia immediately went to work and built a VisionZero map based on data that Crashmapper had previously liberated from PDFs, a tech innovation that could literally save lives.

At the event, Mike Tigas and Jeremy Merrill also spoke about Tabula, an open source tool for extracting tables from PDFs and one of the many technologies available to hackers for liberating PDFs.

Some of the PDF Liberation hackers at the New York location. Image credit: Joel Natividad from Ontodia.

Hackathons, unlike traditional events (which come with an agenda and an end note), are often places where ideas are hatched. With these ideas come questions of sustainability, evaluation and lessons learned.

On sustainability, the organizers of the NY PDF Liberation hackathon are already making efforts to work with the winners. For instance, NYCpedia is now working with Anna to sync budget and crime data, and with Michael and Keisha on hosting their data. The group also presented these civic technology collaborations at the inaugural Manhattan Borough Board meeting earlier this week. Additionally, NYCpedia is working with NYCEDC to further engage the community around the newly liberated data from its economic report.

Another way to ensure sustainability is to make sure projects continue long after the end of the hack. For one, it is important for hackers to connect with software providers and possible sponsors who may be interested in funding their projects to completion. The NY hack helped address this issue by putting the hackathon winners in touch with firms such as Ontodia and Quandl and advocacy organizations such as Amnesty International and the Center for Responsive Politics.

Some lessons learned: The New York organizers will be the first to tell you not to plan an event around unfavorable weather conditions, because that will affect turnout. In their case, they were glad they ended up with quality over quantity. Also, bear in mind that at hackathons, some hackers may be more familiar with certain platforms than others. For instance, a .NET hacker at the NY event was puzzled by the GitHub platform, which delayed his entry submission.

A future issue to explore is the usefulness (or otherwise) of hackathons. Some argue, in the style of NYCBigApps, that a kick-off event followed by an intervening period of several weeks or months, perhaps with meetups along the way, and then an awards ceremony at the end may result in more sustainable projects. Also, be mindful that there may not be enough time to start and finish a project from scratch at a hackathon. But it’s a great first step towards bringing together relevant people, from policy advocates to technologists, to work across the aisle and brainstorm ways in which they can make their governments better.

And that is a win in itself.

We want to send a special thank you to all the sponsors of the PDF hackathon, including Sunlight Foundation, Public Sector Credit Solutions, Knight Mozilla Open News, Rally.org, OpenGov.com, Smart Chicago, Pediacities, Artifex, Quandl, Civic Ninjas, and the tech companies that provided special licences for the hackathon: ABBYY, PDFlib and ASPOSE.