Navigating New Horizons

ScraperWiki: Hacks and Hackers Day comes to Dublin

Hacks and Hackers Hack Day is taking place in Ireland on the 16th of November during Dublin Innovation Week. The organiser of the day-long event is ScraperWiki. Their aim is to provide the resources that allow anyone with any kind of programming ability to develop, store, and maintain software tools for the purposes of extracting and linking data.

By making data accessible, ScraperWiki can allow interested parties such as journalists to take advantage of initiatives such as the UK Government’s policy to make its data more available to the public. Since the UK Expenses Scandal, where certain British parliamentarians were found to have abused their statutory allowances, journalists have become increasingly aware of the wealth of potential stories that lie in databases around the world. However, this data has usually been stored in a haphazard, unstructured and relatively inaccessible manner.

According to Aine Mcguire, in charge of sales and marketing for ScraperWiki, change has only come recently: “In 2003, a gentleman called Julian Todd contacted the UK Government to find out how various MPs had voted on the war. When he tried to get this information in order to do some analysis on it, he was advised by the Cabinet Office that all this information was published in Hansards, which is the official publishing body of the UK government. But it was difficult [to access]. It was deep down inside a website and he couldn’t do anything with it.

“So Julian went and scraped all that information from Hansards and…then fed it into a website in the UK called The Public Whip which shows you the voting record of all of the MPs in the UK.

“But it was very controversial as he risked imprisonment for doing this because of Crown copyright. But they didn’t imprison him and it was Julian Todd who came up with the idea for ScraperWiki.”

Aine says ScraperWiki, active since March 2010, aims to “build the largest community-supported public data store in the world.

“You’ve got Wikipedia which supports content that’s predominantly for text and OpenStreetMap is for maps. What we want to do is create a wiki for data. We’re taking data that is in a very unstructured style and putting it into our structured data store. Where appropriate we’re adding longitude and latitude tags. We’re geo-tagging it which means that data can be mapped.”
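To make the idea of structuring and geo-tagging concrete, here is a minimal sketch in Python. It is not ScraperWiki's own data store or API: it assumes a made-up pipe-separated input format, and the sample lines, the `wells` table and the function names are all hypothetical. It simply shows free-text lines being turned into records with latitude and longitude columns that a mapping tool could then read.

```python
import sqlite3

# Hypothetical free-text lines standing in for an unstructured source.
RAW_LINES = [
    "Well A | operator: ExampleCo | 57.10, 1.90",
    "Well B | operator: SampleOil | 58.25, 0.75",
]


def parse_line(line):
    """Split one free-text line into a structured, geo-tagged record."""
    name, operator, coords = [part.strip() for part in line.split("|")]
    lat, lng = [float(value) for value in coords.split(",")]
    return {
        "name": name,
        "operator": operator.split(":", 1)[1].strip(),
        "lat": lat,
        "lng": lng,
    }


def build_store(lines):
    """Load the parsed records into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE wells (name TEXT, operator TEXT, lat REAL, lng REAL)"
    )
    conn.executemany(
        "INSERT INTO wells VALUES (:name, :operator, :lat, :lng)",
        [parse_line(line) for line in lines],
    )
    conn.commit()
    return conn


conn = build_store(RAW_LINES)
# Once the coordinates are columns, the data can be queried or mapped.
rows = conn.execute("SELECT name, lat, lng FROM wells ORDER BY name").fetchall()
```

Once every record carries its own coordinates, the same table can feed a map, a search page or another program without anyone re-reading the original text.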

In line with its aim of being a worldwide data resource project, ScraperWiki has had datasets submitted from countries such as the UK, Brazil, Germany, Estonia, Greece and France, to name a few. These datasets cover such subjects as the 11,000 deep sea oil wells around the UK, public transport traffic incidents in London, oil rig accidents and so on.

“As well as being a datastore it’s a wiki for code,” Aine explains. “At the moment if you want to do some programming you would go out into the web somewhere, you download some tools, you would install them on the server. ScraperWiki allows you to program directly in the browser so in effect we’re giving you lots of libraries for you to program with.

“You can write a screenscraper that uses any of the libraries we’ve got in our browser technology. You can use Python, PHP, or Ruby. So you can go off and scrape without having to install anything on your PC or server.”
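For readers who have never seen one, a screen scraper of the kind Aine describes can be sketched in a few lines of Python. This example uses only Python's standard library rather than ScraperWiki's own libraries, and the page fragment, the `TableScraper` class and the `scrape` function are hypothetical stand-ins. A real scraper would first download the page; here the HTML is embedded so the sketch runs on its own.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; a real scraper would download this first.
SAMPLE_HTML = """
<table>
  <tr><th>MP</th><th>Vote</th></tr>
  <tr><td>Member One</td><td>Aye</td></tr>
  <tr><td>Member Two</td><td>No</td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Collect every table row as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None       # cells of the row being read, if any
        self._in_cell = False  # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


def scrape(html):
    """Return one dict per data row, keyed by the header row's cells."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


records = scrape(SAMPLE_HTML)
```

The result is a list of records keyed by column name, which is the structured form a journalist can sort, count or hand to a visualisation tool.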

An added benefit is that, because of the inherently collaborative nature of wikis, code can be updated, improved and shared by other programmers.

Aine describes what to expect from the Hacks and Hackers Hack Day: “At the beginning of the day we have a little presentation about what a Hacks and Hackers Hack Day is all about. Then we give a little presentation on ScraperWiki although we don’t prescribe that they use it. Then we let the journalists and developers gravitate together to form teams over datasets of interest. Then they go off and hack all day. At six o’clock we ask the project groups to come back and present for three minutes each their particular visualisation of the data set that they have worked on.”

Prizes are then awarded and there is a reception for the participants to attend. At a previous event in Liverpool in July, eight projects were produced by journalists and programmers working together using open data.

Even with the maximum reasonable amount of access granted by governments around the world, data-driven journalism cannot flourish while data remains stored in silos. Information has to be accessible not only to people other than those who made the original entries but to other machines as well. Structuring information for greater accessibility is not going to happen all by itself. It will take the sort of co-ordinated and collaborative effort that organisations such as ScraperWiki offer to really make our world a more open and transparent place to live and work in.

At the time of writing, the Hacks and Hackers day taking place in Dublin is fully subscribed, but tickets are still available for the Belfast event on the 13th of November.

It is a free event and ScraperWiki is a not-for-profit organisation. Please contact Aine through the ScraperWiki website if you would be interested in sponsoring a part of the event.