Blog

I’m back from Open Data Camp 2; and I’m finding it difficult to make a coherent whole of it all.

Perhaps it’s the nature of the lack of structure of an un-conference. Maybe the different stakeholders in the open data community throughout the various hierarchies have a common aim but different levers to pull: the minister with the will to make changes; the digital civil servants with great expectations but not great budgets; the hacker who tries to consume the open data in their spare time, and creates new standards and systems for data in their day job.

There seemed to be a few themes which echoed through the conference:

Skills

There’s a recognition that improving people’s skills and recognising the skills people already have is critical, whether it’s people crafting Linked Data in Microsoft Word and wondering why it doesn’t work or getting local authorities to search internally for their invisible data ninjas.

Sometimes those difficulties occur due to differences in the assumed culture for different types of data — it seems everyone working in GIS would know what was meant by an .asc file and how to process it, but this information isn’t obvious to someone fresh to the data. Is there a need for improved documentation; linked to from Data sets? Or the ability to ask questions of other people interested in the same datasets about interpretation and processing in comments?

Feedback

How do you know if your data is useful to people? Blogs have a useful feature called pingback – the referencing blog sends a message to the linked blog to let them know they’ve been linked to. There was quite a bit of discussion as to whether this functionality would be useful: particularly for informing people if breaking changes to the data might occur.

Also, when data sits around not being used, people don’t notice problems with it. When things break noisily and publicly — like taking down a cafeteria’s menu system — it’s a bit embarrassing, but it does get the problem fixed quickly!

Core Reference Data

One of the highlights of the weekend was a talk on the Address Wars — the financial value of addresses and the fight to monetise them and their locations, the problems caused for the 2001 census as a result of not being able to afford a product from the Royal Mail and Ordnance Survey, both of which were wholly government owned at the time.

It highlighted how much core reference data — lists of names and IDs of things — is critical as the glue which allows different data to be joined and understood. Apparently there’s 20 different definitions of ‘Scotland’ and 13 different ways of encoding gender (almost all of which are male or female). There’s no definitive list of hospitals, and seven people claim to be in charge of the canonical list of business names and addresses. Hence there’s a big push from GDS at the moment to create single canonical registers.

But there’s other items that need standardised encodings. The DCLG have been working on standardised reasons for why bins don’t get emptied – one of the most common interactions people have with their council. There’s a lot more work to be done across the myriad things government does, and it’s not quite clear where it should be happening: councils are looking to leadership from central government, central government wants councils to work together on it, possibly with the Local Government Association. This only gets more complicated when dealing with devolved matters or finding appropriate international standards to use.

Meeting people

I’m also really happy to have met Chris Gutteridge who was showing off some of the things he’s been working on.

Equipment.data.ac.uk brings together equipment held by various UK universities in a federated discoverable fashion by making use of well-known URLs to point to well-formatted data on each individual website. So each organisation is in control of their data and is the authoritative source for it, and builds upon having a singular place to start discovering linked data about an organisation. It’s the first time I’ve actually seen linked data in the wild joining across the web like Tim Berners-Lee intended!