Discoverability

Session leader Phil Weir from Flax&Teal said he had decided to pitch the session because his company had been working on a project to report on data sets and issues with them. In the course of doing that, he said, potential users of data said their problem was finding it.

Flax&Teal is kicking off a project to see if releasing the reports about data will give people an indication of what is in the data. But Phil suggested that this might only help with one of what he called two levels of discoverability: discoverability for the technical community; and discoverability for what might be called ordinary people, who do not know about open data, or even that data might help with issues they have.

A participant from the ONS recognised the problem. “One of the biggest issues, when we do research, is people knowing data is there. When you publish lots of things on a website, sometimes with ludicrous names like ‘spreadsheet string of numbers’ or ‘something something 25 words long’, it’s hard to find them. We did some research and found that just reducing down the size of the titles helped.”

Names matter

Other participants suggested an index would help. The ONS participant said it effectively had one, in the form of a calendar of releases, but since many of the release titles were not helpful, it only got people so far. “It is a real problem. We have particular issues with the general public. There are real technical people who can dig in and find stuff. But Joe Public on the street finds it really difficult: if they want the GDP number or something. We want to get it out there but it is difficult.”

Tracey Gyateng, who pitched a session on day one on how charities could engage with open data and open data could engage with charities, suggested that engaging with charitable bodies could help. Phil said his company had been working with community journalists to help them use data in their reporting.

But many campaigners or members of the public won’t be engaged with the charity sector or the media. “There is a really strong social case for releasing data,” he said. “But unless they are aware and have data skills, people won’t be able to use it.”

Words and pitchers

Nick Ananin, a project officer at Aberdeen City Council, led day one’s session on the purpose of open data. He argued that similar themes were coming up. “I was arguing (although I was shot down in flames) that you can’t publish open data unless you know what your purpose is, because that tells you what your products are, and that includes the meta-data that goes with your data,” he said. “It’s the meta-data that sign-posts what is in the data.

“Also, you can create a value feedback loop. Somehow, users need to get together to say what the value was: it achieved a better value for my business or it helped my charity in this way. Because then publishers can prioritise publication based on the value. We need to tackle this from the other end, through a wiki or whatever. We need to capture the outcomes and the value so we can define what we need to publish and how to do it.”

Following this thought, a participant from the ODI said: “I always try to make sure that when I use a data set, I acknowledge the original source of the data, and provide a link back to the original source, so that they have some information about what I have done, and other people using the tool can go to the original data set.”

Nick suggested that this kind of good practice should be encouraged. “We have five stars for open data publication, perhaps we should have five stars for open data users,” he said. “You get five stars, but others don’t get as many.”

However, there was some debate about whether ratings could have unintended effects. For example, they might encourage publishers to focus on a few well-used data sets at the expense of more obscure, lesser-used data sets that might turn out to be critical in the future.

Find a data sherpa

Returning to the issue of discoverability, Nick suggested that the public needed something like the Dewey Decimal System in libraries: even if somebody didn’t find exactly what they were looking for, they might find something very like it on nearby shelves.

Another participant suggested that what was needed was a ‘data sherpa’. “Meta-data only helps statisticians, who know what data sets and meta-data are,” someone pointed out. “The public needs a ‘data sherpa’ – something you can fire up and say ‘take me to something about this’.”

As a liveblogger, I suggested that what people really needed was more words: if people want to know, for example, what GDP is, then they are likely to Google it, and find the answer in a news story or in a press release. Publishers might look at ways of embedding data sets into stories and web pages that are easily discoverable, or create such content themselves by writing press releases or information pages to support the publication.

Phil suggested this indicated there is a chain that links the publisher of data and its users and beneficiaries, in which there are intermediate (or infomediate) links. Some of these might already be in place – public bodies and councils tend to have press officers, for example, who could craft content for data releases – while others might need to be created.