A big step forward – but a new dataset over at ScraperWiki reveals there’s still a very long way to go. Developer Anna Powell-Smith has built a scraper for the Information Asset Register (IAR).

The IAR is a register of unpublished datasets held by government departments – and it has more than 2,100 entries. The database shows which department holds the information, and should include a short description of what’s in there.

The data shows how far is still to go for open information: for one, David Cameron’s release last week covers fewer than ten datasets – important ones, beyond a doubt, but only a scratch in the surface.

But this is just a small part of the problem, as anyone looking at the full data in Powell-Smith’s scrape can see: even in this register of government data, quality is low.

More than half of the records in the IAR are missing details – often details as basic as a description of the record’s contents. Some departments have submitted hundreds of datasets, while others appear to have merely carried out a cursory search and listed a handful. Some didn’t even bother to do that.

A first step for the government’s new Transparency Board should doubtless be to update the register and bring it up to scratch.

Cameron warned that the data would initially be patchy. Given the poor state of even this simple document, it seems he wasn’t kidding. The culture of government might be changing, but developers and journalists alike will need to keep on the pressure, if data good enough to be of use to anyone is going to come out.