In it he made an important observation about those at the workshop who were pushing for linked data from the beginning, and wished there were a solution. First the observation:

There did seem to be a bit of resistance to the linked data approach, mainly because agreeing standards seems to be a long, drawn out process, which is counter to the JFDI approach of publishing local data… I also recognise that there are difficulties in both publishing the data and also working with it… As we learned from the local elections project, often local authorities don’t even have people who are competent in HTML, let alone RDF, SPARQL etc.

He’s not wrong there. As someone who’s been publishing linked data for some time, and who conceived and ran the Open Election Data project Stuart refers to, working with numerous councils to help them publish linked data, I’m probably as aware of the issues as anyone. (Ironically, and I think significantly, none of the councils involved in the local government e-standards body, and now pushing so hard for linked data, has actually published any linked data themselves.)

That’s not to knock linked data – just to be realistic about the issues and hurdles that need to be overcome (see the report for a full breakdown), and that to expect all the councils to solve all these problems at the same time as extracting the data from their systems, removing data relating to non-suppliers (e.g. foster parents), and including information from other systems (e.g. supplier data, which may be on procurement systems), and all by January, is unrealistic at best, and could undermine the whole process.

So what’s to be done? I think the sensible thing, particularly in these straitened times, is to concentrate on getting the raw data out, as much of it as possible, and to come down hard on those councils who publish it badly (e.g. by locking it up in PDFs or giving it a closed licence), or who wilfully ignore the guidance (it’s worrying how many councils publishing data at the moment don’t even include the transaction ID or date of the transaction, never mind supplier details).
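Spotting files that leave out these basic fields is easy to automate. A minimal sketch, assuming hypothetical column names (`transaction_id`, `date`, `supplier`, `amount`); the actual guidance may label them differently:

```python
import csv
import io

# Hypothetical column names; the published guidance may label these differently.
REQUIRED_FIELDS = ("transaction_id", "date", "supplier", "amount")


def missing_fields(csv_text: str) -> list:
    """Return the required columns that are absent from the CSV header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    present = {name.strip().lower() for name in header}
    return [field for field in REQUIRED_FIELDS if field not in present]


sample = "Date,Supplier,Amount\n2010-09-01,Acme Ltd,1200.00\n"
print(missing_fields(sample))  # ['transaction_id']
```

A check like this could run whenever a new file is published, flagging the omission before anyone tries to use the data.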

Beyond that we should take the approach the web has always taken, and which is the reason for its success: a decentralised, messy variety of implementations and solutions that allows a rich eco-system to develop, with government helping solve bottlenecks and structural problems rather than trying to impose highly centralised solutions that are already being solved elsewhere.

Yes, I’d love it if the councils were able to publish the data fully marked up, in a variety of forms (not just linked data, but also XML and JSON), but the ugly truth is that not a single council has so far even published their list of categories, never mind matched it up to a recognised standard (CIPFA BVACOP, COFOG or that used in their submissions to the CLG), still less done anything like linked data. So there’s a long way to go, and in the meantime we’re going to need some tools and cheap commodity services to bridge the gap.
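Matching a council’s own categories to a recognised standard is, at its simplest, a lookup table maintained by someone who knows both. A sketch in which the labels and codes are hypothetical placeholders, not real CIPFA BVACOP or COFOG codes:

```python
from typing import Optional

# Hypothetical lookup: each council's own labels on the left, a shared code on
# the right. The codes are placeholders, not real BVACOP or COFOG codes.
CATEGORY_MAP = {
    "refuse collection": "STD-WASTE",
    "waste & recycling": "STD-WASTE",
    "libraries": "STD-CULTURE",
}


def standard_category(local_label: str) -> Optional[str]:
    """Map a council's category label to a shared code, or None if unmapped."""
    return CATEGORY_MAP.get(local_label.strip().lower())


for label in ["Refuse Collection", "Waste & Recycling", "Parks"]:
    print(label, "->", standard_category(label))
```

Labels that come back as `None` are the ones still needing a human decision.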

[In a perfect world, maybe councils would develop some open-source tools to help them publish the data, perhaps using something like Adrian Short’s Armchair Auditor code as the basis (this is a project that took a single council, Windsor & Maidenhead, and added a web interface to the figures). However, when many councils don’t even have competent HTML skills (having outsourced much of it), this is only going to happen at a handful of councils at best, unless considerable investment is made.]

Stuart had been thinking along similar lines, and made a suggestion, almost a wish in fact:

I think the way forward is a centralised approach, with authorities publishing CSVs in a standard format on their website and some kind of system picking up these CSVs (say, on a monthly basis) and converting this data to a linked data format (as well as publishing in vanilla XML, JSON and CSV format).

He then expanded on the idea, talking about a single URL for each transaction, standard identifiers, “a human-readable summary of the data, together with links to the actual data in RDF, XML, CSV and JSON”. I’m a bit iffy about that ‘centralised approach’ phrase (the web is all about decentralisation), but I do think there’s an opportunity to help both the community and councils by solving some of these problems.
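The “picking up these CSVs and converting” step Stuart describes amounts, for the XML and JSON outputs, to re-serialising each row. A minimal standard-library sketch, assuming the CSV headers are already simple, XML-safe names like `transaction_id` (a header such as `Amount (£)` would need cleaning first):

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def row_to_json(row: dict) -> str:
    """Serialise one transaction (column -> value) as JSON."""
    return json.dumps(row, sort_keys=True)


def row_to_xml(row: dict) -> str:
    """Serialise one transaction as a flat <transaction> element.

    Assumes every column name is already a valid XML element name.
    """
    root = ET.Element("transaction")
    for key, value in row.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")


csv_text = "transaction_id,supplier,amount\nT1,Acme Ltd,1200.00\n"
for row in csv.DictReader(io.StringIO(csv_text)):
    print(row_to_json(row))
    print(row_to_xml(row))
```

A linked data representation would be one more serialiser of the same kind, once an agreed vocabulary exists.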

And that’s exactly what we’ve done at OpenlyLocal, adding the data from all the councils who’ve published their spending data, acting as a central repository, generating the URLs, and connecting the data together to other datasets and identifiers (councils with SNAC IDs, companies with Companies House numbers). We’ve even extracted data from those councils who unhelpfully try to lock up their data as PDFs.

Each transaction is tied to a supplier record for the council, and increasingly these are linked to company info (including their company number), or other councils (there’s a lot of money being transferred between councils), and users can add information about the supplier if we haven’t matched it up.
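Matching supplier names to company records is largely a normalisation problem, since councils spell the same company in many ways. This is a minimal sketch of exact matching on normalised names, not OpenlyLocal’s actual matcher; the suffix list is an assumption, and the company name and number are made up:

```python
import re
from typing import Optional

# Legal-form suffixes to ignore when comparing names (an assumption; real
# matching would need a fuller list and manual review of doubtful cases).
SUFFIXES = {"ltd", "limited", "plc", "llp"}


def normalise(name: str) -> str:
    """Lower-case, turn punctuation into spaces, drop trailing legal forms."""
    words = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    while words and words[-1] in SUFFIXES:
        words.pop()
    return " ".join(words)


def match_company(supplier: str, companies: dict) -> Optional[str]:
    """Return the company number whose normalised name equals the supplier's."""
    index = {normalise(n): number for n, number in companies.items()}
    return index.get(normalise(supplier))


# Hypothetical register extract: name and number are made up for illustration.
companies = {"Acme Widgets Limited": "01234567"}
print(match_company("ACME WIDGETS LTD.", companies))  # 01234567
```

Suppliers that fail to match are exactly where user-contributed information helps.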

Every transaction, supplier and company has a permanent unique URL and is available as XML and JSON.
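One way to keep such URLs permanent is to derive them from an identifier the aggregator assigns itself, rather than from councils’ own transaction IDs, which few councils publish. A sketch with a hypothetical `example.org` base and path layout (OpenlyLocal’s real paths may differ), where an optional extension selects the XML or JSON representation:

```python
# Hypothetical URL scheme; OpenlyLocal's actual paths may differ.
BASE = "http://example.org"
KINDS = {"transactions", "suppliers", "companies"}


def resource_url(kind: str, record_id: int, fmt: str = "") -> str:
    """Build a stable URL for a record, optionally with a format extension."""
    if kind not in KINDS:
        raise ValueError(f"unknown resource type: {kind}")
    suffix = f".{fmt}" if fmt else ""
    return f"{BASE}/{kind}/{record_id}{suffix}"


print(resource_url("transactions", 42))      # http://example.org/transactions/42
print(resource_url("suppliers", 7, "json"))  # http://example.org/suppliers/7.json
```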

We’ve sorted out some of the date issues (adding a date fuzziness field for those councils who don’t specify when in the month or quarter a transaction relates to).
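A date fuzziness field can be modelled as a start date plus the number of days of uncertainty after it. This sketch assumes that representation, which is not necessarily how OpenlyLocal stores it. It also uses calendar quarters; councils reporting by financial quarter (April to March) would need their quarters shifted:

```python
import calendar
from datetime import date


def month_with_fuzziness(year: int, month: int) -> tuple:
    """'Some time in this month' as (first day, days of uncertainty after it)."""
    days_in_month = calendar.monthrange(year, month)[1]
    return date(year, month, 1), days_in_month - 1


def quarter_with_fuzziness(year: int, quarter: int) -> tuple:
    """The same for calendar quarters 1-4 (Jan-Mar, Apr-Jun, ...)."""
    first_month = 3 * quarter - 2
    last_month = 3 * quarter
    start = date(year, first_month, 1)
    end = date(year, last_month, calendar.monthrange(year, last_month)[1])
    return start, (end - start).days


print(month_with_fuzziness(2010, 9))    # (datetime.date(2010, 9, 1), 29)
print(quarter_with_fuzziness(2010, 2))  # (datetime.date(2010, 4, 1), 90)
```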

Transactions are linked to the URL from which the file was downloaded (and usually the line number too, though obviously this is not possible if we’ve had to extract it from a PDF), meaning anyone else can recreate the dataset should they want to.
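Recording this provenance while parsing is cheap. A sketch using the `line_num` attribute of Python’s `csv.DictReader`, with a hypothetical source URL. Note that `line_num` counts physical lines, so the header is line 1 and a quoted field spanning several lines advances it by more than one:

```python
import csv
import io


def rows_with_provenance(csv_text: str, source_url: str):
    """Yield each row with the URL and physical line number it came from."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        yield {**row, "source_url": source_url, "line_number": reader.line_num}


# Hypothetical source URL, for illustration only.
text = "transaction_id,amount\nT1,10\nT2,20\n"
for row in rows_with_provenance(text, "http://example.org/spending-2010-09.csv"):
    print(row)
```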

The whole spending dataset is available as a single, zipped CSV file to download for anyone else to use.
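Reading such a download needs nothing beyond the standard library. A sketch that builds a tiny archive in memory so it runs offline; the member name `spending.csv` and the UTF-8 encoding are assumptions about the real file:

```python
import csv
import io
import zipfile


def read_zipped_csv(zip_bytes: bytes) -> list:
    """Read every row of the first .csv member in a zip archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        name = next(n for n in archive.namelist() if n.endswith(".csv"))
        with archive.open(name) as member:
            # Assumes UTF-8; an older file might be Latin-1 or Windows-1252.
            text = io.TextIOWrapper(member, encoding="utf-8", newline="")
            return list(csv.DictReader(text))


# Build a small archive in memory so the example runs without a download.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("spending.csv", "supplier,amount\nAcme Ltd,1200.00\n")
rows = read_zipped_csv(buffer.getvalue())
print(rows)  # [{'supplier': 'Acme Ltd', 'amount': '1200.00'}]
```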

It’s all open data.

There are a couple of features Stuart mentions that we haven’t yet implemented, for good reason.

First, we’re not yet publishing it as linked data, for the simple reason that the vocabulary hasn’t yet been defined, nor even the standards on which it will be based. When this is done, we’ll add this as a representation.

And although we use standard identifiers such as SNAC IDs for councils (and wards) on OpenlyLocal, the URL structure Stuart mentions is not yet practical, in part because SNAC IDs don’t cover all authorities (they don’t include the GLA or other public bodies, for example), and because only a tiny fraction of councils are publishing their internal transaction IDs.

Also, we haven’t yet implemented comments on the transactions, for the simple reason that distributed comment systems such as Disqus are JavaScript-based and thus problematic for those with accessibility issues, while site-specific ones don’t allow the conversation to be carried on elsewhere (we think we might have a solution to this, but it’s at an early stage, and we’d be interested to hear other ideas).

But all in all, we reckon we’re pretty much there with Stuart’s wish list, and would hope that councils can get on with extracting the raw data, publishing it in an open, machine-readable format (such as CSV), and then move to linked data as their resources allow.

25 Responses

I’m a local councillor and cabinet member. Came across your post by accident when looking for info about how other councils are doing this. Not a techy at all; really don’t understand what the problem with PDFs is. That’s how my council are going to put our data up. You seem very opposed to this – can you let me know why? Your post seems to advocate CSV information on websites – so that’s just Excel, right? So what’s the difference between turning an Excel page into a PDF and putting it up?

For info – we’re a smallish council facing 40% cuts this year. We have one web author in the comms department, not sure they have any technical expertise, and run our site on a CMS. Anything we needed to do beyond the ordinary would be the subject of a budget bid – and, to be honest, new spend isn’t really getting approved right now. Have no idea at all how to judge what the right thing to do is.

Great post – thanks for helping answer a lot of the questions I have about linked data and how difficult it will be for councils, with small web teams, to implement. The JFDI approach is right, and councils are likely to procrastinate if you give them too many things to think about!

Cllr AB – PDFs are great to read and print, but can’t be read by machines. When developers make an application, it needs to be able to interrogate the data automatically. PDFs make this almost impossible. CSV is a format that works and that Excel can output, but it’s not “Excel” (.xls). This is important because some people don’t use Microsoft, and/or the Microsoft (.xls) format causes problems for developers writing applications.

It does not take any specialist skill to upload a CSV file. There is no harm in publishing in PDF as well, but don’t make it the only format you use.

Thanks for the comment. The problem is that PDFs are designed to be viewed on a computer screen or printed out, and even then are very problematic for people with accessibility issues.

Getting the ‘data’ out of them is a slow, painful and problematic manual process that often results in errors. So when you say you’re intending to ‘put our data up’, you’re actually saying we’ll put it in a format that makes the data very difficult to get.

A CSV file, on the other hand, is designed as a format for storing data, which is why it’s so easily read by and imported into Excel and other spreadsheet programs, Google Docs, database programs, and websites like OpenlyLocal and visualisation sites such as Many Eyes.

As you say, it’s easy to get CSV out of Excel, so why not do that? I can’t see it would take any more time or expertise to do this than make a PDF out of the same data, nor should it take any more time to put it on your website.
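To illustrate the difference: once the data is in CSV, a question such as “how much did we pay each supplier?” takes a few lines of code, whereas the same figures in a PDF would first have to be extracted by hand. A sketch assuming hypothetical `supplier` and `amount` column names:

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal


def totals_by_supplier(csv_text: str) -> dict:
    """Sum the amount column for each supplier (hypothetical column names)."""
    totals = defaultdict(Decimal)  # Decimal() is 0 and avoids float rounding
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["supplier"]] += Decimal(row["amount"])
    return dict(totals)


sample = "supplier,amount\nAcme Ltd,100.50\nBeta plc,20\nAcme Ltd,9.50\n"
print(totals_by_supplier(sample))
# {'Acme Ltd': Decimal('110.00'), 'Beta plc': Decimal('20')}
```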

“That’s not to knock linked data – just to be realistic about the issues and hurdles that need to be overcome (see the report for a full breakdown), and that to expect all the councils to solve all these problems at the same time as extracting the data from their systems, removing data relating to non-suppliers (e.g. foster parents), and including information from other systems (e.g. supplier data, which may be on procurement systems), and all by January, is unrealistic at best, and could undermine the whole process.”

Over 90 Local Government Councils use Agresso Business World (ABW) from Unit 4 Business Software and we have provided them with the ability to extract and store as open linked data. The data extract from RB Windsor & Maidenhead was created using Agresso Business World. In fact all users of Agresso Business World have the ability to extract and store as open linked data with no technical knowledge required. The extracted data is stored in a SPARQL compliant endpoint and available for querying. From a Unit 4 perspective at least the January deadline is easily achievable if you are using Agresso Business World…

{Link to Unit 4 press release}

I agree that there will be differences in how Councils store their data. Currently we build the ontology on the fly as the data is extracted. The key here is for Councils to have a common approach to storing supplier information, for example ProClass, http://websites.uk-plc.net/Coding_International_ltd/Proclass-31509.htm. Either way, it is relatively simple to mash data and even update it at a later date. Part of the open nature of the exercise is the feedback loop, which will coerce data exporters into providing better data. For “better”, read “a common structure that is easily compared”!

Whilst extracting as CSV will allow the Councils to tick the box, it is not really in the spirit of the exercise. Open linked data is relatively simple and the tools are being provided.

Anwen
Not sure the best way of advancing the debate is pasting in the text of a press release or sales brochure. Also, the Windsor and Maidenhead data is not (as far as I’m aware) being published as linked data, only as CSV files, which are not only missing much of the key data but are inconsistent, with both the content and field names changing from file to file. So not a great ad for your system, if this is indeed being used to produce these files.
C
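Files whose field names change from month to month can still be combined by mapping each header to a canonical name. A sketch; the aliases are hypothetical and the list would grow as new variants turned up:

```python
import csv
import io

# Hypothetical header variants; the list would grow as new ones turn up.
HEADER_ALIASES = {
    "transaction_id": {"transaction id", "trans no", "transaction number"},
    "supplier": {"supplier name", "vendor", "payee"},
    "amount": {"value", "net amount", "amount (£)"},
    "date": {"payment date", "transaction date"},
}


def canonical(header: str) -> str:
    """Map a raw column header to its canonical name, if a mapping is known."""
    key = header.strip().lower()
    for name, aliases in HEADER_ALIASES.items():
        if key == name or key in aliases:
            return name
    return key  # unknown headers pass through, lower-cased, for later review


def normalised_rows(csv_text: str):
    """Yield rows as dicts keyed by canonical column names."""
    reader = csv.reader(io.StringIO(csv_text))
    header = [canonical(h) for h in next(reader, [])]
    for values in reader:
        yield dict(zip(header, values))


print(list(normalised_rows("Payee,Net Amount\nAcme Ltd,10\n")))
# [{'supplier': 'Acme Ltd', 'amount': '10'}]
```

This only papers over inconsistent headers; inconsistent or missing content still has to be fixed at source.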

Cllr AB – as others have pointed out – once you’ve got a publishable set of data you have to click a button to make a .pdf and you have to click another button to make a CSV. Not many councils are publishing their expenditure data yet, but of those that are, many are publishing a CSV and a PDF at the same time – so the machines can read the CSV and, if humans want to, they can read the PDF.

Thanks B, countculture and Ingrid. So I just need to tell our people to put it up as a CSV file, I get that, though I don’t really (sorry) know why – surely the main use of it is for people to read it? Can’t really understand why it needs to be machine-readable…

Next question: is there a recommended standard for the headings to use etc? I see you’ve mentioned that Windsor council use different field headings and so on: they are the council that are always used to beat us with when a DCLG official writes a letter telling us to do this thing, so if even they are not getting it right, who is?

Cllr AB
Why data rather than something you can only read? Well, think about when your finance department send their reports to DCLG as Excel spreadsheets – why do they do it as a spreadsheet rather than a PDF (or even on paper)? When you get residents to fill in a form on your website that integrates with your back-office systems, why do that rather than just having them send an email which needs to be retyped? It’s because when the information is available as data rather than just human-readable text we can do a lot more with it, and remove extra human work which does nothing but add cost and time.
Re recommended headings, I expect these to be sorted in the next month or so, but in the meantime by all means get your dept to contact me. I’m actually on holiday at the moment so won’t respond immediately but happy to help them out when I get back.
C

Correct, RBWM is not yet in linked format, but we have just made a utility available to enable our customers to publish in linked RDF format, free of charge. The purpose of linking the press release was to prove we are very serious about this; it got good coverage in Public Technology last week.

The value of linked data (and indeed of any open data) is limited by the lack of standards strictly defining its meaning. BUT I totally agree that if people wait for standards nothing will happen. Even when some sort of standard evolves by which people will describe their data, it becomes too difficult and expensive to implement it. The whole beauty of Open Linked Data is that it does facilitate the “Just do it” (I’ll drop the F) approach to publishing data.

Publishing data as CSV is much better than pdf files as people have said. We have made it possible for our customers to convert CSV to RDF/XML as they can with xls output from Agresso browser screens. The RDF/XML format will have made the data more accessible but, of course, will not have addressed the standards / naming issue any more than it was addressed in the CSV file.
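As a rough illustration of such a conversion, here is a sketch that emits N-Triples (a line-based RDF syntax that is simpler to produce by hand than RDF/XML). It is not Unit 4’s converter, and the namespaces are hypothetical placeholders. It also bears out the point above: the predicates are just the CSV column names, so they carry no more agreed meaning than they did in the CSV:

```python
import csv
import io
from urllib.parse import quote

# Placeholder URIs: no agreed spending vocabulary existed yet, so these are
# hypothetical and the predicates are simply the CSV column names.
VOCAB = "http://example.org/spending#"
BASE = "http://example.org/transactions/"


def literal(value: str) -> str:
    """Write a plain N-Triples string literal, escaping special characters."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def csv_to_ntriples(csv_text: str, id_column: str = "transaction_id") -> list:
    """Turn each CSV row into triples about one transaction resource."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}{quote(row[id_column], safe='')}>"
        for column, value in row.items():
            if column == id_column or not value:
                continue  # the ID is already in the subject; skip empty cells
            predicate = f"<{VOCAB}{quote(column, safe='')}>"
            triples.append(f"{subject} {predicate} {literal(value)} .")
    return triples


sample = "transaction_id,supplier,amount\nT1,Acme Ltd,1200.00\n"
print("\n".join(csv_to_ntriples(sample)))
```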

So … I accept your comments … we are looking to make a difference and to be part of the collaborative engagement, and we are always prepared for constructive feedback.

I think the best advancement of debate is to debate using accurate facts, and the fact is Unit 4 have provided a way for all users of Agresso Business World (ABW) to produce open linked data. How long is it before users take advantage of this functionality? That is a different debate…

You mentioned in your post that the January deadline was “unrealistic at best”, and I disagree. The issue is the will to extract rather than the technology used to extract. How simple is it to generate a table using Excel!? However, there are issues with extracting responsibly, as not all suppliers are suppliers in the accepted sense of a commercial entity.

The data provided by RB Windsor & Maidenhead is extracted from ABW using standard Browser functionality. That does not mean that it is coherent. However the functionality is available to users to extract this information.

There is already a Local Government-owned procurement classification standard known as ProClass:

“A. In simple terms, to provide a more meaningful breakdown on council expenditure using a sensible hierarchy of headings. This would also support the comparison of third party expenditure information on a like for like basis between UK councils, subregions and regions using one common standard. This now means that initiatives designed to reduce expenditure can be compared and monitored between councils.”

If two councils adhered to this for their procurement would we not be a lot closer to our goal of comparing open linked data?
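If two councils did code their spending against a shared classification, comparison becomes a simple join on the code. A sketch with hypothetical codes and amounts (not real ProClass codes or council figures):

```python
from collections import defaultdict
from decimal import Decimal


def spend_by_code(transactions) -> dict:
    """Sum (code, amount) pairs into a total per classification code."""
    totals = defaultdict(Decimal)
    for code, amount in transactions:
        totals[code] += Decimal(amount)
    return dict(totals)


def compare(a: dict, b: dict) -> dict:
    """Line up two councils' totals code by code, filling gaps with zero."""
    codes = sorted(set(a) | set(b))
    return {code: (a.get(code, Decimal(0)), b.get(code, Decimal(0))) for code in codes}


# Hypothetical codes and amounts, not real ProClass codes or council figures.
council_a = [("PC-100", "500.00"), ("PC-200", "120.00"), ("PC-100", "250.00")]
council_b = [("PC-100", "300.00"), ("PC-300", "80.00")]
for code, (spend_a, spend_b) in compare(spend_by_code(council_a), spend_by_code(council_b)).items():
    print(code, spend_a, spend_b)
```

Without a shared classification, the same join would need a per-council mapping step first.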

I wrote the open linked data extract routines for ABW and have consistently drawn a blank on what to extract because the community at large get hung up on ontologies and vocabularies. That isn’t a dig but an observation. Sometimes the first pass needs to be “dirty” because it is the catalyst for further development. I understand that they are important but the data is still meaningful without a ratified ontology which has taken ages to pass through a forum. Get the data out there and let the people using it debate the pros and cons! Then we can repeatedly refine what we extract. If it isn’t good enough they will shout until it is.

I have taken the liberty of downloading the spending.csv file from OpenlyLocal and putting it through our converter with default settings. The resulting RDF, 897,822 statements, has been uploaded to one of our development stores and is available for SPARQL querying at:

