Archive for the ‘statistics’ Category

Like buses, you wait ages for local councils to publish their spending data, then a whole load come at once… and consequently OpenlyLocal has been importing the data pretty much non-stop for the past month or so.

We’ve now imported spending data for over 140 councils with more being added each day, and now have over a million and a half payments to suppliers, totalling over £10 billion. I think it’s worth repeating that figure: Ten Billion Pounds, as it’s a decent chunk of change, by anybody’s measure (although it’s still only a fraction of all spending by councils in the country).

Along with that we’ve made loads of improvements to the analysis and data, some visible, others not so much (including much-needed back-end improvements now that we’ve got so much more data), and to mark breaking the £10bn figure I thought it was worth starting a series of posts looking at the spending dataset.

Let’s start by having a look at those headline figures (we’ll be delving deeper into the data for some more heavyweight data-driven journalism over the next few weeks):

144 councils. That’s about 40% of the 354 councils in England (including the GLA). Some of the others we just haven’t yet imported (we’re adding them at about 2 a day); others have problems with the CSV files they are publishing (corrupted or invalid files, or where there’s some query about the data itself), and where there’s a contact email we’ve notified them of this.

The rest are refusing to publish the CSV files specified in the guidelines, making automatic import difficult by publishing an Excel file or, worse, a PDF (and here I’d like to single out Birmingham council, the biggest in the UK, which shamefully is publishing its spending only as a PDF, and even then with almost no detail at all. One wonders what they are hiding).

£10,184,169,404 in 1,512,691 transactions. That’s an average transaction value of £6,732 per payment. However, this is not uniform across councils, varying from an average transaction value of £669 for Poole to £46,466 for Barnsley. (In future posts, I’ll perhaps have a look at using the R statistical language to do some histograms on the data, although I’d be more than happy if someone beat me to that).
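For anyone poking at the dataset themselves, the headline average is simply total spend divided by transaction count. A quick sketch (in Python rather than R; the figures are the ones quoted above, and the per-council pairs would come from the downloadable dataset):

```python
# Average transaction value = total paid / number of payments.
# The "All councils" pair uses the totals quoted in the post; the other
# entries are placeholders for per-council figures from the full dataset.
totals = {
    "All councils": (10_184_169_404, 1_512_691),
}

for name, (total, count) in totals.items():
    print(f"{name}: £{total / count:,.0f} per payment")
```

Run on the headline pair, this reproduces the £6,732 figure; the same calculation per council gives the spread from £669 (Poole) to £46,466 (Barnsley).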

194,128 suppliers. What does this mean? To be accurate, this is the total number of supplying relationships between the councils and the companies/people/things they are paying.

Sometimes a council may have (or appear to have) several supplier relationships with the same company (charity/council/police authority), using different names or supplier IDs. This is sometimes down to a mistake in keying in the data, or for internal reasons, but either way it means several supplier records are created. It’s also worth noting that redacted payments are often grouped together as a single ‘supplier’, as the council may not have given any identifier to show that a redacted payment of £50,000 to a company (and in general there’s little reason to redact such payments) is to a different recipient than a redacted payment of £800 to a foster parent, for example.

However, using some clever matching and with the help of the increasing number of users who are matching suppliers to companies, charities and other entities on OpenlyLocal (just click on ‘add info’ when you’re looking at a supplier you think you can match to a company or charity), we’ve matched about 40% of these to real-world organisations such as companies and charities.
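The post doesn’t spell out what the matching involves, but a big part of any such exercise is normalising supplier names so that differently keyed records for the same organisation collapse together. A minimal sketch of that idea (the function and rules here are illustrative, not OpenlyLocal’s actual algorithm):

```python
import re

def normalise(name: str) -> str:
    """Crude supplier-name normalisation: lower-case, strip punctuation
    and common company suffixes, collapse whitespace, so near-duplicate
    supplier records can be grouped under one key."""
    name = name.lower()
    name = re.sub(r"[^\w\s]", " ", name)                    # drop punctuation
    name = re.sub(r"\b(ltd|limited|plc|llp)\b", " ", name)  # drop suffixes
    return " ".join(name.split())                           # collapse spaces

# Two differently keyed records collapse to the same key:
normalise("CAPITA Business Services Ltd.")   # -> "capita business services"
normalise("Capita business services limited")  # -> "capita business services"
```

Real matching needs much more than this (registered numbers, addresses, manual confirmation), which is why user help matters.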

While that might not seem very high, a good proportion of the rest will be sole-traders, individuals, or organisations we’ve not yet got a complete list of (Parish and Town councils, for example). And what it does mean is we can start to get a first draft of who supplies local government. And this is what we’ve got:

66,165 companies, with total payments of £3,884,271,203 (£3.88 billion), 38.1% of the total £10bn, in 579,518 transactions, making an average payment of £6,702.

8,236 charities, with total payments of £415,878,177, 4.1% of the total, in 55,370 transactions, making an average payment of £7,511.

Next time, we’ll look at the company suppliers in a little more detail, and later on the charities too, but for the moment, as you can see, we’re listing the top 20 matched individual companies and charities that supply local government. Bear in mind a company like Capita does business with councils through a variety of different companies, and there’s no public dataset of the relationships between the companies, but that’s another story.

Finally, the whole dataset is available to download as open data under the same share-alike attribution licence as the rest of OpenlyLocal, including the matches to companies/charities that are receiving the money (the link is at the bottom of the Council Spending Data Dashboard). Be warned, however, it’s a very big file (there’s a row for every transaction), and so is too big for Excel (or even Google Fusion tables for that matter), so it’s most use to those using a database, or doing academic research.
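If you do want to work with the full download without a database, a streaming pass over the CSV keeps memory use flat however big the file gets. A minimal sketch in Python (the column names `council` and `value` are my guesses, so check the header of the actual download):

```python
import csv
from collections import defaultdict

def totals_by_council(rows):
    """Sum payment values per council from an iterable of row dicts.
    'council' and 'value' are assumed column names; adjust them to
    match the real CSV header."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["council"]] += float(row["value"])
    return totals

# Stream the big file row by row rather than loading it whole:
# with open("spending.csv", newline="") as f:
#     totals = totals_by_council(csv.DictReader(f))
```

The same pattern works for totals by supplier, by month, and so on, with one pass per question or a dict per grouping.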

* Note: there are inevitably loads of caveats to this data, including that councils are (despite the guidance) publishing the data in different ways, including, occasionally, aggregating payments, and using over-aggressive redaction. It’s also, obviously, only 40% of the councils in England, although that’s a pretty big sample size. Finally there may be errors both in the data as published, and in the importing of it. Please do let us know at info@openlylocal.com if you see any errors, or figures that just look wrong.

However, useful though that is, that’s like taking a peek at a company’s bank statement and thinking it tells the whole story. Many of the payments relate to goods or services delivered some time in the past, some for things that have not yet been delivered, and there are all sorts of things (depreciation, movements between accounts, accruals for invoices not yet received) that won’t appear on there.

That’s what the council’s accounts are for — you know, those impenetrable things locked up in PDFs in some dusty corner of the council’s website, all sufficiently different from each other to make comparison difficult:

For some time, the holy grail for projects like OpenlyLocal and Where Does My Money Go has been to get the accounts in a standardized form to make comparison easy not just for accountants but for regular people too.

The thing is, such a thing does exist, and it’s sent by councils to central Government (the Department for Communities and Local Government to be precise) for them to use in their own figures. It’s a fairly hellishly complex spreadsheet called the Revenue Outturn form that must be filled in by the council (to get an idea have a look at the template here).

They’re not published anywhere by the DCLG, but they contain no state secrets or sensitive information; it’s just that the procedure being followed is the same one as they’ve always followed, and so they are not published, even after the statistics have been calculated from the data (the Statistics Act apparently prohibits publication until the stats have been published).

So I had an idea: wouldn’t it be great if we could pull the data that’s sitting in all these spreadsheets into a database and so allow comparison between councils’ accounts, thus freeing it from those forgotten corners of government computers?

This would seem to be a project that would be just about simple enough to be doable (though it’s trickier than it seems) and could allow ordinary people to understand their council’s spending in all sorts of ways (particularly if we add some of those sexy Where Does My Money Go visualisations). It could also be useful in ways that we can barely imagine – some of the participatory budget experiments going on in Redbridge and other councils would be even more useful if the context of similar councils’ spending was added to the mix.

So how would this be funded? Well, the usual route would be for DCLG or perhaps one of the Local Government Association bodies such as IDeA to scope out a proposal, involving many hours of meetings, reams of paper, and running up thousands of pounds in costs, even before it’s started.

They’d then put the process out to tender, involving many more thousands in admin, and designed to attract those companies who specialise in tendering for public sector work. Each of those would want to ensure they make a profit, and so would work out how they’re going to do it before quoting, running up their own costs, and inflating the final price.

So here’s part two of my plan: instead of going down that route, I’d come up with a proposal that would:

be a fraction of that cost

be specified on a single sheet of paper

be paid for only if I delivered

Obviously there’s a clear potential conflict of interest here – I sit on the government’s Local Public Data Panel and am pushing strongly for open data, and also stand to benefit (depending on how good I am at getting the information out of those hundreds of spreadsheets, each with multiple worksheets, and matching the classification systems). The solution to that – I think – is to do the whole thing transparently, hence this blog post.

In a sense, what I’m proposing is that I scope out the project, solving those difficult problems of how to do it, with the bonus that instead of delivering a report, I deliver the project.

Is it a good thing to have all this data imported into a database, and shown not just on a website in a way non-accountants can understand, but also available to be combined with other data in mashups and visualisations? Definitely.

Is it a good deal for the taxpayer, and is this open procurement a useful way of doing things? Well you can read the proposal for yourself here, and I’d be really interested in comments both on the proposal and the novel procurement model.

Given all this, I’m guessing the Audit Commission is a tough place to be right now. Local Authorities have long complained about the burden it puts on them, the Conservatives have made it plain they see it as a problem rather than a solution so far as efficiency goes, and even the government is scaling back its desire to have targets for everything.

So perhaps this paper would show a realisation by the commission that if it doesn’t change its perspective it will become at best irrelevant and at worst a roadblock to open data, increased transparency, efficiency and genuine change.

The data is very difficult to combine with other data – the Commission don’t use the same identifiers for local authorities as other public bodies and there’s no clear way to combine their IDs with other ones (e.g. ONS SNAC ids).

In short, it’s a typical government body — all focused on process rather than delivery. And its response to the changing landscape of open data, the move from a web of documents to a web of data, and the potential to engage with data directly rather than through the medium of dry official reports?

Actually it’s what you’d expect: there’s a fair bit of social-media blah-blah-blah — Facebook, US open data initiatives, MySociety/FixMyStreet, etc; there’s a bit about transparency that doesn’t actually say much; and then there’s a lot of justification for why there needs to be an Audit-Commission type body which manages to both include jargon (RQP) and avoid talking about the real problems preventing this.

What are these?

Structural problems — although the net financial benefit to government as a whole will be significant, this will be achieved by stripping out existing wasteful processes, duplication, and intermediary organizations. The idea that a local authority should supply the same dataset to three different bodies in three different formats and three different ways is ludicrous. Particularly when those bodies then spend even more time reworking the data to allow a match-up to other datasets.

This is just unnecessary gunk that’s gumming up the works, and the truth is the Audit Commission is one of those problem bodies.

Technical/contractual problems — it’s not always easy for legacy systems to expose data, and even where it is, the nature of public-sector IT procurement means that it’s going to cost. Ultimately we need to change how government does IT, but in the meantime we need to make sure the money comes from the vast savings to be made by removing the gunk. This means overcoming silos, which is no easy task.

Identifier problems — being able to uniquely identify bodies, areas, categories, etc. Anyone who’s ever done any playing around with government data knows this is one of the central frustrations, and blockers, when combining data. Is this local authority/ward/police authority/company the same as that one? What do we mean by ‘primary school’ spending, and can we match it against this figure from central government? Some of these questions are hard to answer, but made much harder when organisations don’t use common, public identifiers.

Astonishingly the Audit Commission paper doesn’t really cover these issues (and doesn’t even mention the issue of identifiers, perhaps because it’s so bad at them). Is this because they haven’t really understood the issues, or is it because the paper is more about trying to make it seem relevant in a changing world? Either way, it’s got problems, and given the current attitude it doesn’t seem in a position to address them itself.

However, what I was really interested in was getting and showing statistics about local areas that’s a bit more, well, meaty. So when I did that statistical backend of OpenlyLocal I wanted to make sure that I could use it for other datasets from other sources.

The first of those is now online, and it’s a good one, the 2006-07 Local Spending Report for England, published in April 2009. What is this? In a nutshell it lists the spending by category for every council in England at the time of the report (there have been a couple of new ones since then).

However, unless you enjoy playing with spreadsheets (and at the very minimum know how to unhide hidden sheets and read complex formulae), it’s not much use to you. Much more helpful, I think, is an accessible table you can drill down for more details.

Here you can see the total spending for each council over all categories (and also a list of the categories). Click on the magnifying glass at the right of each row and you’ll see a breakdown of spending by main category:

Local Spending breakdown for given council

Click again on the magnifying glass for any row now and you’ll see the breakdown of spending for the category of spending in that row:

Finally (for this part), if you click on the magnifying glass again you’ll get a comparison with councils of the same type (District, County, Unitary, etc):

You can also compare between all councils. From the main page for the Local Spending Dataset, click on one of the categories and it will show you the totals for all councils. Click on one of the topics on that page and it will give you all councils for that topic. Well, hopefully you get the idea. Basically, have a play and give us some feedback.

[There’ll also be a summary of the figures appearing on the front page for each council sometime in the next few hours.]

There’s no fancy javascript or visualizations yet (although we are talking with the guys at OKFN, who do the excellent WhereDoesMyMoneyGo, about collaborating), but that may come. For the moment, we’ve kept it simple, understandable, and accessible.

When I’ve mentioned it on twitter it has usually been in the context of moaning about the less-than-friendly SOAP interface to the data (even by SOAP standards it’s unwieldy). There’s also the not insignificant issue of getting to grips with the huge amount of data, and how it’s stored on the ONS’s servers (at one stage I looked at downloading the raw data, but we’re talking about tens of thousands of files).

Still, like a person with a loose tooth, I’ve worried the problem on and off in quiet times with occasionally painful results (although the people at the ONS have been very helpful), and have now got to a level where (I think) it’s pretty useful.

Specifically, you can now see general demographic info for pretty much all the councils in England & Wales (unfortunately the ONS database doesn’t include Scotland or Northern Ireland, so if there’s anyone who can help me with those areas, I’d be pleased to hear from them).

More significantly, however, we’ve added a whole load of ward-level statistics:

Inevitably, much of the data comes from the 2001 Census (the next is due in 2011), and so it’s not bang up to date. However, it’s still useful and informative, particularly as you can compare the figures with the other wards in the council, or compare councils of similar type. Want to know which ward has the greatest proportion of people aged 90 and over? No prob, just click on the description (‘People aged 90 and over’ in this case) and you have it:

Doing the same on councils will bring up a comparison with similar councils (e.g. District councils are compared with other district councils, London Authorities with other London Authorities):

As you can see from the list of ONS datasets, there are huge amounts of data to be shown, and we’ve only imported a small selection, in part while we’re working out the best way of making it manageable. As you can see from the religion graph, where it makes more sense for the data to be graphed we’ve done it that way, and you can expect to see more of that in the future.

It’s also worth mentioning that there are some gaps in the ONS’s database — principally where ward boundaries have changed, or where new local authorities have been formed, and if there’s only a small amount of info for a ward or council, that’s why.

In the meantime, have a play, and if there’s a dataset you want us to expose sooner rather than later, let me know in the comments or via twitter (or email, of course).


p.s. In case you’re wondering the graphs and data are fully accessible so should be fine for screenreaders. The comparison tables are just plain ordinary HTML tables with a bit of CSS styling to make them look like graphs, and the pie charts have the underlying data accompanying them as tables on the page (and can be seen by anyone else just by clicking on the chart).

[Note: Voting attendance is an imperfect proxy for actual attendance, as the figure may be depressed by silent abstentions (i.e. not voting in a division, rather than voting both ‘aye’ and ‘no’) and by just turning up to vote, but failing to attend the debate. However, until Parliament provides a better measure for attendance, or more transparency of MPs actions, this is the only one we have.]

It’s recess time again, and time for MPs’ end-of-term report. I’ll leave it to others to comment on how they’ve dealt with some of the genuinely momentous events since the summer recess. This post deals solely with their voting attendance record.

First off, let’s have an overall look at the overall figures for the period:

              Oct–Dec 08   May 97–Jul 08
All MPs          70.2%         64.5%
Labour           74.8%         69.8%
Conservative     67.8%         61.7%
LibDem           72.2%         64.7%

The figures above are pretty self-explanatory. All parties have improved their voting attendance, by 5 to 8 percentage points. Perhaps not surprising given the financial crisis.

Now let’s have a look at the main parties in detail, using the same histograms used before to show the distribution of the parties’ attendance figures. Interestingly (well, in a wonkish sort of way), the distributions are a bit more spread out than the long-term average. In part this is probably down to the shorter time period showing up variations that are hidden over a longer period, but it’s interesting nevertheless to note that though all parties have improved their overall attendance figures, the number and proportion of Labour MPs who’ve voted in fewer than half the divisions has nearly tripled, from 11 MPs to 30.

[Note: there’s no significance to the width of the columns — the recent ones are narrower so that both can be seen on the same graph]

Finally, let’s have a look at those outliers, first, the MPs who attended divisions less than 50% of the time:

However, the rest of the list is more interesting. Some of those on the front bench (Jacqui Smith and Jack Straw, for example) surprisingly don’t make the list, i.e. they voted in at least 50% of the divisions. Ditto some of the opposition spokespeople.

But what about the backbenchers who are on the list? Possibly there’s a good reason for Margaret Hodge and Jessica Morden failing to attend a single division — illness perhaps (though there’s nothing on either of their websites to indicate such a factor)? And what about Kali Mountford (14.3%) and Khalid Mahmood (21.4%)?

If I were in their constituency, I’d like to know, particularly since they took little part in debates, either. Similarly for the low-raters for the Conservatives — Michael Mates and Tim Yeo (6 directorships!) at 35.7% each.

The above calculations were derived from the voting record freely available from the Public Whip project, and cover the period from Oct 2008 to Dec 2008. The data can be downloaded in the form of a MySQL database, and this was used together with custom MySQL queries to generate the figures.

[Note: Voting attendance is an imperfect proxy for actual attendance, as the figure may be depressed by silent abstentions (i.e. not voting in a division, rather than voting both ‘aye’ and ‘no’) and by just turning up to vote, but failing to attend the debate. However, until Parliament provides a better measure for attendance, or more transparency of MPs actions, this is the only one we have.]

A frequent argument for low attendance of voting divisions by MPs is that the figure is depressed by ministers (and shadow spokespersons), whose other responsibilities prevent them from attending as many votes as they’d like, thus bringing down the overall average.

Seems reasonable, so let’s have a look at just how much of an influence this ‘ministerial effect’ has on the overall figures. First, let’s look at the average voting attendance for ministers and non-ministers (calculation details below):

Attendance rates, May 97 – July 08

All MPs          65.1%
Non-Ministers    64.4%
Ministers        67.2%

Er, wait a minute, so the average voting attendance rate for ministers is higher than non-ministers? That’s not what we expected. However, basic averages (i.e. the mean) can hide a multitude of sins, so let’s have a look at the distribution of those attendance figures.

As you can see, while the peak of the ministerial attendance is around the 65% mark (less than that for the non-ministerial one), there were far more divisions in which 90%+ of ministers voted than there were for which 90%+ of non-ministers voted.

This makes sense, in a way, as ministers are far more likely than backbenchers to turn up en masse for votes their party sees as important. It’s this that largely accounts for the figures we saw in the table above. However, what the graph also shows is that when you take the ministers out of the equation, attendance definitely does not shoot up. There is, in short, no ‘ministerial effect’ to account for the low attendance of MPs.

[It’s worth mentioning that the ministerial office records are slightly incomplete — the record of Parliamentary Private Secretaries is missing during some periods — so I’ve run the figures for ministers both including and excluding PPSs. As you can see, it doesn’t make a lot of difference.]

The party lines

Having looked at the big picture, it’s time to look at the ministerial vs non-ministerial attendance by party, specifically the three main parties in Parliament.

As you can see, the relationship between ministerial and non-ministerial attendance is noticeably different for each of the parties. Labour ministers do indeed have noticeably lower attendance rates than their backbenchers, though not as much as I’d expected and not enough to alter the distribution massively.

However, for the Tories and LibDems, the surprising thing — for me, at least — was that the attendance rates for their spokespersons are actually noticeably better than their backbenchers’, raising rather than lowering the overall figures. What, I wonder, is the reason for this?

Finally, a couple of quick graphs to wrap this post up. One shows, perhaps not surprisingly, that Labour ministerial attendance rates are less than for the shadow spokespersons — presumably the time commitment for a governmental position is greater than that for the equivalent shadow position.

The other shows the distribution of backbenchers’ attendance figures, by party. I’ll leave that one without making any further comment.

The Ministerial/non-ministerial attendance rates were calculated by looking at every Commons division between May 1997 and July 2008, and working out the number of ministers/non-ministers who could have voted in that division, and the number who actually did vote. The average attendance figures in the table were calculated by dividing the aggregate number of votes by the aggregate number of possible votes.
To calculate the distribution of attendance rates I calculated the ministerial/non-ministerial attendance rate for each division, and plotted these on a graph to show how those attendance rates are distributed (as usual, I’ve made the underlying figures available as a spreadsheet here and here if you want to examine them further).
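The two calculations described above (the aggregate headline rate, and the per-division rates behind the density plots) can be sketched in a few lines. The function name and data shape here are mine, not Public Whip’s:

```python
def attendance_stats(divisions):
    """divisions: (votes_cast, possible_voters) pairs, one per division.
    Returns the headline rate (aggregate votes / aggregate possible votes)
    and the list of per-division rates that feed the distribution plot."""
    cast = sum(v for v, _ in divisions)
    possible = sum(p for _, p in divisions)
    return cast / possible, [v / p for v, p in divisions]

overall, rates = attendance_stats([(300, 400), (100, 400)])
# overall is 400/800 = 0.5; the per-division rates are 0.75 and 0.25
```

Note that the headline figure weights each division by the number of MPs who could have voted in it, which is why it need not equal the simple mean of the per-division rates.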

The above calculations were derived from the voting record freely available from the Public Whip project, and cover the period from May 1997 to July 22, 2008 (when the house rose for the summer recess). The data can be downloaded in the form of a MySQL database, and this was used together with custom MySQL queries to generate the figures.

The graphs are visual representations of the density of the distribution, and were plotted in R using the kernel densityplot function.
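For anyone without R to hand, the idea behind those density plots can be reproduced in pure Python. This is a minimal Gaussian kernel density estimate with a hand-picked bandwidth, illustrating the technique rather than reproducing the exact routine used for the graphs:

```python
from math import exp, pi, sqrt

def kde(samples, bandwidth):
    """Return a density function estimated from the samples using a
    Gaussian kernel: the same idea as R's density(), with the
    bandwidth fixed by hand rather than chosen automatically."""
    n = len(samples)
    def density(x):
        return sum(exp(-0.5 * ((x - s) / bandwidth) ** 2)
                   for s in samples) / (n * bandwidth * sqrt(2 * pi))
    return density

# Per-division attendance rates clustered in the mid-60s produce a
# density curve that peaks there:
f = kde([0.55, 0.60, 0.62, 0.65, 0.90, 0.95], bandwidth=0.05)
```

Evaluating the returned function over a grid of x values and plotting the results gives a smooth curve in place of a blocky histogram, which is what makes the distributions easy to compare by eye.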

