Valiantly Blogging on a Number of Matters of the Utmost Importance, for the Benefit of All!

Wednesday, March 06, 2013

The inevitability of bad predictions

Yesterday the Guardian
published a piece of psephology by John Ross. He gets one thing right
but almost everything else wrong.

I’ll start positive. The thing that Ross gets right is his main point:
Conservative support has been in decline for a long time. He says that since
1931, the Conservative share of the vote has dropped by an average of 0.2% a
year (that is, 0.2 percentage points of vote share each year).

I agree. I’ve only looked
back to 1945 (covering 18 general elections rather than Ross’s 20), but I also
get an average 0.2% decline a year – with, of course, a lot of variation around
this general trend.

But what Ross doesn’t mention
is that the Labour vote has also declined, by an average of 0.2% a year (since
1945). Conversely, the Liberal/Lib Dem vote has risen over this period by an
average 0.3% a year. You can see the rough picture from this chart:

These numbers do, though,
depend on your starting-point. As Hopi Sen points out, 1931 was a
stunning Conservative landslide; 1945 was a Labour one. If you start at 1974,
after the first Liberal surge, the Labour trend is, on average, flat and the
Lib rise is under 0.1% a year. If you start at 1983, the Libs are in slight
decline. If you start at 1997, the Conservatives are on the up.
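To make the starting-point problem concrete, here is a minimal sketch of how a fitted trend can change sign depending on the first election you include. The vote shares in `toy` are invented for illustration and are not real election results; `slope_per_year` is a hypothetical helper that fits an ordinary least-squares line.

```python
def slope_per_year(points, start_year):
    """Least-squares slope (percentage points per year), using only
    the data points from start_year onwards."""
    pts = [(year, share) for year, share in points if year >= start_year]
    n = len(pts)
    mean_year = sum(year for year, _ in pts) / n
    mean_share = sum(share for _, share in pts) / n
    cov = sum((year - mean_year) * (share - mean_share) for year, share in pts)
    var = sum((year - mean_year) ** 2 for year, _ in pts)
    return cov / var

# Invented vote shares (NOT real results): a long fall, then a recovery.
toy = [(1950, 48.0), (1960, 46.0), (1970, 44.0), (1980, 42.0),
       (1990, 40.0), (2000, 42.0), (2010, 44.0)]

long_run = slope_per_year(toy, 1950)   # negative: overall decline
recent = slope_per_year(toy, 1990)     # positive: recent rise

print(f"From 1950: {long_run:+.3f} points a year")
print(f"From 1990: {recent:+.3f} points a year")
```

The same series gives a downward trend over the full period and an upward one from 1990, which is the kind of reversal described above.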

But for the sake of argument,
let’s stick with the longer-term picture.

Things really go wrong when
Ross looks to the future. This paragraph contains one of the highest
concentrations of wrongness I’ve ever seen:

“Taking these projections, if the Tories won the
next election, they would get 34.6% of the vote, and if they lost they would
get 30.3% of the vote. As there is no doubt at present that the Tories will
lose, they will get 30.3% of the vote. As always there is a bit of statistical
noise in any calculation, so 29.3% to 31.3% would be a reasonable range, but
30.3% is the central figure.”

What he seems to be doing is
separating elections that the Conservatives have won from ones they have lost,
and then extrapolating the trends for both categories.

This is a logical flub. You
don’t first ask whether the Conservatives will win and then go on to wonder
what vote they’ll get. You first ask what vote they (and other parties) will get,
and then use that to see who’ll win. Votes determine victories, not the other
way round.

So, having established that
there are only two possible Conservative vote shares in 2015, he then says “there is no doubt at
present that the Tories will lose”. Might
there be doubt in the future? I guess it’s doubtful. But the interesting thing
here is that Ross already has the
election result predicted without even using his system. Presumably he’s
looking at opinion polls like the rest of us. So what’s the use of the system?

Then there’s the “bit of
statistical noise”: he reckons his system’s predictions have a margin of error
of plus or minus 1%. That’s a lot better than the 3% or so that a normal-sized
opinion poll (about 1,000 respondents) has. I wonder how he arrives at this number?
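For comparison, the usual poll margin of error comes from the standard formula for a simple random sample. The sketch below assumes a poll of 1,000 people, a party on 50% (the worst case) and a 95% confidence level; the function name is my own, and real polls with weighting and quotas will behave somewhat differently.

```python
import math

def poll_margin_of_error(sample_size, share=0.5, z=1.96):
    """95% margin of error for a proportion from a simple random sample.
    share=0.5 is the worst case; z=1.96 corresponds to 95% confidence."""
    return z * math.sqrt(share * (1 - share) / sample_size)

typical = poll_margin_of_error(1000)   # roughly 0.031, i.e. about 3 points
big = poll_margin_of_error(9604)       # sample needed for about 1 point

print(f"n = 1000: +/- {typical * 100:.1f} points")
print(f"n = 9604: +/- {big * 100:.1f} points")
```

On these assumptions, a plus or minus 1% margin would need a sample of nearly 10,000 people.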

He doesn’t say, but if I
wanted to arrive at such a number, I’d start with this chart:

The dots are actual vote
shares and the lines are the overall average trend. The distances between the
dots and the lines show how close the model has been in the past. You’ll notice
that most of the dots are more than 1% away from the relevant lines.

In fact, the median error is 3.8%
for the Conservative vote, 2.1% for the Labour vote and 3.6% for the
Liberal/Lib Dem vote: half the time, the model was wrong by more than these
amounts.

But bear in mind that this
model is based on a very small sample of data: 18 election results since 1945
(or 20 for Ross since 1931; adding the extra two really won’t change the
picture). So the confidence intervals of any conclusions we draw from it may
well be large. And they are. The standard deviation of the error in the
predicted Conservative vote is 4.1%, in the Labour vote 4.6% and in the Lib Dem
vote 4.3%.
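Error figures of this kind can be worked out along the following lines: fit a straight-line trend, measure how far each actual result sits from the line, then take the median absolute error and the standard deviation of those errors. This is a sketch with invented numbers, not the real election data, and the choice of the sample standard deviation is an assumption on my part.

```python
import statistics

def fit_line(years, shares):
    """Ordinary least-squares fit; returns (intercept, slope)."""
    mean_x = statistics.fmean(years)
    mean_y = statistics.fmean(shares)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(years, shares))
             / sum((x - mean_x) ** 2 for x in years))
    return mean_y - slope * mean_x, slope

def trend_errors(years, shares):
    """Actual share minus trend-line share, for each election."""
    intercept, slope = fit_line(years, shares)
    return [y - (intercept + slope * x) for x, y in zip(years, shares)]

# Invented vote shares (NOT real results), purely to exercise the functions.
toy_years = [1950, 1960, 1970, 1980, 1990]
toy_shares = [45.0, 39.0, 44.0, 38.0, 39.0]

errors = trend_errors(toy_years, toy_shares)
median_abs_error = statistics.median(abs(e) for e in errors)
error_sd = statistics.stdev(errors)   # sample SD; an assumption

print(f"Median absolute error: {median_abs_error:.1f} points")
print(f"Standard deviation of errors: {error_sd:.1f} points")
```

With only a handful of data points, both figures are themselves rough estimates, which is part of the problem with reading much into them.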

Assuming a normal
distribution, about two-thirds of observed results would be expected to fall within
one standard deviation of the central result. For a typical 95% confidence
interval, we need to go roughly plus or minus two standard deviations: 8.4% for the Conservative
vote, 9.2% for Labour, 8.6% for the Lib Dems.
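The two-thirds and 95% rules of thumb can be checked directly against the normal distribution. A small sketch using Python's standard library, with `coverage` as a helper name of my own:

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def coverage(k):
    """Probability that a normal variable lands within k standard
    deviations of its mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

one_sd = coverage(1)      # about 0.68
two_sd = coverage(2)      # about 0.95
exact_95 = coverage(1.96)

print(f"Within 1 SD: {one_sd:.1%}")
print(f"Within 2 SD: {two_sd:.1%}")
print(f"Within 1.96 SD: {exact_95:.1%}")
```

Strictly, 95% corresponds to 1.96 standard deviations, so two is a slightly generous round number.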

So, my version of Ross’s
model gives these central projections for 2015: Conservatives 34.3%, Labour 31.5%,
Lib Dems 26.4%. But all I’d be confident in saying is that the Conservatives will
get between 25.9% and 42.7%, Labour between 22.3% and 40.7%, and the Lib Dems
between 17.8% and 35%.

Probably. Assuming that there’s
a genuine phenomenon here that will continue in the future. And ignoring all
polling evidence.
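The ranges quoted above are just the central projections plus or minus the 95% half-widths. A short sketch that reproduces them from the figures given in this post (the variable and function names are mine):

```python
# Central projection and 95% half-width for each party, in percent of
# the vote, as quoted in the post.
projections = {
    "Conservative": (34.3, 8.4),
    "Labour": (31.5, 9.2),
    "Lib Dem": (26.4, 8.6),
}

def interval(central, half_width):
    """Lower and upper ends of the range, rounded to one decimal place."""
    return round(central - half_width, 1), round(central + half_width, 1)

ranges = {party: interval(c, h) for party, (c, h) in projections.items()}

for party, (low, high) in ranges.items():
    print(f"{party}: {low}% to {high}% (width {high - low:.1f} points)")
```

Each range is around 17 or 18 points wide, against the 2-point range (29.3% to 31.3%) that Ross offers.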

In conclusion: Yes, long-term
trends are noteworthy. But let’s not read too much into them.