Observing the Scottish elections, May 2007

The Open Rights Group
released, on 20 June 2007, its observers' report on e-Voting and
e-Counting.

I was one of the ORG's election observers, watching the election
itself, and particularly the electronic count, in Glasgow. I've
included my own report for ORG here as well. The comments below, as
well as those in the report, are mine, and not endorsed (of course) by
ORG.

My own impressions of the elections were somewhat less
pessimistic than the overall ORG ones. I think that's mainly because
I saw only e-counting (e-voting, I think, is fairly obviously nuts),
and because the Electoral Commission in Scotland, and the organisers
of the count in Glasgow, seemed pretty well-organised, and actively
concerned to make the whole process as transparent as possible.
Below, I've included some fairly informal supplementary remarks to my
report, some of which disagree mildly with the ORG conclusions.

I came into this very sceptical about the security of electronic
counts; I'm a lot less sceptical than I was, though I still think
there are very significant problems, albeit with apparently simple
solutions.

My impressions

It worked!

...very much to my surprise. I don't know whether Glasgow was
lucky or particularly well-organised, but the Glasgow count seemed to
go very smoothly, with results appearing within spitting distance of
the LA's four-hour estimate. I know there were software and hardware
problems in other (Scottish) centres, so the system clearly has
scope for disasters, but it can also manifestly be made to work.

The English counts seem to have been much more problematic; see the
ORG report for the gruelling details.

Enabling STV

One of the better arguments in favour of e-counting is that it makes
intricate STV counts feasible, and since I'm rather an enthusiast for
STV I tend to feel that an electronic count is therefore acceptable.
The hand-counting of the Northern Irish local authority (LA)
elections complicates this argument, but since the Northern Irish STV
system is substantially simpler than the Scottish one, that comparison
is at best equivocal evidence that e-counting is unnecessary.
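
To give a flavour of why a full STV count is so intricate, here is a
toy counting sketch. This is entirely my own illustration, not the
official algorithm: it uses a Droop quota and simple Gregory-style
fractional surplus transfers, whereas the actual Scottish rules use
the more involved Weighted Inclusive Gregory method.

```python
from collections import defaultdict
from fractions import Fraction

def stv_count(ballots, seats):
    """Toy STV count: Droop quota, exact fractional surplus transfers.
    Each ballot is a list of candidate names in preference order."""
    quota = len(ballots) // (seats + 1) + 1
    # Each ballot carries a transfer weight, initially 1 (exact fractions
    # avoid the floating-point comparison problems a real count must not have).
    weighted = [(list(b), Fraction(1)) for b in ballots]
    elected, excluded = [], set()

    def first_active(prefs, winner=None):
        # First preference still in the running (the winner counts as
        # still running during their own surplus transfer).
        for c in prefs:
            if c == winner or (c not in elected and c not in excluded):
                return c
        return None

    while len(elected) < seats:
        tally = defaultdict(Fraction)
        for prefs, w in weighted:
            c = first_active(prefs)
            if c is not None:
                tally[c] += w
        if not tally:
            break  # ran out of continuing candidates
        top = max(tally, key=tally.get)
        if tally[top] >= quota:
            elected.append(top)
            # Scale down every ballot that counted for the winner,
            # so only the surplus transfers onward.
            factor = (tally[top] - quota) / tally[top]
            weighted = [
                (prefs, w * factor if first_active(prefs, top) == top else w)
                for prefs, w in weighted
            ]
        else:
            # No one reaches the quota: exclude the lowest candidate.
            excluded.add(min(tally, key=tally.get))
    return elected
```

Even this much-simplified version needs careful bookkeeping of ballot
weights across rounds; the real rules add tie-breaking, multi-way
transfers and audit requirements on top.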

Statistics

The process of handling hundreds of thousands of loose sheets of
paper, getting them into the one room, and not losing or mis-counting
too many of them on the way is a lot harder than you might expect.
In retrospect this is pretty obvious, but it struck me pretty forcibly
when I realised that much of the drill in the polling stations and at
the count hall was pretty similar in outline to a traditional
election, even though most of the details were substantially
different.

For example, at one point in the protocol, clerks compare the
number of papers in a ballot box as counted at the polling station,
with the number counted by the scanner, and if the difference is in
the range +1 to -3, then that's passed as close enough. That seems
odd, but it turns out that this range is used because that's the
margin of error that's been found, in the past, with traditional
elections, to be consistently achievable.

Now, it might be the case that this error budget should be
reexamined now that scanners are in the mix, but even without that,
this range is useful for two reasons. Firstly, it reminds us that
traditional elections did in fact have an error budget,
estimated by experienced personnel, and secondly that this budget
(which, at approximately ±2 in a box containing around 400 papers,
comes out at ±0.5%) sets a rough scale for the various processes
involved in the electronic count. The overall error in the result
would be a combination of a few error sources, but if they're all
around this scale, then the overall error wouldn't be massively
greater than this: crudely, a process with
errors no bigger than 0.5% is, in this respect at least, no worse than
a traditional election.
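
To make that scale argument concrete, here is a trivial numeric sketch
(my own illustration, not anything from the report): if a handful of
independent stages each contribute a random error around the
traditional ±2-in-400 budget, and independent random errors are
assumed to combine in quadrature rather than adding linearly, the
overall figure stays comfortably under double the per-stage figure.

```python
import math

# The traditional per-box error budget: ±2 papers in a box of ~400.
per_box = 2 / 400
assert per_box == 0.005            # i.e. ±0.5%

# Suppose three independent stages (box reconciliation, scanning,
# adjudication; the stage names are my own guess) each contribute a
# random error of about this size. Independent errors combine in
# quadrature: sqrt of the sum of squares.
stages = [0.005, 0.005, 0.005]
combined = math.sqrt(sum(e * e for e in stages))
print(f"combined error: {combined:.2%}")   # about 0.87%, not 1.5%
```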

It's probably also worth emphasising that in a context like this,
small systematic errors (that is, bias) are more important
than larger random errors. The smallest winning margin in the
parliament elections was Cunninghame North at 0.2%, but the next lowest
was 1.6%, so the process as a whole could probably withstand a
systematic error of order 1% without serious difficulty. Since there
are multiple contests, the process could arguably withstand a random error
even larger than this without changing the overall
Scotland-wide result (in fact, the SNP became the largest party in the
Parliament by a single seat).

I mention this, not to suggest that we shouldn't care about errors
in the count -- of course we should, and should be aiming for a figure
well below this nominal 1% scale -- but to suggest that the whole
process is reassuringly far from meltdown. The ballot-handling
process is such that it would be very hard, I believe, for any
individual to create a bias anywhere near as high as 1%. What
could create such bias, however, are the black
boxes comprising the scanners and counting software, and
that's where the real problems lie.

Black boxes

The main new loci of error, and the
potential sources of accidental or deliberate biases or random errors,
are the two black boxes in the election: the scanners, and the
counting machine.

The standard response to this is to use open-source software, and
have the code publicly reviewed. I'm no longer convinced that's
as easy an answer as it sounds. As anyone who's had to manage a
network will know, printers and scanners are swines, always
failing in the most irritating ways at the most inconvenient times,
and it would be decidedly non-trivial to integrate the various
required bits of hardware and software into a complete system that
would work at full capacity from precisely 22.00 on the appointed day,
with minimal opportunities for testing, while everyone was watching.
I wouldn't want that job.

OK, then, let's leave the messy system integration to contractors,
and have the important bits of publicly assured software running in
assured hardware, with the Returning Officer formally responsible for
booting them from assured media..., and so on. But I don't really see
this working either, as it points towards a more complicated protocol,
and more, and more disparate, bits of hardware, increasing the
complexity of the systems integration problem, and so directly
decreasing its reliability.

The only way out of this, I think, is by
public testing of the live system. Let the contractors
implement the system however they like, subject only to the
requirement that they emit machine-readable logging information at
appropriate points, in particular the system's decision about each and
every ballot paper. This would mean that:

Ballot boxes could be chosen by the Returning Officers at random
while the count was going on, counted by hand, and compared with the
set of decisions logged by the system. It would be a straightforward
statistical calculation to work out how many boxes would have to be
sampled in order to detect a given level of random or systematic error.
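
That "straightforward statistical calculation" can be sketched crudely
(my own back-of-envelope model, not anything from the report): if some
fraction of boxes were miscounted, a random sample of n boxes misses
all of them with probability (1 - p)^n, which gives the sample size
needed for a chosen detection probability. A real design would also
apply a finite-population (hypergeometric) correction.

```python
import math

def boxes_to_sample(p_bad, detect_prob):
    """Crude sketch: if a fraction p_bad of ballot boxes were
    miscounted, how many boxes must be hand-counted at random to see
    at least one discrepancy with probability detect_prob?  Assumes
    sampling with replacement, which is fine when few boxes are
    sampled relative to the total."""
    # P(missing every bad box in n draws) = (1 - p_bad)^n,
    # so solve (1 - p_bad)^n <= 1 - detect_prob for n.
    return math.ceil(math.log(1 - detect_prob) / math.log(1 - p_bad))

# e.g. to catch a problem affecting 5% of boxes with 95% confidence:
print(boxes_to_sample(0.05, 0.95))   # 59 boxes
```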

The ballot paper data could then be counted by multiple
independent algorithms, and compared with the contractor's
calculation of the overall contest result. This data could be made
public (there was a Scottish Executive consultation on this; I
don't know the outcome, nor really understand why this should be a
major problem), or, if that is problematic for some reason, the
calculation could be verified by some manifestly public mechanism
either during the count, or soon enough after it to support a
challenge.
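
As a sketch of the multiple-independent-algorithms idea (the log
format here is entirely my invention): two independently written
tallies over the logged per-ballot decisions, which must agree with
each other and with the contractor's published figure. A real check
would re-run the full STV calculation, not just first preferences.

```python
from collections import Counter

# Hypothetical log format: one ballot per line, preferences
# comma-separated, e.g. "SMITH,JONES,BROWN".
log_lines = ["SMITH,JONES", "JONES", "SMITH,BROWN", "SMITH,JONES"]

# Tally 1: a straightforward explicit loop over first preferences.
tally_a = {}
for line in log_lines:
    first = line.split(",")[0]
    tally_a[first] = tally_a.get(first, 0) + 1

# Tally 2: written independently, using Counter.
tally_b = Counter(line.split(",", 1)[0] for line in log_lines)

# Both independent counts must agree (and match the contractor's
# published figure) before the result is accepted.
assert tally_a == dict(tally_b)
print(tally_a)   # {'SMITH': 3, 'JONES': 1}
```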

Conclusions and recommendations

These are the summary conclusions I came to at the end of my
report. The paragraph and section references are to the report.

1. Overall

The count I observed in Glasgow was very
successful, from the point of view of both the ballot-handling
protocol and the mechanical aspects of the counting technology. There
have been media reports of technical problems at some other
counts.

2. Problems

The election as a whole was marred by a number
of problems. The design of the ballot papers seems to have caused a
large number of spoiled votes (Sects. 3.3 and 4), and there was a
pre-election controversy about the handling of the large number of
postal ballots.

The design of the ballot papers does appear to have been botched,
not least by using a market-research consultancy to evaluate the
designs, rather than usability experts. However, I'm concerned only
with the e-counting here.

3. Protocol

The protocol for the handling of ballot papers
seems at least as secure as the traditional system (para. 35 and
following, TOR 1), with similar tradeoffs between security and
usability. There seems to be a claim that the format of the UIMs
helps prevent forgery (para. 50); this should be examined in more
detail than is available here (TOR 2).

4. Scanning

The scanning and OCR technology used appears to
be conservative, increasing confidence that it has a low error rate
(para. 82, para. 85, Sect. 5.2). Efforts have been made to assure the
algorithmic correctness of the counting program (para. 110) (TOR
2).

I ended up persuading myself that the scanners were probably
conservative, though there are arguments in the ORG report that this
may be optimistic.

5. Systems

The support systems for the count appear
well-designed in general, but there are some minor opportunities for
improvements (para. 48).

6. Risk to Secrecy

There is no obvious additional risk to the
secrecy of the ballot resulting from the electronic counting (TOR 3),
other than whatever risk is associated with the retention of ballot
data discussed in para. 113.

Note, just in case it needs saying, that this is a very different
thing from suggesting that there is no problem with electronic
voting, which is another nest of problems entirely.

7. Necessity for validation

The principal difference between this and a
traditional election is the presence of software, and the fundamental
difficulties of observing this, closely enough to detect accidental or
deliberate malfunctions (TOR 1 and 2). It is absolutely necessary to
make manifest the integrity of the system, and a possible and
inexpensive means of doing this, by releasing intermediate steps in
the count, is discussed in Sect. 5.5.

I believe this validation can be more effectively done by testing
outputs, rather than by formally validating the hardware and software
involved.