Today is Election Day in the United States, so we are resurfacing this story on auditing election results that originally ran in 2012.

NAPA, CALIFORNIA—Armed with a set of 10-sided dice (we’ll get to those in a moment), a Web-based tool, and a stack of hundreds of ballots, University of California-Berkeley statistics professor Philip Stark spent last Friday unleashing both science and technology upon a recent California election. He wanted to answer a very simple question—had the vote counting produced the proper result?—and he had developed a stats-based system to find out.

On June 2, 6,573 citizens went to the polls in Napa County and cast primary ballots for supervisor of the 2nd District in one of California’s most famous wine-producing regions, on the northern edge of the San Francisco Bay Area. The three candidates—Juliana Inman, Mark van Gorder, and Mark Luce—would all have liked to come in first, but they really didn’t want to be third. That’s because only the two top vote-getters in the primary would proceed to the runoff election in November; number three was out.

Napa County officials announced the official results a few days later: Luce, the incumbent, took in 2,806 votes, van Gorder got 1,911 votes, and Inman received 1,856 votes—a difference between second and third place of just 55 votes. Given the close result, even a small number of counting errors could have swung the election.

Vote counting can go wrong in any number of ways, and even the auditing processes designed to ensure the integrity of close races can be a mess (did someone say “hanging, dimpled, or pregnant chads”?). Measuring human intent at the ballot box can be tricky. To take just one example, in California, many ballots are cast by completing an arrow, which is then optically read. While voters are instructed to fill in the full thickness of the arrow, in practice some only draw a thin line. The vote tabulation systems used by counties do not always count those as votes.

So Napa County invited Philip Stark to look more closely at their results. Stark has been on a four-year mission to encourage more elections officials to use statistical tools to ensure that the announced victor has indeed won. He first described his method back in 2008, in a paper called “Conservative statistical post-election audits,” but he generally uses a catchier name for the process: “risk-limiting auditing.”

Napa County had no reason to believe that the results in this particular election were wrong, explained John Tuteur, the county assessor, when I showed up to watch. But, anticipating that the election would be close, Tuteur had asked that Napa County be the latest participant in a state-sponsored pilot project to audit various elections across the Golden State.

While American public policy, particularly since the 2000 Bush v. Gore debacle, has focused on voting technology, not as much attention has been paid to vote audits. If things continue to move forward, Stark could have an outsized effect on how election audits are conducted in California, and perhaps the country, for years to come.

“What this new auditing method does is count enough to have high confidence that [a full recount] wouldn’t change the answer,” Stark explained to me. “You can think of this as an intelligent recount. It stops as soon as it becomes clear that it’s pointless to continue. It gives stronger evidence that the outcome is right.”

Audit day

To kick off the process, all 6,573 votes tallied in the 2nd District supervisor contest were re-scanned by county elections officials in the city of Napa. They sent the scans to a separate computer science team at Berkeley, led by professor David Wagner. Along with a group of graduate students, Wagner has developed software meant to read voter intent from ballots. His system, for instance, will flag even ballots where the arrow was not filled in according to the instructions, and it takes a different approach to filtering out stray marks. The Wagner team created a spreadsheet listing each ballot (they also created a numbering system to identify and locate individual ballots) and how each voter cast his or her vote.

One problem that cropped up early on was a discrepancy between the number of ballots cast and the number of ballots scanned. While 6,573 votes were recorded in this particular contest, the Wagner team scanned a total of 6,809 ballots, and Napa County recorded 7,116 votes cast in the election as a whole. (Not every voter in the election chose to vote in this particular contest.) In short, more than 300 ballots were missing from the scans. While that seems problematic, the margins stayed more or less the same.

“If both systems say ‘Abraham Lincoln won’ then if the unofficial system is right, so is the official system, even if their total votes differ and even if they interpreted every vote differently,” wrote Stark in an e-mail on Tuesday. “That’s the transitive idea. A transitive audit is really only checking who won, not checking whether the official voting system counted any particular ballot correctly. That said, we do compare the precinct totals for the two systems to make sure they (approximately) agree, which they did here.”

He added that to deal with the missing ballots when confirming the winner, he treated them all as if they were votes for the runner-up; even with those 300-odd additional votes, Luce was still the victor.
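Using the reported totals above, a quick worst-case check shows why the missing ballots could not flip first place. (This is a simplified illustration of the idea, not Stark’s exact calculation.)

```python
# Worst case: credit every unscanned ballot to the runner-up and see
# whether the reported winner still leads.
luce = 2806            # reported winner's total
van_gorder = 1911      # reported runner-up's total
missing = 7116 - 6809  # ballots cast in the election but not scanned

runner_up_worst_case = van_gorder + missing
print(runner_up_worst_case)         # 2218
print(luce > runner_up_worst_case)  # True: Luce wins even in the worst case
```

Note that the same trick does not settle second versus third place, since 300-odd ballots dwarf the 55-vote margin there; that is the case Stark said he could not handle completely rigorously.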

“To confirm the runner-up, we could not do that; instead, I treated them two different ways, neither completely rigorous,” he added. “In other audits, I’ve been able to deal with any mismatches between the ballot counts completely rigorously, so that the chance of a full hand count if the reported result was wrong remained over 90 percent.”

With that out of the way, the first step in the actual audit was to randomly select a seed number that would be used to feed a pseudo-random number generator found on a website that Stark created. For this, Stark had some high-level help in the form of Ron Rivest, one of America’s foremost experts on cryptography and voting systems, a professor of computer science at MIT who had also helped create the RSA crypto algorithm. Using 20 store-bought ten-sided dice, Rivest and Stark rolled out a 20-digit number. (73567556725160627585, for those keeping score at home.)
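The point of the dice is that the 20-digit number seeds a deterministic pseudo-random number generator, so any observer who writes down the seed can reproduce exactly the same sample of ballots. A minimal sketch of the idea, using Python’s built-in generator as a stand-in (Stark’s actual web tool uses its own generator, so these draws are illustrative only):

```python
import random

SEED = 73567556725160627585  # the 20-digit number Rivest and Stark rolled
NUM_BALLOTS = 6809           # ballots scanned by the Wagner team
SAMPLE_SIZE = 559            # ballots to be manually checked

# Anyone who seeds the same generator with the same number gets the
# same sample, making the selection publicly verifiable.
rng = random.Random(SEED)
sample = rng.sample(range(1, NUM_BALLOTS + 1), k=SAMPLE_SIZE)
```

Because the seed comes from dice rolled in public, no one—including the auditors—can steer the selection toward or away from particular ballots.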

Risk-limiting auditing relies on a published statistical formula, based on an accepted risk limit and on the margin of victory, to determine how many randomly selected ballots should be manually checked.

“The risk limit is not the chance that the outcome (after auditing) is wrong,” Stark wrote in a paper (PDF) published in March 2012. “A risk-limiting audit amends the outcome if and only if it leads to a full hand tally that disagrees with the original outcome. Hence, a risk-limiting audit cannot harm correct outcomes. But if the original outcome is wrong, there is a chance the audit will not correct it. The risk limit is the largest such chance. If the risk limit is 10 percent and the outcome is wrong, there is at most a 10 percent chance (and typically much less) that the audit will not correct the outcome—at least a 90 percent chance (and typically much more) that the audit will correct the outcome.”
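To get a feel for why a modest sample can deliver that kind of confidence, consider a deliberately simplified model (this is a toy bound for intuition, not Stark’s actual formula, which also adapts to errors found as the audit proceeds): if changing the outcome would require miscounting at least a fraction m of the ballots, and a miscounted ballot is always detected when sampled, then the chance that n randomly drawn ballots all check out cleanly is at most (1 − m)^n. Sampling until that quantity falls below the risk limit bounds the risk.

```python
import math

def toy_sample_size(margin_fraction, risk_limit):
    """Smallest n with (1 - margin_fraction)**n <= risk_limit.

    Toy model: assumes any outcome-changing error pattern touches at
    least a margin_fraction share of ballots, each detectable on sight.
    """
    return math.ceil(math.log(risk_limit) / math.log(1 - margin_fraction))

# Roughly the Napa numbers: a 55-vote margin out of 6,573 votes,
# audited to a 10 percent risk limit.
n = toy_sample_size(55 / 6573, 0.10)
```

With the Napa margin (about 0.84 percent) and a 10 percent risk limit, this toy bound lands in the low hundreds of ballots—the same ballpark as the 559 that Stark’s tool prescribed, whose formula is more conservative because real audits must also account for partially erroneous ballots.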

To decide how many ballots should be sampled in the Napa County audit, Stark used his own online tools and calculated that it should be 559. With that number in hand, Napa County’s John Tuteur supervised a team of temporary ballot counters in another room. They sorted through stacks of ballots in numbered boxes, affixing a sticky note to the individual ballots in question, preserving the order in which all ballots were kept.

After locating the individual ballots, the team delivered the boxes containing them back to Stark, Rivest, and a few observers (including me). Each marked ballot was then pulled from its box and displayed to the room. Once everyone agreed that the ballot showed a vote for a particular candidate, an undervote (i.e., no vote in the contest at all), or an overvote (an uncountable vote for multiple candidates), the result was tallied on Wagner’s spreadsheet. After each set of ballots, those results were compared to what the Wagner image-scanning team had recorded.
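The comparison step itself is simple bookkeeping: for each sampled ballot ID, the hand-read interpretation is checked against what the scanning software recorded, and any mismatch is flagged for the statistical calculation. A sketch with hypothetical ballot IDs and interpretations (the real audit used Wagner’s spreadsheet, not this data layout):

```python
# Machine interpretations from the image-scanning team, keyed by ballot ID.
machine = {101: "Luce", 102: "van Gorder", 103: "undervote"}
# What the room agreed each ballot showed on hand inspection.
hand = {101: "Luce", 102: "Inman", 103: "undervote"}

# Any ballot where the two readings disagree feeds back into the
# risk calculation, possibly triggering more sampling.
discrepancies = {bid: (machine[bid], hand[bid])
                 for bid in machine if machine[bid] != hand[bid]}
print(discrepancies)  # {102: ('van Gorder', 'Inman')}
```

If enough discrepancies pile up, the audit escalates—ultimately to a full hand count—rather than stopping early.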

“You want cast as intended, and counted as cast, and verified,” Stark said.