Thursday, March 22, 2012

Electoral Fraud and the Russian Presidential Election - Part 2

In the previous post, I examined some of the more basic graphical indicators of electoral fraud in the Russian presidential election.

How else can election data be analyzed for evidence of fraud? One of the most common approaches is to study the distribution of a particular digit in the results. The Guardian posted a brief article that evaluated the first digits of electoral returns using Benford's law, which posits that numbers arising from certain natural processes will have leading digits that are distributed logarithmically (1 is more common as a leading digit than 9). While the election results for Putin don't appear to conform to Benford's law, it is unlikely that the law is a relevant metric of voter fraud here. Precinct- or region-level returns do not span enough orders of magnitude, so the first-digit test is a poor indicator. However, Walter Mebane has done extensive work applying Benford's law to the distribution of the second digit in electoral data, and this method may be more fruitful in detecting fraud in Russia.
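For reference, the logarithmic distribution in question gives leading digit d an expected share of log10(1 + 1/d). Here is a minimal Python sketch of the first-digit comparison. The function names and the vote totals are made up for illustration; this is not the Guardian's calculation or real data.

```python
import math

def benford_first_digit_probs():
    """Expected share of each leading digit 1-9 under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def first_digit_shares(counts):
    """Observed share of each leading digit among the positive counts."""
    digits = [int(str(c)[0]) for c in counts if c > 0]
    return {d: digits.count(d) / len(digits) for d in range(1, 10)}

expected = benford_first_digit_probs()

# Made-up vote totals, for illustration only (not real returns).
observed = first_digit_shares([412, 1380, 977, 1204, 655, 1893, 230, 1011])

for d in range(1, 10):
    print(d, round(expected[d], 3), round(observed[d], 3))
```

Under Benford's law about 30% of values should start with a 1 and under 5% with a 9. Returns that all fall within one or two orders of magnitude can miss that pattern for entirely innocent reasons.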

Alternatively, one can examine the last digits of the election returns. Bernd Beber and Alexandra Scacco used such an approach to reveal likely electoral malfeasance in Nigeria and in the 2009 Iranian presidential election. They posit that, in a "clean" election, the final digit in the raw vote or turnout counts at the precinct level should be uniformly distributed. Since a single vote is inconsequential in deciding an electoral outcome, the last digit is essentially an error term (the full, more complicated, proof is in the first link above). However, if electoral results are tampered with and the result sheets are being filled in arbitrarily, the distribution of last digits may deviate from uniformity. This is because humans tend to be terrible at generating truly random sequences of numbers. Beber and Scacco cite a number of studies that suggest cognitive biases toward smaller over larger numbers, avoidance of repetitive sequences (like 333), and a preference for adjacent numbers. Comparing the results of the Swedish parliamentary elections to electoral data from Nigeria's Plateau state, they find strong uniformity in the former and significant deviations in the latter.

I applied Beber and Scacco's method to election returns from Russia, looking specifically at the last-digit distribution in reported numbers of registered voters at the precinct and district levels. If election officials are not outright fabricating candidates' vote totals, but votes are instead being inflated via ballot-stuffing, then registered voter counts would still need to be altered slightly to accommodate these “artificial” ballots. To avoid impossible and embarrassing reports of greater than 100% turnout, some fudging of the numbers might be needed.

A quick aside on terminology and method. The Russian election commission reports results aggregated at three levels – the republic/province level (equivalent to states), the “sub-republic” level (essentially, city/county subdivisions in each province) and the precinct level (with data from each local polling center or uchastkovaya izbiratel'naya komissia (UIK)). I use the “sub-republic” data for the Russia-wide test and precinct-level data in testing individual provinces. I also exclude any turnout figure that has fewer than three digits to ensure that the last digit is sufficiently "irrelevant".
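The filtering and tallying step described above can be sketched in a few lines of Python. This is only an illustration, not the R code actually used for the analysis; the function name `last_digit_counts` and the sample counts are invented.

```python
from collections import Counter

def last_digit_counts(values, min_digits=3):
    """Tally last digits 0-9, skipping values with fewer than min_digits digits."""
    threshold = 10 ** (min_digits - 1)  # 100 when min_digits is 3
    kept = [v for v in values if v >= threshold]
    tally = Counter(v % 10 for v in kept)
    return [tally.get(d, 0) for d in range(10)]

# Made-up counts, for illustration only (not real returns).
sample = [1520, 87, 2043, 995, 1310, 42, 760, 1205, 3333, 100]
counts = last_digit_counts(sample)
print(counts)  # position d holds the number of kept values ending in digit d
```

Here 87 and 42 are dropped for having fewer than three digits, and the remaining eight values are tallied by their final digit.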

First, at the national level, there is some evidence that the last digits for registered voter counts do not follow a uniform distribution. The graph below shows that the data contain significantly more 2s than expected (outside the 95% confidence bound). Additionally, a chi-squared test returns a p-value of 0.029, suggesting a statistically significant (at alpha = 0.05) deviation from uniformity.
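For readers who want to try the two checks themselves, here is a rough Python sketch of a chi-squared test against a uniform last-digit distribution, plus a per-digit 95% band. It makes several assumptions: the band uses a normal approximation to the binomial, which may not be how the graphs here were drawn; the p-value function uses a closed form that only holds for 9 degrees of freedom (ten digits minus one); and the tally is made up rather than taken from the Russian data.

```python
import math

def chi_squared_uniform(counts):
    """Pearson chi-squared statistic against equal expected counts."""
    n = sum(counts)
    expected = n / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)

def chi2_sf_df9(x):
    """Upper-tail probability for a chi-squared variable with 9 degrees of freedom.

    Uses the closed form for odd degrees of freedom; valid for df = 9 only.
    """
    if x <= 0:
        return 1.0
    series = 1 + x / 3 + x ** 2 / 15 + x ** 3 / 105
    tail = math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * series
    return math.erfc(math.sqrt(x / 2)) + tail

def uniform_band(n, z=1.96):
    """Approximate 95% band for one digit's share if digits are uniform."""
    half_width = z * math.sqrt(0.1 * 0.9 / n)
    return 0.1 - half_width, 0.1 + half_width

# Made-up tally of last digits 0-9, for illustration only.
counts = [100, 98, 131, 95, 102, 97, 99, 94, 93, 91]
n = sum(counts)

statistic = chi_squared_uniform(counts)
p_value = chi2_sf_df9(statistic)
low, high = uniform_band(n)

# Digits whose observed share falls outside the band.
flagged = [d for d, c in enumerate(counts) if not low <= c / n <= high]
print(statistic, p_value, flagged)
```

Note that the two checks answer different questions. The band looks at one digit at a time, while the chi-squared test looks at all ten jointly, so a single digit can poke outside its band without the overall test rejecting uniformity.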

Is there variation across regions? Anecdotal evidence suggests so. The most egregious reports of fraud tend to come from “peripheral” regions, particularly Chechnya and Dagestan, which consistently report absurd levels of support for Putin/Medvedev and United Russia. Reports from Moscow and St. Petersburg (the centers of the protest movement) tend to be more subdued. Indeed, Moscow City was the only region where Putin obtained just a plurality of the vote rather than a majority.

Conducting the last-digit test on registered-voter data from each polling place in these four regions seems to confirm that fraud levels vary significantly within Russia. The graphs below suggest that neither Moscow nor St. Petersburg shows any significant deviation from uniformity. Chi-squared tests for both regions are also not statistically significant.

Chechnya, where Putin received 99% of the total vote, and Dagestan, where Putin's numbers were slightly lower (93%), tell a different story. Both show dramatic deviation from uniformity with a tendency to emphasize lower numbers, particularly zero and five. Chi-squared tests for both are also significant at the 1% level.

Obviously this is a very cursory analysis, but it does suggest that the last-digit method is a pretty good tool for finding hints of fraud in raw election returns. Any thoughts?

Thanks to Bernd Beber for making available the R code for running the last-digit tests and generating the graphs.