Note 4: Added heterogeneous results showing CHP being penalized by higher invalid shares of ballots much more in above-median pro-CHP districts than in below-median pro-CHP districts.

Having seen tweets on numerous alleged voting irregularities in Turkey, and thanks to Twitter user @erenyanik, I came across this CHP/STS dataset of voting data in the Greater Municipality of Ankara, one of the tightly contested (less than a percentage point in the vote share) mayor elections between Melih Gökçek and Mansur Yavaş. The dataset includes 12,230 ballot boxes across 1,682 voting locations in 25 districts in Ankara. I didn’t collect the data myself and therefore this analysis should be taken as highly preliminary.

It all started when @erenyanik posted this picture plotting ballot box level AKP and CHP vote shares against the turnout rate for the Ankara mayor race. Several of the ballot boxes showed turnout rates above 100 percent, which is strange, and these boxes also tended to systematically favor the AKP. I decided to create a couple of graphs myself, and per request am now typing this very basic analysis up. Instead of showing both AKP and CHP vote shares like @erenyanik does, I show the difference between these, the AKP-CHP win margin. The graph below also includes a superimposed vanilla local polynomial (lpolyci for STATA users) with accompanying 95 percent confidence intervals:

(Note: A similar graph using the AKP vote share instead of the AKP-CHP vote margin can be found here.)

Each point represents a ballot box. The graph shows somewhat of a negative correlation between turnout and AKP-CHP win margin until turnout is above 100 percent. In places where turnout is recorded as above 100 percent, the votes are clearly in AKP’s favor. One reason for the negative relationship between turnout and relative voting for AKP could be that areas that vote more for AKP than CHP tend to be those with lower turnout rates for various reasons.

This looks interesting but does not necessarily constitute an irregularity. What is a bit strange though is the ballot boxes with more than 100 percent turnout and their association with a lean towards the AKP. This could be a sign of blatant ballot stuffing of AKP votes and is therefore noteworthy, but the number of such ballot boxes is rather small and unlikely to have any meaningful effect on the end result. (It is for example too small to account for the vote margin, roughly 30,000 votes at the time of writing this). As such, this is either a mistake in the data, or a real irregularity but with limited implications. Moreover, several of these boxes, but not all, tend to hold small numbers of ballots and could thus climb above 100 percent as e.g. non-registered election officials cast their votes. A reason why these tend to have systematically higher vote shares for the AKP may be that these are also less-populated places, villages, and perhaps also more socially conservative and poorer – characteristics making them more likely to vote for the AKP.
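To make the small-box point concrete, here is a minimal sketch of the turnout arithmetic. The numbers are hypothetical and chosen only for illustration (they are not taken from the dataset): a handful of votes from officials who are not on a box's register barely moves turnout in a large box, but can push a very small box far above 100 percent.

```python
def turnout_rate(votes_cast, registered_voters):
    """Turnout as a percentage of the voters registered at a ballot box."""
    return 100.0 * votes_cast / registered_voters

# Hypothetical figures, for illustration only.
extra_officials = 4  # officials voting at a box where they are not registered

# Small box: 10 registered voters, all of whom vote, plus the officials.
small_box = turnout_rate(10 + extra_officials, 10)

# Large box: 300 registered voters, all of whom vote, plus the same officials.
large_box = turnout_rate(300 + extra_officials, 300)

print(f"small box turnout: {small_box:.1f}%")
print(f"large box turnout: {large_box:.1f}%")
```

The same four extra votes put the small box at 140 percent but the large box only slightly above 100 percent, which is why box size matters when reading the right-hand tail of the turnout graph.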

Instead, I plotted the AKP’s share against the share of all ballots declared as invalid. In contrast to a focus on the small number of abnormally high turnout ballot boxes, nearly all ballot boxes will have some invalid votes, and so any manipulation here would have much larger implications. The intuition is that instead of clumsily adding a bunch of ballots for the AKP, ballots for the CHP could be selectively assigned as invalid in ways that could signal manipulation. The graph looks like this: And for Istanbul: (Note: A similar graph using the AKP vote share instead of the AKP-CHP vote margin can be found here.)

The interesting part of this graph is the positive relationship between the AKP-CHP vote share difference and the invalid share of ballots. Places where a higher fraction of ballots are declared invalid have systematically higher votes for AKP relative to those for the CHP.

A naive conclusion would be that this is definite evidence of voting irregularities, i.e. that AKP (relative to CHP) vote shares are higher because CHP votes get thrown out as invalid. It’s naive because there is a range of factors that are omitted in this analysis, and thus the above relationship is likely subject to an endogeneity problem. For example, constituencies voting to a higher degree for AKP instead of CHP could also be poorer, less educated, and therefore more likely to make mistakes (resulting in more invalid ballots). An unconditional correlation between the AKP-CHP vote margin and the invalid share of ballots may therefore suffer from an upward bias.

In the absence of relevant data on education, income, etc., here’s a very basic way to account for this: add fixed effects. Since the dataset includes variables for A) Districts (ilce) and B) Location (or voting stations, alan), I can regress the AKP-CHP vote margin on the invalid share of ballots while controlling for fixed effects related to either A or B. This accounts for factors that vary across, but not within, districts (or locations). Below are results from regressing the AKP-CHP vote margin (akpchpdiff) on the invalid share of ballots (invshr), using the following three specifications: column 1 reports results from a simple unconditional regression of akpchpdiff on invshr, whereas columns 2 and 3 add district (ilce) and alan (voting station) fixed effects respectively. In all the specifications, the standard errors are clustered at the corresponding fixed effects level. I show the results for both Ankara and Istanbul.
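For readers who prefer to see the three specifications written out, one way to state them is as follows. The notation is my own shorthand and is an assumption rather than anything taken from the regression output: b indexes ballot boxes, d(b) is the district (ilce) of box b, and s(b) is its voting station (alan).

```latex
\begin{align}
\text{akpchpdiff}_{b} &= \alpha + \beta\,\text{invshr}_{b} + \varepsilon_{b}
  && \text{(column 1, unconditional)} \\
\text{akpchpdiff}_{b} &= \beta\,\text{invshr}_{b} + \gamma_{d(b)} + \varepsilon_{b}
  && \text{(column 2, district fixed effects)} \\
\text{akpchpdiff}_{b} &= \beta\,\text{invshr}_{b} + \delta_{s(b)} + \varepsilon_{b}
  && \text{(column 3, voting station fixed effects)}
\end{align}
```

In each case the coefficient of interest is the slope on invshr; the fixed effects absorb everything that is common to all ballot boxes in the same district or the same voting station.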

The coefficient 5.9 (for Ankara) means that a 1 percentage point increase in the share of invalid ballots is associated with a 5.9 percentage point increase in the AKP-CHP vote margin. As stated above, there is little reason to put much trust into this coefficient, because there could be other factors driving this correlation. Controlling for ilce-level fixed effects in column 2 results in estimates that are also quite large. The fact that they are smaller is consistent with the possibility that estimates in column 1 are biased upwards due to omitted variables. The estimate now measures how the AKP-CHP win margin changes with the invalid share of ballots when all factors that are constant within districts are controlled for. In other words, it compares variation within districts. If there is a lot of variation within districts with regards to omitted factors like education and income, this estimate could still be biased.

Adding alan-specific fixed effects in column 3 is a rather demanding action. It controls for all factors that vary across locations (i.e. voting stations), leaving only the remaining variation across ballot boxes within locations. It is much more difficult to argue that there would be systematic differences in omitted factors across ballots within a voting station than in previous specifications. It is thus noteworthy that the estimate, albeit much smaller, is still statistically significant. The fact that the estimates change across specifications may be an indication of the omitted variables confounding the analysis in the unconditional specification.
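A minimal sketch of what "adding fixed effects" does mechanically, assuming the standard within transformation: subtract each voting station's mean from both variables and run the simple regression on what is left. The helper names and the toy data below are hypothetical and only illustrate the idea. This is not the code behind the tables above, and it leaves out the clustered standard errors.

```python
from collections import defaultdict

def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx

def demean_by_group(values, groups):
    """Subtract each group's mean from the values belonging to that group."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]

def within_slope(x, y, groups):
    """Fixed-effects (within) slope: OLS on group-demeaned data."""
    return ols_slope(demean_by_group(x, groups), demean_by_group(y, groups))

# Toy data (hypothetical): two voting stations that differ a lot in level,
# but within each station the margin rises one-for-one with invshr.
alan = ["A", "A", "A", "B", "B", "B"]
invshr = [1.0, 2.0, 3.0, 5.0, 6.0, 7.0]
akpchpdiff = [10.0, 11.0, 12.0, 30.0, 31.0, 32.0]

pooled = ols_slope(invshr, akpchpdiff)           # mixes between- and within-station variation
within = within_slope(invshr, akpchpdiff, alan)  # uses only within-station variation

print(f"pooled slope: {pooled:.2f}")
print(f"within slope: {within:.2f}")
```

In the toy data the pooled slope is much larger than the within slope because the two stations differ in level for reasons unrelated to invalid ballots. That is the same pattern as the estimates shrinking from column 1 to column 3. Demeaning also means no separate coefficient has to be estimated for each of the 1,682 voting locations.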

Although these fixed effects are no solution to the endogeneity problem, it is quite striking that the results are this robust. This means that even within a particular voting station where votes are cast – like a school etc – ballot boxes with more invalid votes tend to have an AKP bias.

An equivalent way to show these results is to run the regressions using individual party vote shares instead of the AKP-CHP win margin. Below, in graphs for Ankara and Istanbul, I run separate regressions for each party, regressing its vote share on the invalid share of ballots with voting station fixed effects.

For Ankara:

and for Istanbul:

Both these graphs rather strikingly show how a larger share of invalid ballots, even within a specific voting station, appears to penalize the CHP to the benefit of AKP.

This raises the question whether we might observe differential results in areas where CHP receives a lot of votes versus areas where they don’t. If we see that the CHP loses more votes as a result of ballots declared invalid in more pro-CHP districts than in more pro-AKP districts, this could be a further sign that something is fishy.

What I do below is to calculate the district level CHP vote share. I then split the sample at the median of this vote share, creating one below-median sample (i.e. ballot boxes in districts supporting the AKP more) and another above-median sample (i.e. ballot boxes in districts supporting the CHP more). I then, for each of these samples, regress the AKP-CHP win margin on the invalid share of ballots.
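The split-sample procedure can be sketched as follows. Everything in this block is illustrative: the rows are made-up numbers, the field names chp_votes and valid_votes are hypothetical, and I assume the median is taken over districts (with boxes in districts at or below the median assigned to the below-median sample). It shows only the mechanics, not the actual estimates.

```python
from collections import defaultdict
from statistics import median

def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx

def make_boxes(ilce, chp_votes, margins):
    """Build three hypothetical ballot boxes for one district."""
    return [
        {"ilce": ilce, "chp_votes": chp_votes, "valid_votes": 100,
         "invshr": float(i + 1), "akpchpdiff": m}
        for i, m in enumerate(margins)
    ]

# Made-up data: two pro-AKP districts and two pro-CHP districts.
boxes = (
    make_boxes("D1", 20, [40.0, 41.0, 42.0])
    + make_boxes("D2", 30, [30.0, 31.0, 32.0])
    + make_boxes("D3", 60, [-30.0, -27.0, -24.0])
    + make_boxes("D4", 70, [-40.0, -37.0, -34.0])
)

# Step 1: district-level CHP vote share.
chp, total = defaultdict(int), defaultdict(int)
for b in boxes:
    chp[b["ilce"]] += b["chp_votes"]
    total[b["ilce"]] += b["valid_votes"]
chp_share = {d: chp[d] / total[d] for d in chp}

# Step 2: split ballot boxes at the median district-level CHP share.
cutoff = median(chp_share.values())
below = [b for b in boxes if chp_share[b["ilce"]] <= cutoff]  # more pro-AKP
above = [b for b in boxes if chp_share[b["ilce"]] > cutoff]   # more pro-CHP

# Step 3: regress the margin on the invalid share separately in each sample.
slope_below = ols_slope([b["invshr"] for b in below], [b["akpchpdiff"] for b in below])
slope_above = ols_slope([b["invshr"] for b in above], [b["akpchpdiff"] for b in above])

print(f"below-median (pro-AKP) slope: {slope_below:.2f}")
print(f"above-median (pro-CHP) slope: {slope_above:.2f}")
```

The comparison of interest is simply whether the slope in the above-median sample is larger than in the below-median sample.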

For Ankara:

and for Istanbul:

What is noteworthy here, and especially for Ankara, is how the coefficient on invalid share of ballots is much higher in pro-CHP districts (column 2) than in pro-AKP districts (column 1). It is mostly in the former districts where a higher share of invalid ballots seems to particularly penalize the CHP’s vote share to the benefit of the AKP.

Suppose for a moment that what I have documented here is actually voting manipulation, that ballots for the CHP get systematically thrown out. Then, in order to actually affect the election outcome, one would particularly want to do this in places where the CHP receives a lot of votes. The heterogeneous result above would thus be consistent with this.

Indeed, if this is a sign of voting manipulation, it is much more subtle than messing around with specific ballot boxes and would allow manipulation that may very well be difficult to spot in specific boxes. If just a few ballots are declared wrongfully invalid in a lot of boxes, this could add up to a lot of votes and have a real impact on the election outcome.
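A back-of-the-envelope calculation shows how quickly "just a few ballots" per box can add up. The box count and the approximate vote margin are the figures quoted earlier in this post. The assumptions, which are mine, are that wrongly invalidated ballots are spread evenly over all boxes and that each such ballot removes one CHP vote without adding an AKP vote.

```python
ballot_boxes = 12_230  # ballot boxes in the Ankara dataset
vote_margin = 30_000   # approximate AKP-CHP margin in votes at the time of writing

# Assumption: one wrongly invalidated CHP ballot shifts the margin by one vote.
needed_per_box = vote_margin / ballot_boxes

# Hypothetical scenario: three CHP ballots wrongly invalidated in every box.
hypothetical_per_box = 3
total_shift = hypothetical_per_box * ballot_boxes

print(f"ballots per box needed to cover the margin: {needed_per_box:.2f}")
print(f"margin shift at {hypothetical_per_box} ballots per box: {total_shift:,}")
```

Under these assumptions, about two and a half ballots per box would be enough to cover the margin, and three per box would exceed it.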

However, at the moment this remains a statistical anomaly, which could potentially have another explanation not involving voting manipulation. As such this is not definite proof of extensive voting irregularities, but I think it is enough to warrant a more detailed investigation of Turkey’s local elections, at least in Ankara.

“This could be a sign of blatant ballot stuffing of AKP votes and is therefore noteworthy, but the amount of ballot boxes is rather small and unlikely to have any meaningful effect on the end result. (It is for example too small to account for the vote margin, roughly 30,000 votes at the time of writing this). As such, this is either a mistake in the data, or a real irregularity but with limited implications.”

Note that most of the fraud was probably committed in a way that doesn’t result in >100% voter turnout. That’s a particularly stupid mistake. One trick that we know was used was to list large numbers of “eligible voters” in non-existent buildings or unoccupied apartments in the neighborhood, and let fraudulent groups of voters cast votes at multiple ballot boxes. This method does not result in a >100% turnout, although depending on how centrally the scheme was planned and implemented, it may inflate the number of eligible voters per ballot box.

The voter registers are placed at “muhtarlik”s (elected local representatives) for a certain period to check for any irregularities. Individuals or political parties can object to these registers. The Higher Election Council finalizes the voter registers and then makes this data available to all political parties. If you claim there was fraud on voter registers, then you must provide proof for that. Every Turkish citizen has a 10-digit ID number (TC Kimlik No). What are the ID numbers of those people who were fraudulently registered in non-existent buildings or unoccupied apartments? Were these people registered as voters at other addresses? It is possible to answer all these questions by looking at publicly available data. If some people were registered in such a fraudulent way, then they must also come to a ballot box on election day, to actually commit voting fraud. Four opposition parties are empowered to have an official representative at every ballot box. If a person who did not possess an official ID that carries the 10-digit citizen ID tried to cast a vote, why did the opposition party members of the ballot box not prevent him from doing so?

So far, I have not seen any convincing evidence that systematic voting fraud was committed this way. The burden of proof falls on those making these allegations.

More than 40 voters were found to be registered in an empty apartment in Bakirkoy, Istanbul. This was noticed by a registered voter from the same building. When he objected, ballot box officials said they didn’t have the authority to decline a registered voter. Later, it turned out “muhtarlik” (i.e. the lowest-level administrative division in Turkey) had already filed a police report about it and many other similar incidents in that neighborhood.

This is basically a multiple-voting scheme. The ID card requirement does not preclude it. A voter registered at multiple different ballot boxes can simply present their ID at each box and cast their vote. There is no electronic system to check whether someone has cast a vote at a different ballot box before. Election ink would, but that method was for some reason not employed in this election.

@Tolga Yilmaz: The Higher Election Council is a high judiciary office. I would expect the voter registers to check for instances where people carrying the same national ID are registered at more than one ballot box. Multiple voting or voting in lieu of someone else is a crime punishable with 3 to 5 years of imprisonment. The report at Hürriyet is just one example. It does not prove any widespread multiple-voting or systematic fraud.

While you may be on to something, note that the invalid vote percentage in Ankara is about 4%, and this is not much different (I didn’t check for statistical significance) from the same measure in Izmir, which was won by CHP.

By the way, historically, the percentage of votes declared invalid in past elections needs to be checked as well. For example, general elections of 1991 (way before AKP was even founded) had a 4% of votes declared invalid too. I think this fishing expedition might go nowhere unless someone can show concrete evidence of irregularities.

Hey Erik,
Thanks for the great post. Did you also weight the samples by the ballot sizes?
Also, the intercept term seems to be significant in all models, but its sign keeps changing. Conditioning on “ilce”s suggests that without invalid votes CHP would have won (with a .07% margin), although conditioning on “alan”s gives AKP as the winner. Any comments on this?

I agree. I don’t know if the data is available, but at least the plots would be more interesting with the y axis showing the difference in “number” of votes instead of the difference in vote share. We could then have a better idea about the “size” of the possible fraud. If the ballot boxes with turnout above 1 have high numbers of votes, the effect could possibly be higher, and vice versa.

Thank you Mr. Meyersson for this interesting analysis. Let me, as an amateur, point out the possibility that the stuffing need not have caused a turnout rate above 100%; on the contrary, the turnout rate can be lower than 100% even though stuffing exists. I see from the graph that the AKP ballots are increasing in line with turnout rates. So the implications of stuffing can be significant.

I believe most irregularities are beyond statistics and statistics just add noise to a political question. Fishing for a single type of fraud is missing the point. Elections are civil war without weapons, numbers obviously matter, but so does organisation etc. The opposition only now starts to rise to the challenge.

A burned vote just does not show up (many cases). A dead candidate can not win (one case, another one shot the day after but alive), but dead voters may still vote. (didn’t someone say even the dead should rise to vote during a previous election) A count without witnesses from different parties is pointless (several reported cases where witnesses were forcibly removed). A count in darkness isn’t witnessed (the cat in the transformer). A stolen ballot box will not be counted. And there is even a party which apparently gets votes mainly from voters of another party (which is frequently forbidden and keeps changing names) misspelling their own, which was “randomly” assigned a spot close to the other one. (BTP BDP) and of course there are many innocent mistakes when processing the results (especially when plausible mistakes such as BTP BDP are already set up, but there are quite a few examples of blatant falsification at this stage going round on twitter) …

But an interior minister in the election council at 4 am (or was it 5?) during a municipal election with the count stopping for hours, that is a smoking gun. If people want to do statistics put timestamps on the results and check Çankaya results before and after that visit. Everything else is noise, that will end up with “we can’t say for sure”. Which is very laudable science, but tells even less than a single eye witness account.

2. It is the civic duty of people living in a locality to check their voter registers and report/object any irregularities they spot in the registers. Furthermore, the political parties have access to these registers and legal avenues of contestation are open for them. If a dead person is recorded in a voter register and if someone shows up on election day carrying a valid ID on his/her name and having his/her likeness that resembles the dead person, there is of course not much that the ballot box committee can do, but we are now entering the realm of fantasy.

3. The political parties have the legal right to have NOT JUST WITNESSES, but MEMBERS IN A BALLOT BOX COMMITTEE besides the witnesses. A ballot box committee has two government employees, one being the chair of the committee. The committee may have 5 more members, one each from the five different political parties which received the largest share of votes in the last election. Some parties may not provide committee members at all locations, but the committee cannot be formed unless four of its pre-assigned members are present. Please provide one case where committee members from political parties, NOT witnesses, were forcibly removed.

4. There are reports that in many of the cases where there were power cuts, the opposition parties received more votes than the ruling party. It must be a strange way to conduct voting fraud that does not benefit the fraudster.

5. This is simply a data entry mistake and I have shown it affecting the ruling party, not just the main opposition party. Check these tweets:

These data entry mistakes do not matter in the final analysis, because the official tallies are calculated at county and province level (“birleştirme tutanakları”) with the presence of witnesses from political parties. Please look at the memorandum no. 136 of the Higher Election Council for details: http://www.ysk.gov.tr/docs/2014MahalliIdareler/Ornek139-2014.pdf

Sevket, thank you for your comments (and for your many tweets as well). Just so we’re clear, your counterargument to what I am showing is that fraud is unlikely because 1) the laws and regulations prevent them and 2) there is anecdotal evidence that you interpret as biases in favor of the opposition. But this doesn’t address what I am showing, which is a systematic bias – in the average Ankara ballot box – towards AKP votes in boxes with higher shares of invalid ballots. This is the case even when we account for factors that vary at the district, as well as the voting station, level.

Burned votes:
A burned vote is either a genuine vote burned before counting (i.e. fraud) or something burned to give the impression of fraud (i.e. a different kind of fraud). If you say, that it is most likely manipulation, you are in fact accusing the opposition or an interested third party of fraud (of the second kind). I.e., in Ceylanpınar (where the allegation is endorsed by the non-AKP party in question) it is either AKP fraud or BDP fraud. Which is it? If you can’t decide, shouldn’t the election be repeated? Even if only for the sake of general peace? Same argument applies elsewhere.
Admittedly, the internet is full of bogus claims, excitement and distrust and likely a good portion of disinformation (always great for debunking or fanning distrust) working together. I have even seen the same picture (from afar) claimed as discarded MHP, CHP votes or as AKP votes – clearly the picture didn’t provide sufficient evidence for either claim. Yet, manipulation is as strong a claim as fraud, it isn’t “does not imply a thing”. Not at all.

Fraud and defence of the ballot:
I tried to emphasise that there are many ways to commit fraud. A good fraudster has fantasy and prepares many ways to affect the results and deploys what is necessary and possible. That the CHP was not up to the task to defend their votes past the ballot box is blatantly obvious, that is why there is the whole improvised checking now. Given the accusations of corruption the CHP regularly fields against the AKP you would suppose they prepare better for elections than that.

Data entry mistakes:
Yes, they happen. Manual data entry always includes mistakes. Not only exchanging two parties for each other, but also the “+10 votes here / -10 votes there” mistakes, that are less plausible, but well. However, this is easy to check and correct, if there is a will to do so. So, is it happening? Why not wait with victory speeches until this is done – or in fact until all votes are counted? Are you saying YSK is not using the numbers in their own system to calculate totals (not even to check the totals arrived at in another way and just enters them for fun and to give opportunity for baseless accusations due to inevitable mistakes)? (Can’t access YSK pages.)

Power cuts:
A blackout offers opportunities for: adding ballots, removing ballots, changing ballots, exchanging ballots. Now, if you are accusing the opposition of orchestrating them, please do so openly. If not, you have to explain power cuts all across Turkey, with a strong bias towards highly contested constituencies, if we believe the reports.

Unanswered questions:
You answered to a lot of questions, how about the crucial ones first? What about the case about ballot boxes over 100% participation going to AKP? What about Çankaya before and after Efkan Âlâ?

The problem is that some ballot box committees have mistakenly included “unused ballots” under the “invalid ballots” section in column no. (14). For these ballot boxes, this immediately and wrongly expands the invalid ballots by about 50-60 ballots. This is a mistake that will be corrected when county and provincial election councils add up all the invalid ballots for their respective jurisdictions. But these mistakes are present in your data.

To illustrate, the number of invalid ballots for these ballot boxes are not correctly reported in your data: Sincan 2061, Y. Mahalle 1436, Keçiören 3193, Keçiören 3314, Sincan 2301.

The number of votes received by each party and independent candidate is correctly reported for these ballot boxes, but not the invalid ballots.

I don’t know how you can arrive at any conclusion by using faulty data.

Interesting analysis, I have seen yet another analysis made by @merenbey earlier resulting in similar concerns about the invalid ballots.
Invalid ballot outliers are available here: http://meren.org/tmp/Ankara-02.pdf
Considering all of the political parties have delegates when the ballot boxes open to be counted, I believe, it is a bit of a challenge to mark some of the ballots invalid to affect the results.
Later on @merenbey updated his findings by mapping the education index of the population in Ankara: http://meren.org/tmp/Ankara-03.pdf

If you would like to make use of it, here is the data of education levels for Ankara from Turkish Statistical Agency in English: http://goo.gl/LmsaJ8

Thank you for your comment. I’m somewhat familiar with the ABPRS data. The data you show, which I assume is what @merenbey uses, only varies at the district level. What I am doing when I include district FEs is to control for, not just education or income etc, but all factors that vary at the district level. In a way, it is a much more powerful robustness test than having a few district-varying controls. And the results hold even at the voting station level.

Actually, the data itself has some problems. The data comes from the CHP/STS dataset and there are many examples that show CHP officials entered wrong numbers into their “own system”. These are not official results from YSK, so we need to wait for the official counts before making any claim of fraud in the election.

CHP obtain their data from YSK. -> http://sts.chp.org.tr/SonucDetay.aspx?sid=207385 (see the “31.03.2014 tarihli 22:24 zamanlı YSK verileri!” in red)
the whole purpose is to announce YSK data so people can check the official counts in YSK’s system against the signed ballot box count papers.

your turnout analysis relies only on places where total turnout is higher than 100%, a mathematical improbability, which therefore of course is correct, however, incomplete. you should add to this all places where HDP or HEP has more than 5% and CHP has 0%, because a very common method of fraud is to enter CHP’s votes into cells above or below the right place. Those two parties are above and below CHP on entry sheet, and dont have a vote potential anywhere near 5%. They are more like %2 minus, so 3% is a huge safety margin. A turnout graph including this misentry information would show the situation much better imho.

There may be large variance in income levels within an ‘Alan’. In Turkey social groups are not as segregated as in many developing countries. Unless you could account for the variance in income across ballot boxes in ‘Alan’ there will always be a suspicion that all the significant results are due to omitted variable bias. You should be aware that your results are being distributed around as scientific proof of fraud.

The manipulation conclusion depends on this. “It is much more difficult to argue that there would be systematic differences in omitted factors across ballots within a voting station than in previous specifications. ” I don’t find it that hard to argue. I know of lots of places where a group of apartments are socioeconomically different than another group across the street. I am guessing people at the same apartment are more likely to be in the same ballot box compared to people across the street. I think this point needs to be discussed.

for the invalid ballot analysis: if you have the local belediye results for the same ballot box and have the invalid ballot share for that election you could potentially separate 2 alternative explanations. If the share of invalid ballots display a good deal of heterogeneity within the ballot box then you are on to something.

I don’t think that so far you found something.
For the boxes with more AKP vote, and more invalid votes, the answer is simple uneducated people tend to make more mistakes while voting.
For the boxes with more turnout rate, I checked the boxes and the answer is simple. For all but few of these boxes the voters are small. There are some boxes even for 10 voters, if we count the election officials (4 at least), the turnout rate can even reach 140 per cent. And it did.
For the few crowded boxes with high turnout rate, there are some more irregularities. For example for some boxes, most of the votes were classified as irregular, I guess some votes that were not for mayoral elections were cast at these boxes. Don’t ask me how, it is just a guess. There are even boxes where everything but the voter number seemed irregular; for them a close inspection is necessary, but still the number of these boxes is rather small.

I want to note Yasin Bahtiyar’s comment above, quoting. “Considering all of the political parties have delegates when the ballot boxes open to be counted, I believe, it is a bit of a challenge to mark some of the ballots invalid to affect the results.”

I agree with this observation. And yet, data presents a significant and striking conclusion, quoting author, “even within a particular voting station where votes are cast – like a school etc – ballot boxes with more invalid votes tend to have an AKP bias”.

If we assume both of the above premises are correct, it follows that the bias somehow stems from the above-mentioned group of delegates which manage the counting process. Here is one view that may explain part of the bias. In his article, Koray Caliskan explains how the chief of delegates for each ballot box is assigned (http://www.radikal.com.tr/yazarlar/koray_caliskan/ysk_ihmallerinin_gizli_mantigi-1184821). Ballot box chiefs are teachers appointed by the Ministry of Education. Because of union memberships, it is very easy for the government to tell which teachers are their supporters (there is a clear distinction in political views among these unions). Hence the government ends up appointing its supporters as ballot box chiefs extensively. These chiefs may introduce the bias, as they may have more influence on disagreements than other members of the committee due to their designation as chief.

Another finding I found interesting is, quoting author “What is noteworthy here, and especially for Ankara, is how the coefficient on invalid share of ballots is much higher in pro-CHP districts than in pro-AKP districts”. I can only explain this with the ferocity of AKP ballot box committee members, backed by above-mentioned ballot box chiefs. I guess, after this election, the opposition will/should be as ferocious during the vote counting process as their AKP counterparts, as no one believed before that this much could be changed while counting the votes, instead of by voting itself.

Reblogged this on istanbul dispatches and commented:
As the tweet button says on Swedish Assistant Professor Erik Meyersson’s piece, over 2000 tweets. Why? Because he waded in on the 1st of April to the widespread debate of ballot rigging by the ruling AKP in the 30 March local elections. No credible observer thought the AKP would ever lose these elections. For starters, all the preceding surveys put them well ahead of their nearest rivals, the CHP. That said, the same surveys put the race for mayor of İstanbul (Turkey’s largest city) & Ankara (the seat of power) as tight. Factor in notorious whistleblower tweep Fuat Avni claiming Erdoğan didn’t care about losing any city, except İstanbul, and there comes the motive for what Meyersson takes careful pains to analyse: ballot fraud in these two symbolic cities. For context, Erdoğan claimed victory with about 70 percent of the votes counted shortly before midnight. In Ankara in particular, districts typically loyal to the CHP were some of the last to be registered. In other words, any claims of irregularities could only be found during the final moments of the count. With “victory” already called for incumbent Ankara Mayor Melih Gökçek, any legal challenge would then be left to the courts. At the time of writing, the Ankara Election Board have rejected CHP’s call for a recount. Their own figures put them ahead, and their final appeal to the Supreme Election Board (or YSK) for a recount will be decided on 5 April. Meanwhile, here’s Asst. Prof. Meyersson’s detailed overhaul of the count in those two cities. (Short read: there’s enough doubt to warrant a recount.)

My question is indeed a technical one. How were you able to regress the ‘AKP-CHP vote margin’ on ‘the invalid share of ballots’ while controlling for the fixed effect of the variable Alan (location)? This variable is a categorical one (as opposed to a linear one) with more than one thousand levels, and a basic regression analysis aimed at controlling for the effects of this one seems improper to me.