Statistical Proof

In order to show that certain judges had a bias for Adelina vs Yuna, I looked at the averages, standard deviations and z scores of the marks each skater received in the free skate. The z score is calculated as (score - mean) / standard deviation and measures how far a particular judge's mark was from the average.

The first thing to note is that the standard deviation for Adelina is higher, which is suspicious. The difficulty in detecting inflated scores is that the judges are not identified. A judge should remain consistent throughout his or her judging and should not vary from one program to the next; however, some judges are inherently tougher and some are easier. If we assume that Yuna was judged fairly, we can use her marks to establish a baseline z score that shows how "tough" each judge is relative to the other judges. If the judges were fair and consistent, each should remain just as "tough" relative to the others. A judge with a low z score is simply a more difficult judge; that alone does not show he was penalizing a particular skater.

However, when one calculates the z scores for Adelina, there were five judges who were significantly less tough on Adelina, and one judge in particular showed a 604% jump. Four judges showed an increase of 100% or more, while the other five were relatively consistent. Summed over the judges, the increase for Adelina was 1078% in terms of "ease of judging". Interestingly, the judges who showed the largest jumps were the judges who seemed to be right around the average. The fix was intelligent: they did not grade at either extreme. The numbers never lie.
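Here is a minimal Python sketch of the per-judge comparison described above. It assumes the marks are laid out as a table with one row per scored item (for example each element's GOE) and one column per judge position, computes each mark's z score against the panel for that row, averages those per judge, and takes the difference between a baseline skater and another skater. The layout, the function names and the made-up numbers are illustrative assumptions, not the spreadsheet actually used above.

```python
import statistics

def row_z_scores(row):
    """z score of each judge's mark on one scored item, relative to the panel:
    z = (mark - panel mean) / panel standard deviation."""
    mean = statistics.mean(row)
    sd = statistics.stdev(row)  # sample standard deviation of the panel
    if sd == 0:
        return [0.0] * len(row)  # unanimous panel: no judge deviates
    return [(x - mean) / sd for x in row]

def judge_mean_z(marks):
    """Average z score of each judge over all rows.
    marks[i][j] is the mark in judge column j for scored item i."""
    per_row = [row_z_scores(row) for row in marks]
    n_judges = len(marks[0])
    return [statistics.mean(z[j] for z in per_row) for j in range(n_judges)]

def toughness_shift(baseline_marks, other_marks):
    """Per-judge change in mean z from the baseline skater to another skater.
    Positive means the judge sat relatively higher ('easier') for the second skater."""
    base = judge_mean_z(baseline_marks)
    other = judge_mean_z(other_marks)
    return [o - b for b, o in zip(base, other)]

# Made-up marks for illustration only (2 scored items x 3 judges), not real protocol data.
baseline = [[1, 2, 3], [0, 1, 2]]
other = [[3, 2, 1], [2, 1, 0]]
print(toughness_shift(baseline, other))  # [2.0, 0.0, -2.0]
```

The sketch reports the shift as a difference in z rather than as a percentage change. A percentage change divides by the baseline z, so a judge whose baseline z is close to zero (a judge right around the average) can produce a very large percentage from a small absolute shift.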

I'm kind of curious about this. If it's not too much work, could you calculate z scores for other skaters, like Asada, to see how much they spread? That would establish a firmer baseline.
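One way to build that firmer baseline, sketched in Python under the same assumed layout (rows = scored items, columns = judge positions, and the column position assumed to mean the same judge for every skater): compute each judge's mean z for every skater in the field, then express one skater's value in units of that judge's spread over the other skaters. The function names and the toy data in the test are hypothetical.

```python
import math
import statistics

def row_z_scores(row):
    """z = (mark - panel mean) / panel standard deviation, for one scored item."""
    mean = statistics.mean(row)
    sd = statistics.stdev(row)
    if sd == 0:
        return [0.0] * len(row)
    return [(x - mean) / sd for x in row]

def judge_mean_z(marks):
    """Average z score per judge column over all scored items of one skater."""
    per_row = [row_z_scores(row) for row in marks]
    return [statistics.mean(z[j] for z in per_row) for j in range(len(marks[0]))]

def judge_baseline_spread(marks_by_skater):
    """For each judge column: (mean, sample sd) of that judge's mean z across skaters.
    Needs at least two skaters."""
    profiles = [judge_mean_z(m) for m in marks_by_skater.values()]
    spread = []
    for j in range(len(profiles[0])):
        values = [p[j] for p in profiles]
        spread.append((statistics.mean(values), statistics.stdev(values)))
    return spread

def shift_in_sds(marks_by_skater, skater):
    """How many baseline standard deviations each judge's mean z for `skater`
    sits from that judge's average over all *other* skaters (NaN if the sd is 0)."""
    others = {name: m for name, m in marks_by_skater.items() if name != skater}
    target = judge_mean_z(marks_by_skater[skater])
    return [
        (t - mean) / sd if sd > 0 else math.nan
        for t, (mean, sd) in zip(target, judge_baseline_spread(others))
    ]
```

With the whole free-skate field as `marks_by_skater`, a judge whose value for one skater is several baseline standard deviations out would stand out far more convincingly than a comparison against a single baseline skater.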

Additionally, I've read elsewhere that the judges' marks are randomized not just by shuffling the columns, but by shuffling each row independently. If so, you couldn't say (for example) "Judge 7 scored this", because the 7th column would hold different judges' scores in different rows. I don't know whether that's right, but is there a statistical test that could determine it? Admittedly, it seems statistically unlikely for randomly shuffled numbers to end up with one column having mostly 3s...
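One standard option is a permutation test. The Python sketch below (the test statistic and function names are my own choices, not an established procedure for judging protocols) takes as its null hypothesis that every row was shuffled independently, so a column position carries no information. It reshuffles each row many times and counts how often the spread of the column means is at least as large as the observed one. A small p-value says the columns look more consistent than independent per-row shuffling would produce, which is exactly the "one column with mostly 3s" pattern; a large one is compatible with per-row shuffling but could also just mean the judges marked similarly.

```python
import random
import statistics

def column_mean_spread(marks):
    """Test statistic: population variance of the column means.
    It is large when some columns are systematically higher or lower than others."""
    n_cols = len(marks[0])
    col_means = [statistics.mean(row[j] for row in marks) for j in range(n_cols)]
    return statistics.pvariance(col_means)

def row_shuffle_test(marks, n_perm=10_000, seed=0):
    """Permutation test of H0: 'every row was shuffled independently'.
    Under H0 the column order within a row is arbitrary, so reshuffling each row
    should often give a column-mean spread at least as large as the observed one.
    Returns a one-sided p-value; small values mean the columns look like
    consistent judges rather than independently shuffled rows."""
    rng = random.Random(seed)
    observed = column_mean_spread(marks)
    hits = 0
    for _ in range(n_perm):
        shuffled = [rng.sample(row, len(row)) for row in marks]
        # small tolerance so float rounding doesn't hide exact ties
        if column_mean_spread(shuffled) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Here `marks` would be one skater's table (rows = scored items, columns = judge positions as printed). Rows from several skaters can also be stacked into one table, since the test only permutes within rows.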