Fairness Turned Upside-Down

For some time, I’ve been concerned that the fairness statistics I report are harder to interpret than I’d like. The new version of fairness (B) in particular has made the problem more apparent to me, and so I’m finally ready to make a change that I should probably have made long ago.

The difficulty is that the measures I’ve defined measure unfairness more directly than they measure fairness. I define fairness (C) by adding up the instances of a less skillful player being rewarded in preference to a more skillful player. Fairness (B) is determined by adding up the inequalities in the results of similarly placed competitors. In both cases, a higher raw number means less fairness, not more.

To date, I’ve been flipping this around by then taking a reciprocal, and that makes higher numbers good and lower numbers bad. But it also has a tendency to scale the numbers in ways that make them hard to interpret. For the new fairness (B) measure, the difference between a score of 3 and a score of 4 is quite significant – a format scoring 3 is much less fair than one scoring 4. But the difference between a fairness (B) score of 70 and one of 100 is pretty trivial, nothing that the designer should worry about.
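The distortion is easy to see by undoing the reciprocal. The following is a minimal Python sketch, assuming the reported score is a plain reciprocal of the underlying unfairness value (no 0.01 added, as in the newer fairness (B)); the function names are hypothetical and chosen only for illustration.

```python
def underlying_unfairness(reported_score):
    """Undo the reciprocal to recover the raw unfairness value (assumed: score = 1 / unfairness)."""
    return 1.0 / reported_score


def unfairness_gap(score_a, score_b):
    """Difference in raw unfairness between two reported scores."""
    return abs(underlying_unfairness(score_a) - underlying_unfairness(score_b))


low_gap = unfairness_gap(3, 4)       # roughly 0.083
high_gap = unfairness_gap(70, 100)   # roughly 0.0043

print(f"3 vs 4:    {low_gap:.4f}")
print(f"70 vs 100: {high_gap:.4f}")
```

Under that assumption, the one-point step from 3 to 4 covers nearly twenty times as much raw unfairness as the thirty-point step from 70 to 100, which is why the reciprocal scale is hard to read.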

For this reason, I’m getting rid of the reciprocals. Henceforth, lower numbers are good, and higher numbers are bad.

Fairness (C) has been defined as the sum of the differences between the ideally fair payout and the actual payout. Then I added 0.01 (to prevent the result from becoming infinite when the payout was, in fact, optimal), and took the reciprocal. The new version simply skips those last two steps: no adding 0.01, and no reciprocal. But, because a lot of people find it difficult to care about numbers smaller than one, I’ll multiply by 100.
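As a rough illustration of the old and new calculations side by side, here is a short Python sketch. It assumes the "differences" are absolute differences taken place by place, and the payout lists are made-up numbers, not results from any real format; the function names are hypothetical.

```python
def payout_gap(ideal, actual):
    """Sum of absolute differences between ideal and actual payouts (absolute value is an assumption)."""
    return sum(abs(i - a) for i, a in zip(ideal, actual))


def old_fairness_c(ideal, actual):
    """Old form: add 0.01 so an optimal payout does not divide by zero, then take the reciprocal."""
    return 1.0 / (payout_gap(ideal, actual) + 0.01)


def new_f_c(ideal, actual):
    """New form: no 0.01 and no reciprocal, just the gap scaled by 100."""
    return 100.0 * payout_gap(ideal, actual)


# Made-up payout shares for three finishing places.
ideal = [0.5, 0.3, 0.2]
actual = [0.4, 0.35, 0.25]

old_score = old_fairness_c(ideal, actual)   # about 4.76 (higher was better)
new_score = new_f_c(ideal, actual)          # about 20 (lower is better)
optimal_old = old_fairness_c(ideal, ideal)  # about 100, the old ceiling
optimal_new = new_f_c(ideal, ideal)         # 0, a perfectly fair payout

print(old_score, new_score, optimal_old, optimal_new)
```

In this sketch a perfectly fair payout scores 0 in the new form, where the old form topped out at 100 only because of the added 0.01.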

Similarly, fairness (B) has been defined starting with the standard deviation of the normalized expectations. Originally, I added 0.01 before taking the reciprocal, but I found that the 0.01 was distorting the calculation too much where the standard deviation was close to zero, so I decided to leave it out, and to deal with the problem of tiny standard deviations by simply reporting any value higher than 100 as 100. Now I’m simply going to report the standard deviation, but again I’ll multiply by 100 so that the results scale to the sort of numbers that most people have more experience with.
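The same comparison for fairness (B) can be sketched in Python. This assumes a population standard deviation (the text does not say whether population or sample is intended) and uses invented normalized expectations; how those expectations are computed is outside the sketch, and the function names are hypothetical.

```python
import statistics


def old_fairness_b(normalized_expectations):
    """Previous form: reciprocal of the standard deviation, with anything above 100 reported as 100."""
    sd = statistics.pstdev(normalized_expectations)  # population SD is an assumption
    if sd == 0:
        return 100.0  # the reciprocal would be infinite, so it is capped
    return min(100.0, 1.0 / sd)


def new_f_b(normalized_expectations):
    """New form: the standard deviation itself, scaled by 100."""
    return 100.0 * statistics.pstdev(normalized_expectations)


# Invented normalized expectations for four similarly placed competitors.
uneven = [0.9, 1.1, 1.0, 1.0]
even = [1.0, 1.0, 1.0, 1.0]

uneven_old = old_fairness_b(uneven)  # about 14.1 (higher was better)
uneven_new = new_f_b(uneven)         # about 7.07 (lower is better)
even_old = old_fairness_b(even)      # 100.0, the cap
even_new = new_f_b(even)             # 0.0

print(uneven_old, uneven_new, even_old, even_new)
```

The new form needs no cap and no special case for a zero standard deviation, since nothing is being divided.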

This will, of course, sow confusion, at least until I can go back and recalculate the various fairness metrics for the work that I’ve already reported. As a temporary way to make things a bit clearer, I’ll adopt a new notation convention. When referring to the old, inverted statistics, I’ll continue to spell out the word: fairness (C), fairness (B), or fairness (b). But when referring to the new, un-inverted forms of the statistics, I’ll call them simply f(C), f(B), or f(b). And I’ll try to be good about reminding readers that statistics reported in this form are upside-down from the old ones, so that higher numbers are bad, and lower numbers are good.