Statistics That Lead Us Astray

Benjamin Disraeli famously said, “There are three kinds of lies: lies, damned lies and statistics.” As Disraeli observed, many people, understanding the authority conferred by statistics, use them dishonestly to push an agenda. This behavior is routinely seen in marketing and in politics. But there is another, unintentional misuse of statistics that is just as common in the workplace.

Within most conventional corporate environments, the average knowledge worker has a limited background in statistics. Even in IT organizations, which employ many technical people, a long-forgotten statistics course is often the limit of statistical training. Scientists involved in research have long recognized the importance of applying sound statistical techniques to the analysis of experimental data. Without these techniques, it’s easy to draw inappropriate conclusions. Yet in the corporate workplace, conclusions are routinely drawn from data without appropriate statistical controls.

Typical corporate examples where data is analyzed are workplace surveys, employee performance, financial performance and system reliability. Rarely is this data analyzed in a way that is statistically sound. Common mistakes include insufficient sample sizes, confusing correlation with causation and disregarding randomness. In future posts, I’ll examine each of these mistakes. For this post, I’m going to explore a fascinating concept known as Simpson’s Paradox.

Simpson’s Paradox is an effect that explains a particular misinterpretation of statistical data. While it is well known to statisticians, it is mostly unknown by the general public. It was first described in an academic paper in 1951 by the British statistician Edward Simpson. Simpson’s Paradox occurs when the relationship between two variables is reversed by the introduction of a third, hidden variable. A good way to understand Simpson’s Paradox is through real world examples.

Almost 50 years ago, Congress passed the landmark Civil Rights Act of 1964. An analysis of party voting patterns in the House of Representatives provides a good example of Simpson’s Paradox. Eighty percent of House Republicans voted in favor of the Act, compared to just 61% of Democrats, despite the conventional notion that Democrats are more supportive of civil rights legislation. When a third, lurking variable is added, we see Simpson’s Paradox at work. If the voting patterns are broken down by region (North vs. South), a different pattern emerges. Northern Democrats were more supportive of the Act than their Republican counterparts (94% vs. 85%). Similarly, Southern Democrats were more supportive than Southern Republicans (7% vs. 0%).

As shown in the chart below, region was a more predictive variable for voting preference than party affiliation. Because far more Southern representatives were Democrats than Republicans (94 vs. 10), and the South voted overwhelmingly against the Act, combining the regions pulled the Democrats’ overall support rate below the Republicans’.
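The arithmetic behind this reversal can be checked with a few lines of Python. This is only a sketch: the yes/no counts are an assumption, namely the commonly cited House roll-call tallies, chosen because they reproduce the percentages quoted above and the 94 vs. 10 Southern split. The names `votes`, `support_rate` and `party_total` are made up for illustration.

```python
# (yes, no) votes per (region, party). ASSUMPTION: commonly cited House
# tallies that reproduce the percentages quoted in the post.
votes = {
    ("North", "Democrat"): (145, 9),
    ("North", "Republican"): (138, 24),
    ("South", "Democrat"): (7, 87),
    ("South", "Republican"): (0, 10),
}


def support_rate(yes, no):
    """Fraction of votes cast in favor."""
    return yes / (yes + no)


def party_total(party):
    """Combine both regions into one (yes, no) pair for a party."""
    yes = sum(y for (_, p), (y, _n) in votes.items() if p == party)
    no = sum(n for (_, p), (_y, n) in votes.items() if p == party)
    return yes, no


# Within each region, Democrats have the higher support rate.
for (region, party), (yes, no) in votes.items():
    print(f"{region:5} {party:10} {support_rate(yes, no):.0%}")

# Combined across regions, the ranking flips.
for party in ("Democrat", "Republican"):
    print(f"All   {party:10} {support_rate(*party_total(party)):.0%}")
```

Under these assumed counts, the Democrats’ combined figure is dragged down because 94 of their votes come from the South, where almost nobody supported the Act, while only 10 Republican votes do.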

Let’s take another example, this time hypothetical, from the medical world. Let’s say that two different types of treatment are being used against tumors. After an extensive study, it’s found that Treatment A is superior to Treatment B, with overall success rates of 85% versus 79%. The immediate conclusion would be to start using Treatment A as the default protocol. However, upon further examination, adding a third variable (tumor size) reverses the ranking. The table below shows a similar pattern to the voting example above.

When treating small tumors, Treatment B had a higher success rate: 94% vs. 92% for Treatment A. Similarly, when treating large tumors, Treatment B was also superior, 74% to 71%. However, because a much higher proportion of patients with small tumors (i.e., easier cases) received Treatment A, it appeared to be the better treatment overall.
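The size of that imbalance can be backed out from the percentages alone. Each overall rate is a weighted average of the small-tumor and large-tumor rates, so overall = s × small + (1 − s) × large, where s is the share of patients with small tumors. Solving for s gives the case mix each treatment must have had. The sketch below does that; the function name `small_tumor_share` is made up for illustration.

```python
def small_tumor_share(overall, small_rate, large_rate):
    """Share of small-tumor patients implied by
    overall = s * small_rate + (1 - s) * large_rate."""
    return (overall - large_rate) / (small_rate - large_rate)


# Rates taken from the hypothetical example in the text.
share_a = small_tumor_share(overall=0.85, small_rate=0.92, large_rate=0.71)
share_b = small_tumor_share(overall=0.79, small_rate=0.94, large_rate=0.74)

print(f"Treatment A: {share_a:.0%} of patients had small tumors")
print(f"Treatment B: {share_b:.0%} of patients had small tumors")
```

For these numbers, about two thirds of Treatment A’s patients had small tumors, compared with one quarter of Treatment B’s. Treatment A’s overall rate mostly reflects the easy cases, while Treatment B’s mostly reflects the hard ones.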

Now let’s turn our attention to the world of enterprise IT. A classic metric analyzed for Help Desks is the call closure rate. Many agents or teams are rated on their ability to close a call (i.e., solve the customer’s problem) without involving higher-level personnel. Let’s pretend we have two teams, Team Alpha and Team Bravo. Upon examining last month’s closure report, we see that Team Alpha has outperformed Team Bravo 85% to 79%. Our logical conclusion is that Team Alpha is the more effective team. But hmmm, where have we seen this act before? You guessed it: a third, lurking variable is at play. Team Alpha works Thursday through Saturday. Team Bravo works the more difficult Monday through Wednesday shift. Monday tends to be a time of high volume, with many difficult calls stemming from weekend change activity. Therefore, let’s look at a breakdown of performance with call difficulty included. Let’s say that calls are simply classified as easy or difficult. The following table tells a different story.

As in our previous two examples, adding the third factor (call difficulty) reverses the conclusion. Team Bravo is superior when handling “easy” calls as well as “difficult” ones. It is only because Team Bravo handles many more difficult calls that its overall closure rate is lower.
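A check like this is easy to automate for any report that can be split by a suspected third variable. The sketch below flags the case where one team wins in aggregate but loses in every subgroup. The call counts are invented for illustration, chosen only so that the overall rates come out to 85% and 79%, and the function names are hypothetical.

```python
def closure_rate(closed, total):
    """Fraction of calls closed without escalation."""
    return closed / total


def overall_rate(team):
    """Closure rate across all difficulty levels combined."""
    closed = sum(c for c, _t in team.values())
    total = sum(t for _c, t in team.values())
    return closure_rate(closed, total)


def is_simpson_reversal(a, b):
    """True when team `a` leads overall but team `b` leads in every stratum.

    `a` and `b` map a stratum name to a (closed, total) pair and must
    share the same stratum names.
    """
    a_leads_overall = overall_rate(a) > overall_rate(b)
    b_leads_each = all(closure_rate(*b[s]) > closure_rate(*a[s]) for s in a)
    return a_leads_overall and b_leads_each


# INVENTED counts: (calls closed, calls received) per difficulty level.
alpha = {"easy": (184, 200), "difficult": (71, 100)}
bravo = {"easy": (94, 100), "difficult": (222, 300)}

print(f"Alpha overall: {overall_rate(alpha):.0%}")
print(f"Bravo overall: {overall_rate(bravo):.0%}")
print("Reversal detected:", is_simpson_reversal(alpha, bravo))
```

With these invented counts, three quarters of Team Bravo’s calls are difficult, against one third of Team Alpha’s, which is enough to flip the overall ranking even though Team Bravo closes a higher share of both kinds of call.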

So be careful when analyzing statistical data. Always consider that there may be a third variable at play that creates a misleading conclusion. Simpson’s Paradox shows the importance of the old idea of an “apples to apples” comparison.