Statistical Skepticism: Why Data Often Lies to Digital Marketers

I love statistics. I love data analysis. I love graphs. It’s a sickness, but it’s one shared by many marketers — analytic data is really the only evidence we have for whether our strategies are working. Unfortunately, digital marketing and social media statistics are not peer-reviewed or published in academic journals; most of the ‘studies’ we have come from small sample sizes delivered by who-knows-who. That doesn’t mean the data is worthless; it simply means that we need to be skeptical and avoid jumping to conclusions. Data doesn’t always tell the truth. Quite frequently, data lies.

Using Words to Recontextualize Ambiguous Data

Every marketer knows that words are powerful. Words are more powerful than numbers, even though numbers may be more honest. But unfortunately, words can also be used to mislead. Take, for example, the above infographic from KISSmetrics. One could easily surmise that unique views always rise with posts per day, because that’s exactly what the takeaway text tells you. But that’s not what this graph is saying at all. In fact, without more data, this graph is essentially useless.

If it’s a graph of separate sites, plotted by how much they post and how much traffic they get, it shows something rather alarming: that posting between 23 and 30 times a day could garner you only as much traffic as posting half as often. There’s also another possibility: that this is correlation, not causation. Without knowing which sites were polled, it’s very possible that the more popular sites post more often because they are more popular — and can hence sustain the cost of their content — rather than being more popular because they post more often.

While there’s an upward trend, the data is extraordinarily erratic; any reasonable person would see this. Yet because the takeaway states the conclusion directly — that traffic always rises with daily post count — the reader is likely to assume it’s correct.
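To see how easily a confounder produces this pattern, here’s a minimal sketch (in Python, with entirely made-up numbers) of the alternative explanation: a hidden variable, popularity, drives both posting frequency and traffic, and the two end up strongly correlated even though neither causes the other.

```python
import random

random.seed(42)

# Hypothetical simulation: each site's latent "popularity" drives BOTH how
# often it can afford to post and how much traffic it gets. Posting causes
# nothing here, yet posts and traffic still correlate.
sites = []
for _ in range(200):
    popularity = random.uniform(1, 10)
    posts_per_day = popularity * 2 + random.uniform(-3, 3)        # budget tracks popularity
    traffic = popularity * 10000 + random.uniform(-20000, 20000)  # traffic tracks popularity
    sites.append((posts_per_day, traffic))

def correlation(pairs):
    """Pearson correlation coefficient, computed by hand."""
    n = len(pairs)
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

r = correlation(sites)
print(f"posts vs. traffic correlation: r = {r:.2f}")  # strongly positive, with zero causation
```

A scatter plot of this fake data would look a lot like the infographic: erratic, but trending upward. The correlation is real; the causal story attached to it is invented.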

Not Including Control Values When Drawing Conclusions

In statistics, you’re often told that correlation doesn’t equal causation. It’s a fancy way of saying that just because two things happen at the same time, it doesn’t necessarily mean they’re related. You could win the lottery, then trip and break your leg, and it could then be said that 1% of all lottery winners this year broke their leg. But if 1% of everyone broke their leg that year, the lottery had nothing to do with it.

In the above graph, we are shown that 88% of consumers have abandoned their shopping carts during a checkout process and that 84% of consumers who abandon their shopping carts search online for cheaper prices. The implication is that most consumers who abandon their shopping carts are searching online for cheaper prices — and that could well be true, but this data alone doesn’t establish it.

Why? Because the data doesn’t show a control value. For all we know, 84% of consumers who don’t abandon their shopping carts also search online for cheaper prices. In other words, the statistic may not mean anything at all; it may just be that almost everyone both abandons shopping carts occasionally and searches online for cheaper prices occasionally.
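To make this concrete: the only stated number is the 84% search rate among abandoners, so any control value we plug in is hypothetical. Here’s a small Python sketch showing how the same 84% can mean nothing or a great deal, depending entirely on the control rate the infographic never gives us:

```python
# Hypothetical numbers: the 84% figure for abandoners is from the infographic;
# the rate for NON-abandoners is unknown, so we try two made-up scenarios.
def lift(rate_in_group, rate_in_control):
    """How much more common the behavior is in the group vs. the control."""
    return rate_in_group / rate_in_control

p_search_given_abandon = 0.84  # stated in the infographic

# Scenario A: non-abandoners search at the same rate -> the stat means nothing.
print(f"same rate in control: {lift(p_search_given_abandon, 0.84):.1f}x lift")  # 1.0x

# Scenario B: non-abandoners rarely search -> the stat is genuinely informative.
print(f"low rate in control:  {lift(p_search_given_abandon, 0.20):.1f}x lift")  # 4.2x
```

Without the control rate, the 84% figure is a single data point masquerading as a comparison.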

But this also shows something a little more insidious and dangerous. Using common sense, we know that savvy consumers are looking up prices online and that this is probably affecting conversion rates. So we look for data that supports our beliefs because that makes sense; we don’t care whether the data actually makes sense.

Working With Extremely Small Sample Sizes and Self-Reported Data

There’s a reason why demographic research is conducted over thousands if not tens of thousands of individuals. With a small sample, the margin of error is large enough to skew results badly, and even a moderate sample size can be dangerous. Take this color study, which purports to show variations of least and most favorite colors by gender.

These results are crazy! Apparently, no man has ever had the color purple as his favorite color, even though over 20% of women listed it as their favorite color. Simply looking at this infographic would lead you to believe that the color purple is universally hated among men.

But it begins to make sense when you discover the context of this study. It only polled 223 individuals. Meanwhile, a study by a sociologist at the University of Maryland, which polled 1,974 individuals rather than 223, found vastly different results. Here, we see that 12% of men — not 0% — listed purple as a favorite color. It’s still not a popular color, but it’s also not universally reviled. In fact, it’s more popular than red, yellow, orange and pink.
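The arithmetic behind this is straightforward. Here’s a quick Python sketch using the standard margin-of-error formula for a sample proportion, applied to the 12% purple figure at both sample sizes (the per-gender subsamples would be smaller still, widening the margins even further):

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion p with n respondents."""
    return z * math.sqrt(p * (1 - p) / n)

# The 12% favorite-color share (the larger study's purple figure among men),
# evaluated at the two sample sizes mentioned above.
for n in (223, 1974):
    moe = margin_of_error(0.12, n)
    print(f"n = {n}: 12% +/- {moe * 100:.1f} percentage points")
```

At 223 respondents the margin is roughly three times wider than at 1,974, which is exactly the kind of wiggle room that lets a genuinely ~12% preference show up as 0% in a small poll.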

Many marketing metrics rest on astonishingly small sample sizes, and most of them are not drawn from the best sources. An informal Internet poll is probably the worst way to collect data because it is entirely self-reported; how can we ever know whether it’s true?

At this point, one might be inclined to have a minor crisis of faith. After all, almost everything we do and assume is based on data that we researched at some point or another on the web. But none of this means that data isn’t valuable; it just means that we need to engage our critical thinking skills, consider the source and really dig in before we take things to heart.