my thoughts (in no particular order)


Live by the Data, Die by the Data

While the Search Insiders tear up the stage on Amelia Island, Fla., I’ve been lurking virtually, watching the posts that emanate from the Florida coast. Apparently, “Shitty” marketing is done for, Pinterest is the new darling of search marketers, and those same marketers are fudging the ROI numbers on their search campaigns, over-attributing conversions to paid search at the expense of organic, just so their paid search budgets don’t shrink. I’m assuming all this has data to back it up.

Here are some other things the data supports. Facebook ads that feature cute, fuzzy critters get “liked” more often. Retired execs love the PGA, but are increasingly watching it on mobile screens.

And, if you really want to beat the stock market, just invest based on who wins the Super Bowl. If a team from the original NFL wins, the stock market will have a bull run over the next year. But if a team from the original AFL wins, the bears will rule and the market will tank (which is confusing, because the Chicago Bears are an original NFL team, while the Buffalo Bills, the closest thing to a bull in the NFL, are an original AFL team, but I digress). Since 1967, the indicator has called the market’s direction correctly close to 80% of the time, and over the 20 years from 1967 to 1987 it was right more than 90% of the time. The odds of that happening by chance are 1 in 4,500,000, give or take a few thousand.

It can’t be chance, can it? If we’re paying attention to the data, we should all fire our brokers and just invest according to the current holder of the Lombardi Trophy.

And that’s the problem with an overreliance on data. You start seeing signals where there is just noise. We flip our perspective to convince ourselves that there is a story where there may be none. Let’s use the Super Bowl example again. We look at the 1 in 4,500,000 odds and assume it must be more than chance. But this is faulty logic. The probability of one particular ticket winning the Powerball lottery is astronomically low, but the probability that some ticket wins it in any given week is pretty good. Likewise, the probability that one particular indicator happens to track stock market performance is low, but when we are free to test almost any indicator we like, the probability that at least one of them shows a strong correlation is very good. By the way, other stock market indicators with a long winning streak are lipstick sales, the length of Vanna White’s hemline and butter production in Bangladesh.
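To put rough numbers on the lottery argument: if any single unrelated indicator has probability p of matching the market by luck, and N such indicators get checked, the chance that at least one of them matches is 1 − (1 − p)^N. The sketch below assumes every indicator is pure noise (an independent coin flip per year) and that the market itself goes up or down with even odds; the function names and the counts of indicators are hypothetical, chosen only for illustration.

```python
import random


def chance_some_indicator_hits(p_single: float, num_indicators: int) -> float:
    """Probability that at least one of num_indicators independent,
    unrelated indicators matches the market purely by luck."""
    return 1 - (1 - p_single) ** num_indicators


def best_random_indicator(num_indicators: int, num_years: int, seed: int = 0) -> float:
    """Hit rate of the best-looking indicator when every indicator is pure noise."""
    rng = random.Random(seed)
    # Hypothetical market: each year is "up" (True) or "down" (False) with equal odds.
    market = [rng.random() < 0.5 for _ in range(num_years)]
    best = 0.0
    for _ in range(num_indicators):
        # Each indicator is a coin flip per year, with no connection to the market.
        calls = [rng.random() < 0.5 for _ in range(num_years)]
        hits = sum(call == actual for call, actual in zip(calls, market))
        best = max(best, hits / num_years)
    return best


if __name__ == "__main__":
    # One "ticket": a single indicator with 1-in-4,500,000 odds.
    print(chance_some_indicator_hits(1 / 4_500_000, 1))
    # Many "tickets": a hypothetical one million candidate indicators.
    print(chance_some_indicator_hits(1 / 4_500_000, 1_000_000))
    # Best of 10,000 noise indicators over 20 simulated years.
    print(best_random_indicator(num_indicators=10_000, num_years=20))
```

With a million hypothetical candidates, a 1-in-4,500,000 coincidence turns up somewhere roughly 20% of the time, and the best of 10,000 noise indicators usually calls 20 simulated years correctly 85% of the time or better, despite knowing nothing about the market.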

The problem is one of “overfitting.” It’s what happens when the irrationality of humans runs headlong into a mountain of data. We fit a model to one subset of the data and mistake random patterns in that subset for underlying relationships. What seems to work okay in a test environment falls apart in the real world. Models get too complex, trying to accommodate too many variables, and we fall into the trap of manipulating the “what” of the data, rather than trying to understand the “why” that’s generating the data.
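A toy illustration of overfitting, under stated assumptions: the data below is invented (a true straight-line relationship, y = 2x + 1, plus random noise), and the “memorizer” is a deliberately extreme, hypothetical stand-in for an over-complex model that accommodates every quirk of its training subset. It scores perfectly on the data it was fit to, then does worse than the simple line on fresh data drawn from the same process.

```python
import random


def make_data(n, rng):
    # Hypothetical data: a true linear relationship (y = 2x + 1) plus random noise.
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + 1 + rng.gauss(0, 1) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    # Ordinary least squares for y = a*x + b: a simple model aimed at the "why".
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sxy / sxx
    b = my - a * mx
    return lambda x: a * x + b


def fit_memorizer(xs, ys):
    # Extreme overfitting: remember every training point and answer with
    # the y-value of the nearest x it has already seen.
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def mse(model, xs, ys):
    # Mean squared error of the model's predictions.
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(42)
train_x, train_y = make_data(200, rng)  # the subset we fit to
test_x, test_y = make_data(200, rng)    # the "real world"

line = fit_line(train_x, train_y)
memo = fit_memorizer(train_x, train_y)

for name, model in [("line", line), ("memorizer", memo)]:
    print(f"{name:10s} train MSE={mse(model, train_x, train_y):.2f} "
          f"test MSE={mse(model, test_x, test_y):.2f}")
```

The exact numbers depend on the random seed; the pattern is the point. Zero error on the training subset is a warning sign, not a win.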

I see this happen all the time in marketing. It’s particularly prevalent in the world of multivariate testing. We test variables at random, looking for the best combination. In our ongoing tweaking, we fall into a testing spiral that focuses on detail, forgetting about what the overall objective is. For example, we maximize conversion rates on one particular call to action, without looking at the impact across the entire user experience. We become myopically optimized even as we drift further and further away from understanding our customers and their intent.
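The myopia can be shown with plain arithmetic. The figures below are hypothetical, invented for illustration, and the sketch assumes the only path to a purchase runs through this one call to action, so purchases per visitor equal click rate times purchase rate. Variant B wins on the call-to-action metric, but because fewer of its clickers go on to buy, it loses on the number the business actually cares about.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    name: str
    click_rate: float     # share of visitors who click the call to action
    purchase_rate: float  # share of clickers who go on to buy

    @property
    def end_to_end(self) -> float:
        # Purchases per visitor, assuming every purchase passes through this CTA.
        return self.click_rate * self.purchase_rate


# Hypothetical test results, invented purely for illustration.
variants = [
    Variant("A: plain button", click_rate=0.10, purchase_rate=0.20),
    Variant("B: urgent button", click_rate=0.14, purchase_rate=0.12),
]

best_by_clicks = max(variants, key=lambda v: v.click_rate)
best_overall = max(variants, key=lambda v: v.end_to_end)

print("Winner on the CTA metric:", best_by_clicks.name)  # B
print("Winner end to end:       ", best_overall.name)    # A (2.0% vs 1.68%)
```

Optimizing the local metric picks B; measuring across the whole funnel picks A.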

Marketing is still about human connection. We can analyze mountains of data, trying to understand what’s driving our customers’ behavior, seeing patterns where they may or may not exist.