Basketball Bracket vs. Marketing Data Models

I’m willing to bet that when people think about March Madness, the first thing that comes to mind is the barrage of brackets filled out by 60 million people each year, including President Barack Obama. A quick scrape of the words that appear on social media alongside the phrase “March Madness” confirms that, in addition to “love” and “best,” “brackets” is, in fact, top of mind.

So what is it about filling out a bracket that attracts millions of people each year? There is probably an inherent love of NCAA basketball and plenty of college loyalty to go around, but implicit in the idea behind picking brackets is the challenge of being right. The problem is that being right, at least being right about all 63 outcomes, is nearly impossible.

There are ways to improve the odds, though. Picking all 63 games by coin flip gives a 1 in 2^63 chance of a perfect bracket, or about 1 in 9.2 quintillion. With a little bit of NCAA basketball knowledge, experts have estimated that the chances improve to around 1 in 128 billion. Still not great, considering that the odds of winning the Mega Millions jackpot are 1 in 258 million.
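The arithmetic behind these odds can be sketched in a few lines. This is only an illustration: it assumes every game is independent and picked correctly with the same probability, which is a simplification and not necessarily how the experts arrived at their estimate. The function names are made up for this example.

```python
# Odds of a perfect 63-game bracket, ASSUMING independent games that are
# each picked correctly with the same probability p (a simplification).
GAMES = 63


def perfect_bracket_odds(p: float, games: int = GAMES) -> float:
    """Return N such that the chance of a perfect bracket is 1 in N."""
    return 1.0 / (p ** games)


def implied_per_game_accuracy(one_in_n: float, games: int = GAMES) -> float:
    """Per-game accuracy that would produce perfect-bracket odds of 1 in N."""
    return (1.0 / one_in_n) ** (1.0 / games)


# Coin-flip baseline: every game is a 50/50 guess.
coin_flip_odds = 2 ** GAMES

# Per-game accuracy implied by the 1-in-128-billion estimate.
implied_p = implied_per_game_accuracy(128e9)

print(f"Coin flips: 1 in {coin_flip_odds:,}")
print(f"Per-game accuracy implied by 1 in 128 billion: {implied_p:.1%}")
```

Under these assumptions, the jump from coin flips to 1 in 128 billion corresponds to getting roughly two out of every three games right.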

While picking a perfect bracket is a long shot, that shouldn’t stop anyone from trying to get as close as possible. Currently, the accuracy of most expert brackets is under 70 percent. The problem with expert opinion is that it relies heavily on qualitative analysis, which is time-consuming and difficult to reuse. So how do we set a benchmark of 70 percent accuracy, deemphasize the need for an expert, and make our prediction process reusable? Enter the concept of statistical modeling.

In recent years, the idea of modeling a bracket using statistical or machine-learning models has caught on among data bloggers and in data competitions hosted on sites like Kaggle.

There is a lot of overlap between statistics and machine learning. Machine learning tends to provide better predictive power, often at the cost of less insight into the general relationships between the outcome you are trying to predict and the data used to predict it. So which should you use for the best chance of bracket success? Both.

FCB Garfinkel’s data analytics team found that by using both kinds of models, we were able to predict the outcomes of individual games with 80 percent accuracy. We used a limited set of predictors, including points scored per 100 possessions, points allowed per 100 possessions, and the difference in seed ranks between the two competing teams. However, our predictive power dropped quickly when we predicted whole brackets: a wrong pick in an early round carries forward into every later game that team was expected to play, so the high cost of early mistakes degrades our 80 percent accuracy to around 40 to 50 percent. That’s way below our goal of 70 percent accuracy in the brackets. So the strategy becomes: predict winners when the model results make them obvious, and look for upsets when they do not.
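A small simulation shows how early mistakes compound. It is a toy model, not the models described here: it assumes real winners are chosen at random, and that a pick is right with probability p only when the real winner is still alive in the predicted bracket. The function name and parameters are invented for the illustration.

```python
import random


def simulate_round_accuracy(p=0.8, teams=64, trials=2000, seed=0):
    """Estimate the share of correct picks in each round of a bracket.

    Toy assumptions (not the article's models): real winners are chosen at
    random, and a pick is right with probability p only if the real winner
    is still alive in the predicted bracket.
    """
    rng = random.Random(seed)
    rounds = teams.bit_length() - 1  # 64 teams -> 6 rounds
    correct = [0] * rounds
    played = [0] * rounds

    for _ in range(trials):
        actual = list(range(teams))  # teams really alive in each slot
        picked = list(range(teams))  # teams our bracket has in each slot
        for r in range(rounds):
            next_actual, next_picked = [], []
            for i in range(0, len(actual), 2):
                winner = rng.choice(actual[i:i + 2])
                candidates = picked[i:i + 2]
                if winner in candidates and rng.random() < p:
                    pick = winner
                else:
                    # Either the model missed, or an earlier mistake already
                    # knocked the real winner out of our bracket.
                    pick = rng.choice([t for t in candidates if t != winner])
                correct[r] += pick == winner
                played[r] += 1
                next_actual.append(winner)
                next_picked.append(pick)
            actual, picked = next_actual, next_picked

    return [c / n for c, n in zip(correct, played)]


per_round = simulate_round_accuracy()
for r, acc in enumerate(per_round, start=1):
    print(f"Round {r}: {acc:.0%} of picks correct")
```

In this toy setup, the share of correct picks in round r is about p to the power r, so an 80 percent per-game model gets only about a quarter of its championship picks right. How far overall bracket accuracy falls then depends on how the rounds are weighted in the scoring.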

To do this, we needed to build two sets of models: the first to predict winners and the second to indicate when an upset is likely. When the first model predicted a clear winner, we went with that result. But when the first model predicted that a team would win by only a small margin, we employed the second model to see whether an upset was likely.
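The two-model decision rule might look something like the sketch below. Everything specific in it is an assumption made for illustration: the `Matchup` fields, the use of win probability (rather than, say, predicted point margin) to measure how close a game is, and both threshold values.

```python
from dataclasses import dataclass


@dataclass
class Matchup:
    favorite: str
    underdog: str
    favorite_win_prob: float  # hypothetical output of the first (winner) model
    upset_prob: float         # hypothetical output of the second (upset) model


def pick_winner(m: Matchup, clear_margin: float = 0.15,
                upset_threshold: float = 0.5) -> tuple[str, str]:
    """Return (picked team, reason). Thresholds are illustrative only."""
    # Distance of the first model's prediction from a coin flip.
    margin = m.favorite_win_prob - 0.5
    if margin >= clear_margin:
        return m.favorite, "clear winner from first model"
    # Close game: consult the second model for upset risk.
    if m.upset_prob >= upset_threshold:
        return m.underdog, "upset flagged by second model"
    # Neither model is decisive, so keep the favorite but flag for a person.
    return m.favorite, "close call, review manually"


games = [
    Matchup("Team A", "Team B", favorite_win_prob=0.90, upset_prob=0.10),
    Matchup("Team C", "Team D", favorite_win_prob=0.55, upset_prob=0.70),
    Matchup("Team E", "Team F", favorite_win_prob=0.55, upset_prob=0.20),
]
for g in games:
    team, reason = pick_winner(g)
    print(f"{g.favorite} vs. {g.underdog}: pick {team} ({reason})")
```

The last branch keeps close games that neither model resolves visible to a person rather than silently picking the favorite.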

Essentially, we found that there was always at least one case where the outcome was not clearly predicted by our models and human intervention was needed to get better results. So while we could automate most of the “easy” predictions, the tough ones still needed a person’s attention.

The major lesson is this: if you can use a model to offload the low-risk portion of the decision-making process, you can spend the remaining time analyzing the more difficult parts.

It’s true in our business. Our predictions at the FCB agency can take various forms. We may want to predict which consumers are most likely to purchase a client’s products. We may need to predict the geographies (e.g., ZIP Codes) where the products are most likely to be purchased. We may want to rank-order physicians for visits by pharmaceutical sales reps, based on their expected incremental prescriptions. Or we may need to identify consumers at risk of defecting from loyalty programs. Keep in mind that, as demonstrated in the bracket models, models are best used for decision support, to improve our odds of “winning.” In marketing, our odds can usually be made better than the one in 9 quintillion for the brackets!

Bottom line: when you’re making decisions amid uncertainty and with competition, the process is difficult and making the perfect choice is nearly impossible. Whether you’re picking brackets in March or making marketing decisions to capture additional market share, it’s hard. If models existed that gave you an answer with 100% accuracy, instead of merely supporting the decision-making process, we’d all be rich. But models are best used to support decisions, and their output should always be scrutinized rather than relied on blindly.

So for 2015, we’ll post our bracket predictions to see how our model does. But we’ll also post a tool that shows the likelihood of different teams winning at each stage in the journey. Because models work well at getting us to a better answer, but they work best when they augment what we already know.