Predicting sports game outcomes

If you are a sports fan who uses machine learning, it is just a matter of time before you try to use those skills to predict the outcomes of sports games. The temptation is too great: the allure of making money on the side while sharpening your skills, combined with working on something interesting outside of writing software, is a recipe for action. The number of articles like this one is huge. My goal is to share the lessons I learned and offer tips for others who might want to do a similar project in the future.

It was the fall of 2018 and I was a complete newbie when it came to sports wagering. I knew nothing about sports betting, lines, spreads, etc. It was gibberish that I heard on sports podcasts and found amusing, but I never really dug in or cared too much about it.

The question “could I build a model that gives accurate predictions?” kept coming back ever since I picked up machine learning two years ago. I’ve built numerous models for work and for play since, and I tossed the idea around but never executed on it. It kept nagging at me until one weekend in the fall of 2018 I decided to jump in.

It took me about two weeks to gather the basic stats, build a training database, and start building models. I then spent the next two weeks simulating bets against real lines posted online, looking for winning strategies. All in all, it took about a month (I worked on this mostly on weekends) before I settled on a model I liked and started evaluating it against the real wagering numbers week to week.

Outcome

Let me summarize what was accomplished before I dive into the details of how things were done and what was learned:

I built an NFL prediction model that predicted 68% of the games correctly for the 2017 season and 65% for the 2018 season. That is not especially impressive, since some of the models I had seen out there were achieving 70-73%, but it was good enough to make money if you used it to wager on a certain class of games.

I learned a ton about financial modeling and dissected two financial modeling books, treating NFL games like futures one could buy and sell. I ended up with a system that allowed me to experiment with strategies, evaluate the results, graph the outcomes, and see what works and what does not.

I learned a ton about gambling and how the system is basically very skewed against the gambler, and how gamblers don’t care and go for it anyway. Hey, sometimes you just want to have fun.

The final setup was robust enough to be reused for predicting NBA games with similar overall accuracy, although the NBA model was not as useful as the NFL model for wagering.

Again, 65% might not sound that impressive, but the devil is in the details. I had some models that reached 69-70% accuracy but were not that useful from a money-making perspective. That’s the beauty of a real-world exercise vs a classroom problem. If you are working on a hypothetical model that will never be used in the real world, you will do all kinds of crazy optimizations to achieve the best overall accuracy. But when you go out and use it against the real-world data that’s available to you, the most accurate model might not be the one that makes you the most money. More on that later.
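To make the accuracy-versus-profit point concrete, here is a minimal sketch with invented numbers. It is not the system described in this post, and the two "models" and their odds are purely hypothetical. It relies only on the standard American odds convention: a winning 1-unit bet at -110 pays 100/110 units, and a winning bet at +150 pays 1.5 units.

```python
def profit_per_unit(american_odds: int) -> float:
    """Net profit on a winning 1-unit stake at the given American odds."""
    if american_odds > 0:
        return american_odds / 100
    return 100 / -american_odds


def break_even_rate(american_odds: int) -> float:
    """Win rate at which bets placed at these odds neither make nor lose money."""
    return 1 / (1 + profit_per_unit(american_odds))


def roi(bets) -> float:
    """Return on investment for a list of (american_odds, won) 1-unit bets."""
    net = sum(profit_per_unit(odds) if won else -1.0 for odds, won in bets)
    return net / len(bets)


# Hypothetical model A: backs heavy favorites at -300 and is right 70% of the time.
model_a = [(-300, True)] * 70 + [(-300, False)] * 30
# Hypothetical model B: bets at -110 and is right only 60% of the time.
model_b = [(-110, True)] * 60 + [(-110, False)] * 40

for name, bets in (("A", model_a), ("B", model_b)):
    accuracy = sum(won for _, won in bets) / len(bets)
    print(f"model {name}: accuracy {accuracy:.0%}, ROI {roi(bets):+.1%}")

print(f"break-even win rate at -110: {break_even_rate(-110):.1%}")
print(f"break-even win rate at -300: {break_even_rate(-300):.1%}")
```

In this toy setup the more accurate model loses money, because the odds it bets at demand a higher win rate than it delivers. What matters is accuracy relative to the break-even rate of the lines you can actually get.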

Key Lessons Learned

As is usually the case, the machine learning part was a small part of the project. The time spent on model type selection, feature selection, and training is small compared to the time you have to spend gathering the data, setting up systems to fetch the results automatically, pre-processing the data to fit your model’s needs, and looking for winning strategies.

You can’t predict randomness, and searching for perfect accuracy is a futile effort that probably ruins many who attempt this exercise. I could have spent time optimizing the model and gathering more data to perhaps bring the accuracy up to the 70%+ range, but I settled on something that was good enough to give me a 28% profit if the money had been spent on real wagering lines.

Evaluate, evaluate, and evaluate your strategies. After using the model for the 2018 season and looking back at the results, I have a feeling I got lucky. I should have spent more time evaluating the model and plotting its past results, which would have given me more confidence in its performance. I went with a bit of blind faith and a gut feeling that the model was good enough, and was only able to prove retroactively that it indeed was.
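One simple way to put a number on the "did I just get lucky?" question is to bootstrap the per-bet results. The sketch below is an illustration only, not the evaluation used for this project: the season of 100 bets is made up, and the function name and parameters are hypothetical. It resamples the bets with replacement many times and reports the range in which the average profit per bet lands.

```python
import random


def bootstrap_roi_interval(profits, n_resamples=5000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean profit per 1-unit bet."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(profits)
    # Resample the season with replacement and record each resample's ROI.
    means = sorted(
        sum(rng.choices(profits, k=n)) / n for _ in range(n_resamples)
    )
    low = means[int((alpha / 2) * n_resamples)]
    high = means[int((1 - alpha / 2) * n_resamples) - 1]
    return low, high


# Made-up season: 100 one-unit bets at -110 odds, 60 of them winners.
profits = [100 / 110] * 60 + [-1.0] * 40

observed = sum(profits) / len(profits)
low, high = bootstrap_roi_interval(profits)
print(f"observed ROI: {observed:+.3f}")
print(f"95% bootstrap interval: [{low:+.3f}, {high:+.3f}]")
```

If the lower end of the interval sits at or below zero, a profitable season is still consistent with luck, and more bets or more seasons of simulated results are needed before trusting the strategy.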

Future Posts

I want to dive deeper into how things were done and into some key ideas used in building a system to predict the games. In the coming weeks, I will publish more details about the process used to build the model. Then I will dive into model evaluation, which is the most critical step if you ask me, and share more lessons and tidbits of knowledge. Depending on how much data I have gathered by then, I will also share the results of the system and how it is doing for NBA game predictions.