It's football season and we kick off with WSO2 BigDataGame

Few professional sports generate more data than American football, especially when the Big Game beckons. Things heat up. Tables are drawn, graphs are computed and analysts take to the predictions game like moths to a flame.

WSO2 Machine Learner is the latest addition to our product portfolio. It was built as a high-performance, open source predictive analytics platform that takes enterprise data, uses machine learning to analyze patterns and generates models that can be used to make accurate business predictions, but there’s no reason why you can’t use it for sports analysis too.

This is exactly what our team set out to do a few weeks ago. In a fit of experimentation, we connected WSO2 Machine Learner with the data it needs to try and predict the Big Game.

Setting up for the Big Game

American football basically has three seasons: preseason, regular season and playoffs. After a bit of searching, we came across pro-football-reference.com, which has data on all the teams going back many years, and collected the historical data for 2012, 2013 and 2014.

A few rules were established:

Preseason data should not be considered because some of the best players don’t play in those games.

Injuries are very common and really skew the data, especially if it’s a quarterback who gets hurt.

Teams that have won the Big Game have usually had a great defense.

Some teams start off the season slow and then begin playing better to make the playoffs.
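
To make these rules a little more concrete, here is a minimal Python sketch of how game records could be boiled down to per-team features along those lines. It is an illustration only, not the pipeline behind BigDataGame: the record fields, the `team_features` helper and the sample numbers are all made up, and injuries are left out because scores alone don’t capture them.

```python
from collections import defaultdict

# Hypothetical game records. The field names and values are invented for
# illustration; they are not the pro-football-reference.com schema.
games = [
    {"week": 0, "phase": "preseason", "team": "A", "points_for": 30, "points_against": 3},
    {"week": 1, "phase": "regular", "team": "A", "points_for": 17, "points_against": 20},
    {"week": 2, "phase": "regular", "team": "A", "points_for": 24, "points_against": 10},
    {"week": 3, "phase": "regular", "team": "A", "points_for": 21, "points_against": 14},
    {"week": 1, "phase": "regular", "team": "B", "points_for": 20, "points_against": 17},
    {"week": 2, "phase": "regular", "team": "B", "points_for": 10, "points_against": 13},
]


def team_features(games, recent_weeks=4):
    """Summarise each team using non-preseason games only."""
    by_team = defaultdict(list)
    for game in games:
        if game["phase"] == "preseason":
            continue  # rule: preseason data is not considered
        by_team[game["team"]].append(game)

    features = {}
    for team, played in by_team.items():
        played.sort(key=lambda g: g["week"])
        recent = played[-recent_weeks:]  # late-season form
        wins = sum(g["points_for"] > g["points_against"] for g in recent)
        features[team] = {
            # Lower is better: a rough stand-in for defensive strength.
            "avg_points_against": sum(g["points_against"] for g in played) / len(played),
            # Share of the most recent games that were won.
            "recent_win_rate": wins / len(recent),
        }
    return features


features = team_features(games, recent_weeks=2)
```

Features like these, computed for both teams in a matchup, are the kind of input a regression model could then be trained on.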

Taking all this into consideration, we paired Random Forest regression with stacked autoencoders.

And it works! We did a little more calculation and arrived at an accuracy rate of 76.5%, which held up in our first set of predictions, for the four games played the weekend that the Bengals faced the Steelers.
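
For readers wondering what an accuracy rate means here, one common definition is simply the share of games where the predicted winner matches the actual winner. The Python sketch below assumes that definition; the `accuracy` helper and the team labels are hypothetical placeholders, not our real predictions or results.

```python
def accuracy(predicted, actual):
    """Return the fraction of games whose predicted winner was correct."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one game")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Placeholder winners for four made-up games (not real results).
predicted_winners = ["T1", "T4", "T5", "T8"]
actual_winners = ["T1", "T3", "T5", "T8"]

rate = accuracy(predicted_winners, actual_winners)
print(f"{rate:.1%}")
```

A handful of games is a tiny sample, so a figure like this only becomes meaningful once it is measured over many more matchups.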

We quickly built out a site so that anyone can test it out for themselves.

You’re free to run any two teams you want against each other and see which one stands the best chance of winning.

Do note that we’re still in the process of tweaking it. Right now, we’re basically predicting each team’s probability of winning, and while we have faith in our product, there are a whole lot of things that are impossible to account for, injuries in particular. There’s also no predicting the effects of morale on a team; that stuff is sorcery.

However, while it will take more data to confirm this, we’re confident that, as of the time of writing, BigDataGame is one of the most accurate game predictors on the web.