Data science becomes a sport

A list of the current competitions on Kaggle.com, showing cash prizes ranging from $5,000 to $3 million. Do you think you can make the best predictive model? Image courtesy Kaggle.com.

Has crowd-sourcing finally entered the big league? A new ‘predictive modeling’ website that links large corporations and industry with independent statisticians from around the world (the crowd) has just secured $11 million in investment from some of the biggest names in Silicon Valley.

The website is called Kaggle.com and was started by Australian statistician Anthony Goldbloom, who coded the site in his small apartment in Sydney. “It’s crowd-sourcing for geniuses,” he told Australia’s Sydney Morning Herald.

Kaggle.com has already proved its worth: NASA ran a dark matter competition on a problem that researchers had worked on for 10 years, and British glaciologist Martin O’Leary, from the University of Cambridge in the UK, made the first breakthrough in a week and a half.

The web service works as an intermediary that connects people who have a problem, and the data associated with it, with potential problem solvers. A company creates a competition around a problem it wants to solve and uploads the relevant data.

For example, suppose a supermarket wants to know when its customers are likely to return. Predictive modeling finds patterns and relationships in existing data, such as a history of previous visits, and then uses them to predict what will happen where the data has gaps.

Using this method, a supermarket can estimate statistically whether visits are likely to drop off. If so, it can adapt its commercial strategy, for example by sending discount vouchers to its customers. The predictions can be checked against existing customer behavior data: the more closely they match, the more confident a company can be about future predictions.
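To make the supermarket example concrete, here is a minimal sketch in Python. It is an illustration only, not a method used by Kaggle or any competitor: the visit dates are invented, the function names are hypothetical, and the "model" is simply the average gap between past visits. It predicts a customer's next visit from their history and then checks that prediction against a visit that was held back.

```python
from datetime import date, timedelta
from statistics import mean


def predict_next_visit(visits):
    """Predict the next visit as the last visit plus the mean gap between visits."""
    ordered = sorted(visits)
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    return ordered[-1] + timedelta(days=round(mean(gaps)))


def mean_absolute_error_days(predicted, actual):
    """Average number of days by which the predictions miss the real dates."""
    return mean(abs((p - a).days) for p, a in zip(predicted, actual))


# Invented visit history for one customer (gaps of 6, 8 and 7 days).
history = [date(2011, 1, 3), date(2011, 1, 9), date(2011, 1, 17), date(2011, 1, 24)]
# A real visit that the model has not seen, kept back to check the prediction.
held_out_visit = date(2011, 2, 2)

prediction = predict_next_visit(history)
error = mean_absolute_error_days([prediction], [held_out_visit])
print(f"Predicted next visit: {prediction}, off by {error} day(s)")
```

The smaller the error on visits the model has not seen, the more weight the supermarket can give to its forecasts. A real competition entry would use far richer models and data.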

Once a competition is open, anyone from around the world is free to create a predictive model to find the best solution to the problem. ‘Players’ in these competitions can even form teams with other members. Individual players or teams are instantly ranked on an online scoreboard depending on how their models perform. The winning model is awarded a prize, which is typically in the form of money.
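The scoreboard idea can be sketched in a few lines of Python. This is an assumption-laden illustration, not Kaggle's actual scoring system: the team names and numbers are invented, and root-mean-square error is chosen here only as one common way to measure how far predictions are from answers that the organizer keeps hidden.

```python
from math import sqrt


def rmse(predicted, actual):
    """Root-mean-square error between predictions and the hidden answers."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return sqrt(sum(squared_errors) / len(actual))


def leaderboard(submissions, hidden_truth):
    """Score every submission and return (team, score) pairs, best (lowest) first."""
    scores = {team: rmse(preds, hidden_truth) for team, preds in submissions.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical answers held back by the competition organizer.
hidden_truth = [3.0, 5.0, 2.0]
# Hypothetical submissions from two teams and one individual player.
submissions = {
    "team_a": [3.0, 5.0, 4.0],
    "team_b": [3.0, 5.0, 2.0],
    "solo_c": [4.0, 4.0, 3.0],
}

ranking = leaderboard(submissions, hidden_truth)
for place, (team, score) in enumerate(ranking, start=1):
    print(f"{place}. {team}: {score:.3f}")
```

Because every entry is scored against the same hidden answers, a new submission can be placed on the board as soon as it arrives.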

Goldbloom thinks his service is turning data science into a sport: “The very best data miners or statisticians can earn as much as the very best golfers or tennis players,” he said.

Kaggle.com claims that every competition it has run has outperformed existing predictive models. One example was the competition offered by NASA to obtain a more accurate algorithm for mapping dark matter in the Universe. In all, 73 teams competed in the Dark Matter competition. The first breakthrough came within 10 days from the aforementioned Martin O’Leary. His solution was a mathematical model of the tiny distortions in images of galaxies that are thought to be caused by dark matter. Leapfrogging on the scoreboard then continued for a number of days among five researchers, until eventually David Kirkby and Daniel Margala from the University of California claimed the prize on August 18th.

"Glaciologists use different techniques to astronomers. It turns out that the techniques that glaciologists use are really powerful on this NASA problem," said Goldbloom.

The largest competition on Kaggle.com is the Heritage Health Prize, run by the Heritage Provider Network, with a cash prize of $3 million. The competition asks players to predict the likelihood of patients going to hospital in the next year. With an accurate predictive model, the company can take preventative steps to help the patient stay healthy and avoid a hospital stay, while increasing the company’s own margins. The problem remains unsolved.

Goldbloom’s service is firmly in the spotlight, thanks in part to its impressive backers. They include PayPal co-founder Max Levchin, Google chief economist Hal Varian, and a number of other high-profile investors.

Goldbloom said, “The big difference with Kaggle is that the problems it is solving are fundamental to the way big companies operate, and are therefore worth a lot of money.”