Can Kaggle make data science a spectator sport?

Updated: Don’t worry if you don’t yet have a favorite data scientist; I don’t either. But maybe that’s just because we haven’t known who to root for.

Kaggle hopes to change that with a twist on its predictive-modeling competition platform that makes public the competitors in invite-only private competitions. Think of it like watching a major tournament in golf or tennis, where you can watch the best in the world shoot it out to see whose algorithms are king. Kaggle’s tagline is “We’re making data science a sport.” Maybe now it can make data science a spectator sport.

Top five in the GigaOM/Splunk competition.

Actually, Kaggle has been running private competitions (in which customers generally remain anonymous and keep their challenge descriptions vague except for those invited to work on the data) since about the beginning of 2012. In the past, though, even the competitors remained a mystery. It also posts leaderboards for all public competitions and a cumulative leaderboard. But as any sports fan knows, there’s nothing quite like watching a tournament where only the best of the best can play, and where the pressure is on.

Now, says Kaggle Founder and CEO Anthony Goldbloom, private competitions are more like running the U.S. Open in that others can watch the leaderboard and see how the invited data scientists are faring. It’s primarily a feature so other data scientists on the Kaggle platform can gauge their relative performance and get a little more motivation to step up their game and make it to the invitation-only competitions, but I think it could become a geek spectator sport under the right conditions.

If you’re wondering which U.S. Open he’s talking about (golf or tennis), don’t fret — had Goldbloom been asked whether Kaggle is more like golf or tennis before it launched, even he might have guessed wrong. He’d probably have guessed tennis, in which certain players excel on certain types of courts, like Roger Federer on the grass court at Wimbledon, or Rafael Nadal on the clay court at the French Open. So, someone who works in biotech might naturally prevail in those competitions, while a natural-language processing specialist might do best in competitions with lots of text to mine.

It turns out Kaggle is more like golf, in which a dominant player like Tiger Woods can win on pretty much any course he plays. Newcomers can still win, especially because there are plenty of good data scientists still making their way to the nascent Kaggle platform, but, Goldbloom says, the really good ones will adapt their skill sets to whatever is necessary for any given competition.

The first private competition open to public viewership began on Wednesday, and is somewhat unique in that the sponsor is willing to share its name and its challenge. It’s insurance provider Allstate, and it’s trying to predict customer churn. According to Goldbloom, the prohibitive favorite is Jason Tigg, an Oxford physicist turned hedge fund manager, but Indianapolis actuary Shea Parkes and apparent mystery man Jonathan Peters are names to watch.

There you have it, sports fans. Place your bets accordingly.

Note: This story was updated to reflect that Oxford physicist Jason Tigg decided not to take part in the first private competition open to public viewership.

An interesting conversation indeed! No one wants Kaggle to become the Elance of data science, or do we? However, cloud computing is likely to flatten the world of data science further. It is also not certain what economic models will eventually emerge out of the intersection of cloud and data science. Data science platforms in the cloud like GingerBrain sound promising, but can they pave the way for the consumerization of data science? Who knows.

Kaggle pisses me off!
A skill that has taken me 10 years and a PhD to acquire has been rendered almost worthless by Kaggle. I value my own skills very highly but am not about to go prostituting myself for peanuts in the Kaggle model. The only ones getting rich from this are the sponsors and Kaggle founders.
No way will I ever stoop so low – I can find my own data thanks very much …

weakness? an insane race to the bottom, pay-wise, for data scientists. if you add up the amount of labor put into some of these “contests” and divide by the pay, some of the companies hosting these contests are getting labor for cents-per-hour. can you say race to the bottom??

I think it’s great that Kaggle is bringing attention to the role of data scientists. I’ve read multiple posts calling into question the necessity of this role because of the ease of access to programs and services that organize and analyze data. The platform may not support a far-reaching monetary value to entrants, but just as in the pro-sports arena, the competitions will hopefully open doors for other opportunities. Gary Z, Neustar | http://bit.ly/Qy62pL

Kaggle is exciting right now because it is creating a platform to learn, experiment and socialize around a new high-demand skill area. Also, the obscure academics have found a way of getting some limelight.

When the newness of “Data science” fades (maybe in two years), Kaggle will have to look at other business models. Kaggle cannot be a consistent source of income for data scientists because of the sport nature of the platform. Kaggle needs to find a way to compensate the 90% of people working on the problem even if they were not among the top three winners.

As far as I know, most Kaggle user IDs belong to people who are just curious and want access to the data in the competitions.

Think Java in 1998-2000, getting paid $150 per hour. And think about it now on Elance, with projects getting executed at $15 per hour.

Those are some fair points, although I think the company has a longer-term vision (http://gigaom.com/cloud/kaggle-is-now-crowdsourcing-data-science-creativity/) that involves different types of competitions and even a services angle. The competitions might always be for fun for the entrants — even if prizes decrease in amount — but Kaggle’s main emphasis will be sustaining revenues for its business.