Kaggle Challenges and the Value of Data Science

The impact of data on business outcomes is covered with buzzwords. The people in the loop say real things sometimes (examples here), but there’s a twist. Vendors picks only the best cases that sell their stuff, and their clients conceal successes to leave competitors guessing.

Let’s turn to Kaggle for balanced statistics. The Kaggle competitions put participants in the same conditions, which allow for easy comparison. The website maintains public and private leaderboards for each competition, based on test data. I use the set of public leaderboards available here.

Businesses hire many data scientists now. And the first interesting question to the data is: should I select talents carefully or hire people fast? Here’s a test: let’s look at the winning margins on the top of leaderboards. If they’re large, then the skill premium may be large as well, so it’s worth looking for better candidates and pay them more. This is the answer in one chart:

Each line represents a competition. The y-scale shows the final score of a participant as a fraction of the winner’s score. The score is a statistical metrics reflecting the quality of a (typically) prediction of interest, such as revenues, votes, or purchases. In some cases, the higher score is better, in the others, it’s the opposite. Lines are moving in the respective directions.

This case is slightly unusual because it has distinctive leaders with large handicaps. Still, those who try—the red dots—eventually succeed. The problem is, very few do try:

In 4,000 cases, a team submitted only a single solution in a competition. Really serious attacks on the problem start with 10+ submissions, which few teams make.

Despite this, many participants end close to the winner:

Looking from a different perspective on individual performance, I compare how the same users completed different competitions:

These five races involved 500+ users each, and some users overlap. The overlapping shows the Kaggle core: the people who compete regularly and finish high (left-bottom corners of each subplot). Elsewhere, the relationships are weak.

These modest evidences suggest that people matter less and commitment more.

Does time matter? I take the means by the days remaining until the last submission:

This data belongs to the attempts to predict lemons at car auctions. The higher score is better here, and you see that additional submissions don’t improve the quality of an average submission. The leaders do improve slowly, however. Data scientists find low-hanging fruits in available data quickly and then fight for small improvements with much time investments. For one example, read this detailed journal by Kevin Markham.

A typical disclaimer would mention various limitations of these plots for decision making or of Kaggle competitions for real cases. Yes, while hiring, you need to know more than this. I would emphasize a different thing. Managers like intuitive decisions and confirm them with favorable evidences, including statistical insights. But having numbers this way isn’t the same as thinking that starts from numbers. Most businesses can get almost nothing from data scientists before their managers start thinking from numbers, not to numbers. And this transition from intuition to balanced evidences yields more than improving a single prediction by a few percentage points mentioned here.