A Marketplace for Data Scientists

Kaggle has evolved from crowdsourcing data analytics to a marketplace that bridges the gap between data problems and solutions.

Big companies such as GE, Microsoft and Merck are turning to data scientists to help them find answers, patterns and intelligence that will help them innovate and make critical business decisions. Even for large enterprises, the resources required to marshal a team of these data analytics wizards can be a limiting factor, but a startup company is using a crowdsourcing model to give companies access to an army of data scientists.

Kaggle founder and CEO Anthony Goldbloom has built an army of 85,000 data scientists that competes to find big data solutions. The startup is located in San Francisco's SoMa District across the street from AT&T Park (seen in the background).

Leading those legions of approximately 85,000 data scientists is Anthony Goldbloom, founder and CEO of Kaggle, a competition-based platform for predictive modeling and analytics. Drawing upon his admitted obsession with data analytics and experience with macroeconomic modeling for the Reserve Bank of Australia and Australian Treasury, Goldbloom founded the company in Melbourne in 2010. In 2011, he moved operations to San Francisco and in March launched the company’s signature product, Kaggle Connect, a platform that links companies to Kaggle’s growing community of data analytics experts.

Goldbloom’s accomplishments are impressive, especially for someone in his 20s — until June, that is. He’s twice been named to Forbes’ annual “30 Under 30” list of young technology leaders, has been featured in Fast Company’s “Who’s Next” series and is a featured speaker at the upcoming Data 2.0 Summit. Recently, Goldbloom sat down to discuss how Kaggle fits in the world of big data, where his company is going and that “terrible” name.

You had an internship at the Economist in 2008. Did that experience spark your interest in big data analytics and plant the inspiration for Kaggle?

That’s definitely the case. To give you some background, the Economist has an essay competition every year. I entered with an essay about sub-prime mortgages, which was a hot topic at the time. It was actually an unfortunate essay; I wrote about why the sub-prime mortgage defaults weren’t a problem and why I didn’t know what all the fuss was about. Of course, that’s what caused the global financial crisis [GFC]. There couldn’t have been a worse conclusion to draw. Nonetheless, it was a strong enough essay that it won the competition and the prize was a three-month internship. I pitched a piece on big data and data science and my editor said I could write it. Being able to call people up and say, “Hi, I’m Anthony Goldbloom of the Economist, I’d like to speak to XYZ” and have everybody answer your calls turned out to be a fabulous way to do market research. I noticed that a lot of the people I was speaking to within companies doing predictive modeling weren’t that strong. It was frustrating to me and it got me thinking of a business model in which meritocracy is the basis of what you can charge companies objectively for that type of work.

You’ve gone from covering big data to being enveloped by it. How do you define “big data”?

Big data is data that doesn’t fit into Microsoft Excel. It’s actually not my line. I got that from [senior data scientist] Monica Rogati of LinkedIn, but I like it a lot. Big data is a buzzword. I’m glad it exists because it makes people more interested in what we do. There’s an enormous amount of value to be had out of data. Ten years ago decisions were made on gut feel and intuition, and now we’ve had fabulous case studies in business; I think of Tesco and the advent of the loyalty card as probably the most prominent one. There’s also the case study of the Oakland A’s and “Moneyball,” and another on predicting election outcomes. I think all of this demonstrates why basing decisions on data leads to much better decisions than just relying on gut instinct.

So how does Kaggle work?

We manage a community of 85,000 that I would argue to be the world’s most elite statisticians and data scientists. We’re a marketplace that matches the best of them up with companies who are trying to get statistics or machine-learning problems solved. We rank them objectively [through competitions] before they get selected or invited to Kaggle Connect. Those who perform best in the competitions get invited to Kaggle Connect. The competitions are a way to qualify talent and see who the world’s best data scientists are and what they’re good at, and the Kaggle Connect piece is how they can monetize their competition performance.

You can objectively judge how good somebody is at doing data analytics and statistics by measuring how accurate their solutions are. Rather than hiring somebody because they have a really nice CV, I really like the idea of meritocracy. What really appeals to me about competitions, particularly those that are objectively judged, is that they’re really meritocratic. The best person wins because they have the best model.
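The objective judging Goldbloom describes can be illustrated with a toy leaderboard. The sketch below is not Kaggle's scoring code; it assumes a simple classification task scored by plain accuracy, and the function names and sample competitors are hypothetical.

```python
from typing import Dict, List, Tuple


def accuracy(predictions: List[int], truth: List[int]) -> float:
    """Fraction of predictions that match the held-back true labels."""
    if not truth or len(predictions) != len(truth):
        raise ValueError("predictions and truth must be non-empty and equal in length")
    correct = sum(p == t for p, t in zip(predictions, truth))
    return correct / len(truth)


def leaderboard(
    submissions: Dict[str, List[int]], truth: List[int]
) -> List[Tuple[str, float]]:
    """Rank competitors by accuracy, best first; ties are broken by name."""
    scores = [(name, accuracy(preds, truth)) for name, preds in submissions.items()]
    return sorted(scores, key=lambda item: (-item[1], item[0]))


# Hypothetical example data: 1 = "survived", 0 = "did not survive".
truth = [1, 0, 1, 1]
submissions = {
    "alice": [1, 0, 1, 1],
    "bob": [1, 1, 1, 1],
    "carol": [0, 1, 0, 0],
}
ranking = leaderboard(submissions, truth)
```

Because every entry is scored against the same hidden answers, the ranking depends only on the model's predictions, not on who submitted them. Real competitions typically use task-specific metrics rather than plain accuracy.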

A data visualization from a "starter" Kaggle competition designed to introduce people to data science and machine learning that analyzed which passengers were likely to survive the 1912 sinking of the RMS Titanic. Image courtesy of Kaggle

Is this crowdsourcing or more of a marketplace for data analytics?

Crowdsourcing is often associated with very low-value, high-volume work. I think it was reasonable for people to refer to Kaggle as crowdsourcing when all we did was competitions. That’s where the business started. Now that we do Kaggle Connect I’d say that defining us as a marketplace better reflects what we do.

If you turn a critical eye on Kaggle, what holes would you poke at?

I can certainly tell you the arguments people make against using us and I can also tell you how I refute them. People often wonder how a problem can be solved by someone who is not fully educated on or immersed in their industry. For example, Intel might question using someone who doesn’t necessarily know about chip design. The way I respond to that is that working with Kaggle is the ultimate collaboration between a domain expert and a data scientist. A domain expert knows the business context: they know how the data was collected and how the output will fit in their operations. The data scientist works with the domain expert, doing all the complex mathematics and extracting value from the domain expert’s knowledge.

What’s an example of how Kaggle’s data scientists have helped a company?

We recently helped Ford with a really cool problem. It was a research project around a sensor that determined the alertness of drivers. They wanted to determine through sensor readings whether the driver was alert or not. They took three classes of variables. They took environmental variables like how sunny it was outside and what the temperature in the car was. They took sensor readings like body temperature, eye movements and heart rate. The third type was psychological: what type of mood the driver is in, that type of stuff. They wanted to see which of these characteristics contributed to a driver being more alert or less alert. It was a research project aimed at how to equip cars so that they would keep a driver alert.

We built them an algorithm that gave them feedback from the sensors. I know they were very happy, but I don’t know what the commercial implications were. We didn’t get a lot of feedback from them because they wanted to keep it pretty buttoned up, even with us.

How would you characterize the data scientists within the Kaggle community?

There are three classes of data scientists. There are those who compete in the competitions mostly for fun; they’re not really interested in income. You’ve got academics who want access to real-world problems. The third group is very interesting and one that’s increasing: they are the people who are starting to rely on Kaggle and their Kaggle reputation to get full-time income. I found it interesting that the New York Times, for instance, put up a job advertisement; they’re looking for a data scientist. One of the first job requirements they put on it was “Has a Kaggle ranking.” So we’re a well-regarded credential, which is kinda cool.

About that company name, where did “Kaggle” come from?

It’s a terrible name because most Americans pronounce it “‘kā-gəl,” [rhymes with “bagel”] which sounds like the pelvic floor exercises. Australians pronounce it “‘ka-gəl” [rhymes with “haggle”]. I didn’t have any money when I started the company to purchase a domain name so I built an algorithm that iterated phonetic domain names and printed out a list of what was available. My wife and I went through the list and “Kaggle” was the one we picked. It’s algorithmically generated. Apparently, “Sex and the City” did an episode on Kegel exercises. If not for that episode I wonder if anyone would have heard of Kegel exercises. When we moved the company away from Australia and to the U.S. that’s when we started being ridiculed.
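Goldbloom does not describe his name generator in detail, so the following is only a guess at the general idea: enumerate pronounceable letter patterns and keep the ones whose domain is free. The pattern syntax, the letter pools and the `is_available` callback are all assumptions made for illustration. A real availability check would have to query a registrar or WHOIS service, which is left out here.

```python
from itertools import product
from typing import Callable, Iterable, Iterator, List

# Assumed letter pools; any pronounceable subset would do.
CONSONANTS = "bdgklmnprstz"
VOWELS = "aeiou"


def phonetic_names(pattern: str) -> Iterator[str]:
    """Yield every name matching the pattern.

    "C" is a consonant slot, "V" is a vowel slot, and any other
    character is kept as a literal letter (hypothetical syntax).
    """
    pools = []
    for ch in pattern:
        if ch == "C":
            pools.append(CONSONANTS)
        elif ch == "V":
            pools.append(VOWELS)
        else:
            pools.append(ch)
    for letters in product(*pools):
        yield "".join(letters)


def available_domains(
    names: Iterable[str],
    is_available: Callable[[str], bool],
    tld: str = ".com",
) -> List[str]:
    """Keep the domains that the caller-supplied check reports as free."""
    domains = (name + tld for name in names)
    return [domain for domain in domains if is_available(domain)]


# Stand-in for a real lookup: pretend these two domains are already taken.
taken = {"baggle.com", "daggle.com"}
candidates = available_domains(phonetic_names("CVggle"), lambda d: d not in taken)
```

With the pattern `"CVggle"` the generator produces 60 candidates (12 consonants times 5 vowels), and `"kaggle.com"` is among those the stand-in check lets through. The printed list would then be narrowed down by hand, as Goldbloom and his wife did.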

Copyright to original Intel Free Press content is owned by Intel, but words, photos and videos we share on this site may be republished, edited, and re-used free of charge unless otherwise noted.