What Kind of Data Science Projects Are a Fit for Crowdsourcing?

Data scientists are an exclusive, sought-after subgroup of techies today. However, keeping a team of data scientists on your company's payroll can be an expensive proposition. And even if you have a talented in-house team, data science problems and algorithms vary widely, and no single team has deep expertise in all of them. Crowdsourcing is a cost-effective and efficient way to take care of your data science projects, access cutting-edge skills 24/7, or extend your existing team.

Here’s a look at the process of running data science projects with Topcoder and the kinds of projects that excel with crowdsourcing.

A competition-based model means you get the best of the best

At Topcoder, our community of designers, developers, and data scientists compete to provide the right solutions for a project. Competition naturally produces a diverse set of approaches to the problem at hand, which in turn leads to higher-quality results.

Imagine the Tour de France. It starts off with a horde of cyclists, and eventually only a handful matter. Similarly, a data science challenge draws a large pool of talent: members who can be broken down into Cs, Bs, As, and A+s. The pool will logically have the highest percentage of Bs (good) and a smaller percentage of A+s (top experts). The A+s are not only the most talented but also the most expensive resources. The only way to get access to these particularly high-quality domain experts again and again is by leveraging Topcoder's crowdsourcing model.

Topcoder’s process for data science challenges

At Topcoder, a typical data science challenge is a 5-step process. The stages are:

Our average data science competition draws 500 submissions from 75 data scientists, all of whom compete to deliver the best outcome. And our clients pay only for the best solution: the result, rather than individuals or hours worked.

With traditional methods of outsourcing a project, such as vetting individual freelancers or taking on an agency, choices are limited, you pay for both the effort and a single result (which you may or may not be satisfied with), and the approaches you get tend to be more predictable, with less variety to choose from.

What kind of data science projects are a good fit for crowdsourcing?

Topcoder's crowdsourcing model works especially well with the following types of data science projects:

Objective optimization. Data scientists optimize a given objective subject to its constraints and despite its inherent uncertainties.

Algorithm optimization. These challenges involve improving existing algorithms, often to make them faster, more accurate, or both.

Image and pattern recognition. These challenges involve processing data, such as image libraries or audio files, to detect patterns, correlations, anomalies, and much more.
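To make the "detect anomalies" idea concrete, here is a minimal sketch of one of the simplest techniques: flagging values that sit far from the mean, measured by z-score. The function name `flag_anomalies`, its threshold values, and the sample readings are illustrative assumptions, not part of any Topcoder challenge; real image or audio challenges typically call for far richer models.

```python
import statistics


def flag_anomalies(values, threshold=3.0):
    """Return the indices of values whose z-score magnitude exceeds `threshold`.

    Hypothetical helper for illustration; 3.0 is a common rule of thumb,
    not a value taken from any specific challenge.
    """
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mean) / stdev > threshold]


# Toy sensor readings with one obvious spike at index 6.
readings = [10, 11, 9, 10, 12, 10, 95, 11, 10, 9]
anomalies = flag_anomalies(readings, threshold=2.5)  # → [6]
```

With only ten readings, no population z-score can exceed √(n − 1) = 3, which is why the usage line lowers the threshold to 2.5; on larger datasets the default of 3 behaves as expected.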

Here are just a few examples:

Cancer recognition technology. Japanese technology company Konica Minolta is currently working with Topcoder to develop cancer recognition technology that distinguishes cancer regions from other regions in pathology images, a capability positioned as the basis for all digital pathological image analysis. For Topcoder's community of data scientists, the ultimate goal is segmentation of pathological images to aid in the diagnosis of cancers.

GWAS analysis. In the last year, Topcoder worked with the Pfizer Business Technology High Performance Compute group and collaborators from the Crowd Innovation Lab at Harvard University to speed up genome-wide association study (GWAS) analysis. Through crowdsourcing, the logistic regression in PLINK 1.07 (an open-source tool for GWAS analysis) was accelerated 591-fold.
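The article doesn't describe the accelerated code itself, but it may help to see what a GWAS logistic regression computes. For each genetic variant (SNP), a separate model relates case/control status to genotype. The sketch below shows that per-SNP fit using a hand-rolled Newton-Raphson solver in pure Python. It is neither PLINK's implementation nor the crowdsourced solution; the function names, the additive 0/1/2 genotype coding, and the toy data are assumptions for illustration.

```python
import math


def _score_and_information(b0, b1, x, y):
    """Score vector and Fisher information for y ~ Bernoulli(sigmoid(b0 + b1*x))."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return (g0, g1), (h00, h01, h11)


def fit_logistic_1d(x, y, max_iter=50, tol=1e-10):
    """Fit intercept b0 and slope b1 by Newton-Raphson; return (b0, b1, se_b1).

    Hypothetical helper. It does not handle perfect separation: if genotype
    fully predicts status, the estimates diverge instead of converging.
    """
    if len(set(x)) < 2:
        raise ValueError("genotype column is constant (monomorphic SNP)")
    b0 = b1 = 0.0
    for _ in range(max_iter):
        (g0, g1), (h00, h01, h11) = _score_and_information(b0, b1, x, y)
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information) @ delta = score.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if max(abs(d0), abs(d1)) < tol:
            break
    # Standard error of b1 from the inverse information at the final estimates.
    _, (h00, h01, h11) = _score_and_information(b0, b1, x, y)
    se_b1 = math.sqrt(h00 / (h00 * h11 - h01 * h01))
    return b0, b1, se_b1


def snp_association(genotypes, case_status):
    """Per-allele odds ratio, Wald z statistic, and two-sided p-value for one SNP."""
    _, b1, se = fit_logistic_1d(genotypes, case_status)
    z = b1 / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return math.exp(b1), z, p_value


# Toy data: minor-allele counts and case status (1 = case, 0 = control).
genotypes = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
status = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1]
odds_ratio, z, p = snp_association(genotypes, status)  # ≈ 3.0, ≈ 1.35, ≈ 0.18
```

In the toy data the case fractions are 1/4, 1/2, and 3/4, so the odds are 1/3, 1, and 3; each extra allele triples the odds, and the fitted odds ratio is exactly 3. Because this fit is repeated for every SNP, often hundreds of thousands or more, per-fit speedups compound across the whole analysis.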