The Founder’s Guide to Machine Learning: Why You Shouldn’t Build Your Own Models

Amanda Sivaraj
May 5, 2015

Machine learning may be the hot new thing, but it’s still a mystery to most people. This isn’t surprising, since the field is complex and requires a high degree of expertise. In Part 1 of The Founder’s Intro to Machine Learning, we explained that machine learning involves the use of computer algorithms and mathematical models to solve real-world problems, like figuring out why 34% of your users don’t return after three days of using your app. And, if you hired people with the right expertise and gave them enough time and resources to complete the job, they could build you a machine learning solution for your business problem.

So…why isn’t everyone doing it? Well, even for an expert in the field, it often requires months of experimentation to develop a model, and there is no guarantee that the model will provide a scalable solution to your business problem. That’s tens of thousands of dollars and a whole lot of risk for your business. On the other hand, using a third-party service allows one of your software developers, with no specialized training in machine learning, to integrate the power of machine learning into your company in just a few lines of code.

In Part 2 of The Founder’s Guide to Machine Learning, we’ll compare the process of creating a machine learning model in-house with the simpler workflow of using a cloud-based service that provides pre-trained models.

The Process of Building a Machine Learning Model

There are a number of steps involved in using machine learning to solve a business problem, which are illustrated in the flowchart below. Depending on the complexity of the problem you’re trying to solve, this could take several months.

[Flowchart: The Process of Building a Machine Learning Model]

Note: Time estimates vary depending on the complexity of the problem you’re trying to solve.

After deciding on the problem you’d like to solve, the first steps of building a machine learning model involve data collection and labeling. As a rule of thumb, you and your team need to collect at least 100,000 data points (for example, tweets) to create robust models that produce good, usable results.

Then, you’ll need to label each of those data points with a value to show the model what you expect it to do. For example, in the case of sentiment analysis, you would label “1” for positive tweets, and “0” for negative tweets.

As a basic thought experiment, let’s say the average employee takes about 30 seconds to label each data point. For 100,000 data points, that amounts to roughly 830 hours of labor. At $10/hour, that costs you about $8,300.
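The thought experiment above is easy to rerun with your own numbers. Here is a minimal sketch in Python; the function and constant names are just illustrative, and the inputs are the assumptions from the text (100,000 data points, 30 seconds per label, $10/hour).

```python
# Back-of-the-envelope labeling cost, using the figures from the text.
DATA_POINTS = 100_000      # rule-of-thumb dataset size
SECONDS_PER_LABEL = 30     # assumed time to label one data point
HOURLY_WAGE_USD = 10       # assumed cost of labor


def labeling_cost(data_points, seconds_per_label, hourly_wage):
    """Return (hours of labor, total cost in dollars)."""
    hours = data_points * seconds_per_label / 3600
    return hours, hours * hourly_wage


hours, cost = labeling_cost(DATA_POINTS, SECONDS_PER_LABEL, HOURLY_WAGE_USD)
print(f"{hours:.0f} hours, ${cost:,.0f}")  # prints "833 hours, $8,333"
```

Doubling the time per label or the hourly wage doubles the bill, which is why labeling is often the first hidden cost to show up.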

The next step is to split your dataset into training data and test data. This should only take a few minutes if you have the technical skill to manipulate large datasets. The harder part is coming up with the appropriate evaluation criteria, which set the standard for how your model should behave. The amount of time this takes varies with the problem you’re trying to solve. The evaluation criteria may be readily defined for some problems – for example, an accuracy score – while others might need something more complicated, like an AUC (area under the ROC curve) score.
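To make the split and the two metrics mentioned above concrete, here is a small sketch in plain Python. The helper names, the 80/20 split, the 0.5 decision threshold, and the toy tweets and scores are all assumptions for illustration; a real project would more likely use a library for this.

```python
import random


def train_test_split(examples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the examples and split it into (train, test)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(labels, predictions):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def auc(labels, scores):
    """Chance that a random positive example is scored above a random
    negative one (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Ten made-up labeled tweets: 1 = positive, 0 = negative.
examples = [(f"tweet {i}", i % 2) for i in range(10)]
train, test = train_test_split(examples)
print(len(train), len(test))  # prints "8 2"

# Made-up model scores for five test tweets.
labels = [1, 0, 1, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.1]
predictions = [1 if s >= 0.5 else 0 for s in scores]
print(accuracy(labels, predictions))   # prints 0.8
print(round(auc(labels, scores), 3))   # prints 0.833
```

Accuracy only looks at the final yes/no answers, while AUC looks at how well the raw scores rank positives above negatives, which is why the two numbers can differ on the same predictions.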

Once you’ve set your evaluation criteria, you’ll need to do feature extraction to transform your labeled data into an appropriate form for training a model. You can then move on to the actual training of your model, which is an iterative process: fit the model to your training data, measure its performance against your evaluation criteria, and refine. Keep your test data held out for evaluation only, since tuning the model against it makes the results look better than they really are. Finally, you’ll spend several weeks – or even months – packaging your model in a computer program that you can then use to draw insights from your data and help solve your business problem.
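Here is a toy sketch of what feature extraction and training can look like for the sentiment example. It assumes a bag-of-words representation and a simple perceptron, chosen only because they fit in a few lines; the tweets are made up, and a production model would need far more data and a stronger method.

```python
from collections import Counter


def extract_features(text):
    """Bag-of-words: map each lowercase word to how often it appears."""
    return Counter(text.lower().split())


def predict(weights, bias, features):
    """Return 1 (positive) if the weighted word score is above zero."""
    score = bias + sum(weights.get(w, 0.0) * c for w, c in features.items())
    return 1 if score > 0 else 0


def train_perceptron(examples, epochs=10):
    """Nudge word weights toward the right answer after every mistake."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for text, label in examples:
            features = extract_features(text)
            error = label - predict(weights, bias, features)  # -1, 0 or +1
            if error:
                for w, c in features.items():
                    weights[w] = weights.get(w, 0.0) + error * c
                bias += error
    return weights, bias


# Labeled training tweets: 1 = positive, 0 = negative.
train = [
    ("great app love it", 1),
    ("love the new design", 1),
    ("terrible update hate it", 0),
    ("hate the constant crashes", 0),
]
# Held-out test tweets the model never sees during training.
test = [("love this great design", 1), ("terrible crashes", 0)]

weights, bias = train_perceptron(train)
predictions = [predict(weights, bias, extract_features(t)) for t, _ in test]
print(predictions)  # prints [1, 0]
```

Even this toy version shows the loop described above: turn text into numbers, fit on the training set, then check the result on examples that were held back.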

As you can see, it can take a long time to build a machine learning model. This is just one of the barriers to entry that you might face if you’re trying to do it in-house. Now, let’s take a closer look at some of the other barriers.

Barriers to entry for adopting machine learning in-house

Some barriers to entry become apparent after examining the process of building a model, while others may appear later as hidden costs:

Lengthy construction time

It often takes months – even years – to build a model, depending on the complexity of the problem you’re trying to solve.

Laborious data collection and preparation

We’ve outlined this in the workflow above, but to reiterate: collecting, organizing, and labeling data for your models can take weeks, if not months, of valuable working hours.

Scarce and costly expertise

Finding and funding a team of experts capable of building an in-house machine learning model for you is difficult and expensive. Did you know that as of 2014, the average base salary of a data scientist in the U.S. was $105,000? That’s not including benefits, bonuses, or any other possible compensation, which push that average up to $144,000. You’ll also need data engineers and probably a project manager with a technical background. In short, a full data science team isn’t something most startups can afford.

Scalability and deployment difficulties

Models can generally process information in a fraction of a second. However, processing time might increase depending on the complexity of the model. More complex models require a sophisticated infrastructure to handle large amounts of information at a high speed. Sometimes, processing will be further complicated by usage spikes, and scaling resources up and down in response to immediate demand is difficult and costly.

Demanding upkeep

Machine learning models require regular maintenance and testing with new data to ensure they keep meeting expectations. Methods used to train models are continuously improving, and you’ll need to apply these techniques to maintain a competitive level of performance. Without a dedicated research team, it is very hard to keep up with the latest technology.

Take a shortcut instead

Sure, anyone can deploy a model given enough time — but as they say, time is money. You’ll want to get value from your data projects as quickly as possible, and it’s hard to keep up with the pace of innovation. Luckily, you can skip over these hurdles by using a cloud-based machine learning service. These services provide pre-trained models – models that have been built ahead of time to complete general tasks, like analyzing the tone of a sentence.

By using a service, the expertise, time, and resources required to construct, scale, and deploy a machine learning model are all neatly packaged and available for a fraction of the cost and time it would take to create one yourself. You don’t have to worry about maintenance either: at indico, our team is always training our models on better data and improving the robustness and scalability of the infrastructure to support bigger, faster models.

That’s not all. Want to hear the best part? Any developer could easily use one of these pre-trained models to move you straight from your business problem to your business solution without hidden costs and wasted resources.
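To give a feel for what “a few lines of code” can mean, here is a hedged sketch of calling a pre-trained sentiment model over HTTP. Everything service-specific is a placeholder: the URL, the `api_key` and `data` request fields, and the `score` response field are hypothetical and do not describe any particular provider’s API. The `fake_post` function stands in for the service so the sketch runs offline.

```python
import json
import urllib.request

# Hypothetical endpoint and key: placeholders, not a real service's API.
API_URL = "https://api.example.com/v1/sentiment"
API_KEY = "YOUR_API_KEY"


def http_post(url, payload):
    """Send a JSON POST request and return the decoded JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def sentiment(text, post=http_post):
    """Ask the (hypothetical) service how positive a piece of text is."""
    reply = post(API_URL, {"api_key": API_KEY, "data": text})
    return reply["score"]  # assumed response field


def fake_post(url, payload):
    """Offline stand-in for the service, used here instead of a network call."""
    return {"score": 0.9 if "love" in payload["data"].lower() else 0.1}


print(sentiment("I love this app!", post=fake_post))  # prints 0.9
```

With a real provider, the part your developer writes is typically just the last line; the data collection, training, scaling, and upkeep described earlier all sit behind that one call.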