Wasting money with data science

Investing money in the latest hype is normal. Wasting it once the hype has passed is dangerous.

The majority of the companies around me are wasting money with data science.

They see overwhelming evidence that data science[1] is changing sectors and creating new business
opportunities. Just by looking at the Dutch landscape, there is no doubt that teams around us are
using data science to create value. Off the top of my head: Bol.com, Uber (Eats), Booking.com, ING,
NPO, Marktplaats, Quby, etc.

But for each of them, there's a handful of companies that are not successful and are, in fact,
wasting their resources with data science.

It all starts with not understanding what data science can do to add value to the bottom line
and, most importantly, which enablers make it possible.

Let's look at an example. I am sure you can adapt it to your company.

A hospital wants to predict, based on the information available during the intake, whether
patients entering the emergency room will be hospitalized.

The prediction allows the hospital to better plan the resources of the various departments.
This will, in turn, lead to cost savings.

Some data is gathered, given to data scientists, and — after two weeks — the first demo
takes place. The results are promising, but they need a bit more time.

Fine. After all, the data was messy: they had to clean it up and go back to the source a couple of
times.

Two weeks pass and the new results are even nicer. With 70% accuracy, they can predict if a patient
will go home after their visit to the emergency room.

This is much better than random (50%)! A full-fledged pilot starts.

They are faced with a couple of challenges to go from model to data product:

How to send the source data to the model is unclear;

Where the model should run;

The hospital operations need to change, as the intake happens with pen and paper;

They realize that without knowing to which department the patient will go, they won't add any
value;

To predict the department, the model needs the diagnosis. But once the diagnosis gets typed in the
computer, the patient has reached their destination: the model is useless!
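The last challenge can be caught before any model is trained by checking when each input becomes available. Below is a minimal sketch of such a check. The step names, the `Feature` class, and the feature list are hypothetical and only mirror the hospital example; they are not taken from a real project.

```python
from dataclasses import dataclass

# Hypothetical order of moments in an emergency room visit.
STEPS = ["intake", "diagnosis", "admission"]


@dataclass(frozen=True)
class Feature:
    name: str
    available_at: str  # step at which the value is first recorded


def late_features(features: list[Feature], predict_at: str) -> list[str]:
    """Return the names of features recorded only after the prediction moment."""
    cutoff = STEPS.index(predict_at)
    return [f.name for f in features if STEPS.index(f.available_at) > cutoff]


# Illustrative inputs: two are known at intake, one only after the diagnosis.
features = [
    Feature("age", "intake"),
    Feature("complaint", "intake"),
    Feature("diagnosis_code", "diagnosis"),
]

# The prediction is only useful at intake, so anything listed here is a problem.
print(late_features(features, predict_at="intake"))
```

Any feature this returns is one the model cannot rely on at the moment the prediction would actually matter.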

If you think this is unusual, I cannot tell you how many proofs of concept (PoCs) I have witnessed
that suffer from (some of) the same weaknesses:

No clear business case;

No data platform where data pipelines can be created;

No awareness of the impact on operations (the pen and paper in the hospital example);

No realization that a model is useful only if the predictions are timely;

No clear hand-over mechanism once the first iteration of a model is finished (i.e. where it will
run and whose responsibility it will be).

The list goes on, but you get the gist.

What do you need to make it all happen? I can think of at least these roles:

Software developers to embed or integrate the data product with other business applications,
websites, apps.

Database administrators from the other departments to open up the databases, etc.

On the "operational" side, you need:

A data platform, where pipelines run and where data land;

A data-driven mentality where data and knowledge can flow freely between organizational silos;

A data science workflow: how to improve the model once the first iteration is running, how to
hand the model over, how the business should give input, how to close the feedback cycle, etc.

Counting just the roles needed to have a team in place that can deliver data-driven
models/products, I come up with the following:

2 Data Engineers, 1 Lead

1 Data Scientist, 1 Lead

1 Product Owner

2 System Administrators (that's low for redundancy, but still).

In the Netherlands we can assume the competitive landscape pushes the cost of each role to around
100.000 EUR/year (including social costs). I'm probably being a bit conservative here.

Adding the platform cost (let's round it up to 100.000 EUR/year), it comes to roughly 1M EUR/year.

What does this money buy you? Let's say the team has enough throughput to deliver 5
models/year (1 model every 2 months, allowing for some holidays, training, and conferences).

These 5 models are unmaintained: you don't add new features, you don't maintain data pipelines,
etc. Doing all the above, in a robust fashion, probably reduces the throughput to maybe 3
models/year — at some point, though, you will need more people to maintain older models, or they
will stop working.

Let's not forget the software engineers, DBAs, and subject matter experts from the other
departments that need to be involved for all these data and knowledge flows. Approximately 500.000
EUR/year?

Let's look at the math again: 3M EUR for 6 data science cases in production (3/year for 2 years).
The first cases will probably hit production after roughly 6 to 12 months, as the platform,
data pipelines, dev and production environments, and so on need to be there.
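To make the back-of-the-envelope math explicit, here is a small Python sketch using the figures from this post. It assumes that "2 Data Engineers, 1 Lead" means two engineers plus a lead, so 8 people in total; the constant and function names are mine and purely illustrative. The unrounded totals come out slightly below the round 1M EUR/year and 3M EUR used in the text.

```python
# Head count per role, reading "2 Data Engineers, 1 Lead" as 2 + 1 (an assumption).
TEAM = {
    "data engineer": 2,
    "lead data engineer": 1,
    "data scientist": 1,
    "lead data scientist": 1,
    "product owner": 1,
    "system administrator": 2,
}

COST_PER_ROLE = 100_000      # EUR/year, including social costs
PLATFORM_COST = 100_000      # EUR/year
OTHER_DEPARTMENTS = 500_000  # EUR/year: developers, DBAs, subject matter experts
MODELS_PER_YEAR = 3          # throughput once models are also maintained
YEARS = 2


def yearly_team_cost() -> int:
    """Cost of the data team plus the platform, per year."""
    return sum(TEAM.values()) * COST_PER_ROLE + PLATFORM_COST


def total_cost(years: int = YEARS) -> int:
    """Team, platform, and other departments over the whole period."""
    return (yearly_team_cost() + OTHER_DEPARTMENTS) * years


def cost_per_case(years: int = YEARS) -> float:
    """Total cost divided by the number of cases in production."""
    return total_cost(years) / (MODELS_PER_YEAR * years)


if __name__ == "__main__":
    print(f"Team and platform: {yearly_team_cost():,} EUR/year")
    print(f"Total over {YEARS} years: {total_cost():,} EUR")
    print(f"Per case in production: {cost_per_case():,.0f} EUR")
```

With the rounded 3M EUR, this works out to about 500.000 EUR per case in production.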

Most Dutch companies do not just invest 3M EUR for 6 data science cases. Why?

A serious executive buy-in is missing: a buy-in that involves at least 3M EUR;

There is no data strategy: people start doing things, without a clear picture of what all the
above entails. Without a strategy, it is hard to invest the right amount of money in something
that, on paper, has the potential to disrupt the industry and give a clear competitive advantage
to the company. In practice, however, all this is potential and not a sure investment.

So what happens? People start projects that are dead on arrival. They still waste huge amounts of
money, but no value is added to the bottom line.

This is the whole reason large companies buy start-ups. By having just 1-2 roles dedicated to a
core data product, and by taking on large amounts of technical debt and more, start-ups are able to
prove the business case quickly. Once that is done, there are two options:

The start-up needs an even larger amount of money to pay the technical debt off and scale;

They get bought and the buyer will spend that money.

How do you fix it all?

First: Make a data strategy. Which areas would benefit most from data science, how to do hiring,
how to become data-driven, etc. Especially: how much money is needed to get the flywheel
spinning?[2]

Second: Collect the business cases and prioritize them by value. Are they feasible all the way up
to the moment where they generate money?

Third: Get external help to validate the 1-2 best use cases. Good consultants can quickly show
whether something is feasible and roughly what accuracy, run time, and so on to expect.

Fourth: Build the platform, formalize the hiring, etc. This may take 6-8
months. Do not skip the Lead DE and DS: you should start with them![3]

Fifth: Get the cases into production. This will take 3-4 months the first time, as a lot needs to
be planned, adapted, etc.

Sixth: Evaluate the business cases, refine them, and be sure to have a "lessons learned" moment.

Seventh: On-board new business cases, rinse and repeat.

Eighth: Profit.

Is it really a fix?

It is easy to write a blog post with eight simple steps that take more than one year and a couple
million euros.

It is much harder to implement all eight steps.

This is why executive buy-in is as important as the data strategy. Without it, it's impossible to
endure all the small and large fires along the way. There will be many, especially when starting
out.

Are all these roles needed?

Some data science projects are mislabelled. They are glorified data analysis projects, where a bit
of SQL, a bit of visualization, and — maybe — a bit of Python will be enough for a dashboard, a
report, an Excel file.

If the project is more data analysis than data science, then don't worry: most of what I wrote
above does not apply.

Is GoDataDriven the right partner to start with the 8 steps?

If you want more ramblings, follow me on Twitter: I'm gglanzani there!

[1] I will use Data Science as a synonym of Machine Learning for the purpose of this post,
although this is not factually accurate.

[2] Start immediately with hiring. In the Netherlands this takes a lot of time. Nobody
new to the field is aware of this, and the HR departments of large corporations are often not up
to the task.

[3] Are Leads too expensive? Yes, they are. Less than external consultants though, and hopefully
as capable as they are. Also be sure to hire engineers first. You have a model written by the
external consultants that can be put into production; engineers alone can do that, assuming the
consultants did a good job.