The Black Sheep of Data Science

My data scientist colleagues all get excited about the science of the analytical process. Their goal is to compete – sometimes with themselves! – to build the best model they can.

The process looks like this: Gather the best data set available for the business problem. Often, this involves merging dozens of pre-existing data sets, making the process take weeks … if not months and, in some cases, years. Use this data to create the first model. But my data scientist friends are never satisfied. Instead, they are constantly trying to improve the model using machine learning algorithms that no business person understands. (They use such complicated language that they seem to be competing over who has the newest buzzword in the industry, too.)

Ronen Meiri: “I used to not like it when my data scientist colleagues called me the black sheep of the industry. Now I celebrate it. After all, it’s led not only me, but my business, DMWay, to stand out in the crowd.”

When businesses hire these data scientists, they are taking on a very highly paid employee who never asks a simple question: How does my work give value to the organization?

Adding the “value” measurement to the data scientist’s work, of course, changes the entire process. Instead of working toward the best dataset for the most accurate model, the data scientist needs to think about how much time and how many resources they are spending to create the first version of the analytical dataset. It may not be perfect, and ideally it would be integrated with additional data sources, but it already creates value with limited resources spent on the process.

Using an automated tool, such as DMWay Analytics, the company can simplify and scale the analytics process, which will allow the organization to investigate more business problems in a shorter period.

My colleagues hate this because it calls into question the value of their work. There are times when a true data scientist is what is needed. However, using machine learning methods that can be explained not by a mad scientist, but by a business user, often allows companies to gain more wisdom and value from their models. The model may be slightly less accurate, but it provides insights the company can understand and act on.

The goal is to create a process whereby the time from collecting data to the first version of the model is never longer than a couple of weeks.

It is time for the data science world to adopt the same agile methodologies as the software industry.

Data science models should be developed in smaller steps, using more automation, and involving the subject matter experts in the process – or even letting them do the work themselves with tools like DMWay!

I used to not like it when my data scientist colleagues called me the black sheep of the industry. Now I celebrate it. After all, it’s led not only me, but my business, DMWay, to stand out in the crowd.