Making Sense Out Of Big Data

Data are the crude oil of the 21st century. But how to make sense out of petabytes of information? Alexander Thamm has set himself up as Europe’s first Data Science service provider. In a recent interview for our new magazine Smart Industry – the IoT Business Magazine, I asked him what that means.

? What does a Data Science Consultant do all day long?

You can’t describe a typical day of a data science consultant, as the tasks vary and depend on the phase of the project. In the Business Processes stage, the key business questions need to be identified. The Data Intelligence phase includes steps such as extraction and preparation as well as evaluation of the data. For Predictive Analytics, we develop a model for statistical analysis. The final results are then visualized using an interactive dashboard which enables the customer to examine and evaluate the outcomes from different viewpoints. A good data scientist seeks to focus on one or two phases of the Data Compass throughout his or her career. However, to truly understand data science projects it is important to know all four steps.

? Who are your clients?

We have clients from many different industries and from all sorts of functions and departments. For marketing and sales, we are involved in optimizing the customer journey, defining the next best offer or calculating the best price. In production, predictive maintenance is a big topic as well as the Internet of Things. Controlling and management need smart reporting and visualization to save time and – even more important – to enable better decision-making. Most of our clients are large companies like BMW, Daimler, Vodafone, MunichRe, or EnBW. We often work in so-called data labs or centers of excellence. These are like a start-up within a large company. You work in a very innovative environment, you can experiment, and decisions are made quickly. Within a few weeks, sometimes just days, we build a first prototype. After validating the prototype, we test the pilot. Certainly, a lab environment isn’t a necessary condition for data science projects, but in my experience it facilitates and accelerates the process.

? What about small and medium-sized companies: Do they seek your services, too?

Medium-sized companies have just started to discover the possibilities of data science and analytics. I think there is huge potential, especially in Germany with its many “hidden champions”. Right now, for instance, we are working with a German toolmaker that wants to better understand its customers. Typical questions are: “How do my end customers actually use the product?” “How do my customers make a buying decision?”

? Who do you talk to when you’re with a client – management, IT, or both?

Our first contact is almost always with management. In my experience, the best and quickest progress is made if there is a strong commitment from the board to invest in big data and digitalization. A pragmatic Chief Analytics Officer or Chief Digital Officer who wants to break new ground and set a framework for the data strategy is also quite helpful. Our solutions need to fit into the available infrastructure, so it’s important to make sure the IT department is on board at an early stage.

? You seem very excited about something you call “Data Intelligence”. What does that mean?

There are many different types of data: you have numbers, meaning measured values or operating figures. Then there are binary variables that describe a client’s yes or no purchase decision. Then there is structured data, for example a model designation, or unstructured data like customer reviews. These different types of data have to be organized, bearing in mind that different problems come with different requirements regarding the data set structure. Put simply, we need to make sure the data all speak the same language. It is also very important to validate the data by asking ourselves: “Is the data plausible?” “Are some observations obviously wrong, perhaps due to a coding error?” “Are important values missing?” What we call data intelligence is often the most intense phase of the project, and it’s where most of the work needs to be done. Good data quality is crucial for the actual analysis.
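
The three validation questions above (plausibility, obviously wrong observations, missing values) can be sketched as simple checks in Python. This is only an illustration and not the consultancy's actual tooling: the records, the field names (`customer_id`, `age`, `purchased`) and the age limits are assumptions made up for the example.

```python
from typing import Any

# Hypothetical customer records; all fields and values are illustrative only.
records = [
    {"customer_id": "C1", "age": 34, "purchased": 1},
    {"customer_id": "C2", "age": -5, "purchased": 0},    # implausible age
    {"customer_id": "C3", "age": None, "purchased": 1},  # missing value
    {"customer_id": "C4", "age": 51, "purchased": 2},    # likely a coding error
]


def validate(record: dict[str, Any]) -> list[str]:
    """Return a list of data-quality problems found in one record."""
    problems = []
    age = record.get("age")
    if age is None:
        problems.append("missing age")
    elif not 0 <= age <= 120:  # assumed plausibility range
        problems.append("implausible age")
    # A yes/no purchase decision should be coded as 0 or 1.
    if record.get("purchased") not in (0, 1):
        problems.append("purchased is not a yes/no value")
    return problems


# Map each customer to the problems found in that record.
report = {r["customer_id"]: validate(r) for r in records}

for customer_id, problems in report.items():
    print(customer_id, problems or "ok")
```

In a real project, checks like these would run over the full data set before any modelling, and the flagged records would be corrected or excluded.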

? Why, exactly, is a picture worth a thousand numbers, as you maintain?

We need to clarify here: the right picture is worth a thousand numbers. But an interactive dashboard is even better than a picture. Studies have shown that people remember 80 percent of what they see, but just 20 percent of what they read and 10 percent of what they hear. Our brain can process images 60,000 times faster than text.

? How do you use visuals to get your messages across?

Visualizations are especially helpful to communicate relations and results to the management – and to get their attention! We once presented a dashboard to the management of a client showing them their market position across different regions in Germany and across all age groups. The whole map was blue, which meant they were the market leader. Only one region was grey because their main competitor was ahead there. Then we changed the picture to show the age group 18-29, and suddenly the whole map went grey! Our dashboard proved that they had a serious problem with young customers. Simply showing them a table with hundreds of figures wouldn’t have had the same dramatic effect.

? You believe that machine outages and downtime will eventually be a thing of the past. How so?

Through predictive maintenance. True to the motto “fix it before it fails”, predictive models can calculate the right time to schedule the maintenance of plant or machines in order to avoid unnecessary costs and machine failure. One of our clients is a global automobile manufacturer who wanted to identify defective vehicles before they actually broke down. The goal was to reduce or even eliminate warranty costs. Thanks to our forecast model they are now able to identify two-thirds of all affected vehicles beforehand. The warranty costs went down by over 50 percent!
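
One way to read the "two-thirds of all affected vehicles" figure is as the recall of the forecast model: the share of vehicles that really developed a defect which the model had flagged in advance. The sketch below shows how such a share is computed. The vehicle IDs are hypothetical and chosen only so that the arithmetic is easy to follow; they are not the client's data.

```python
# Hypothetical vehicle IDs, for illustration only.
flagged_by_model = {"V1", "V2", "V5", "V7"}  # predicted to develop a defect
actually_failed = {"V1", "V2", "V3"}         # defects observed later

# Vehicles that were flagged and did in fact fail.
caught = flagged_by_model & actually_failed

# Recall: share of affected vehicles identified beforehand.
recall = len(caught) / len(actually_failed)

# Precision: share of flagged vehicles that really were affected.
precision = len(caught) / len(flagged_by_model)

print(f"recall={recall:.2f}, precision={precision:.2f}")
```

Precision is the other side of the trade-off: every flagged vehicle that turns out to be healthy still costs an inspection, so both numbers matter when deciding whether such a model pays off.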

? Today, car owners need to drive to their local testing center regularly to have their vehicles checked. You think that one day, due to the Internet of Things and total connectivity, the car manufacturers will be able to certify automobiles remotely without any need for manual inspection?

Honestly, I think we are not very far away from that. Tesla can already update a car’s software via the Internet – for them it’s just one click. Today’s cars have loads of sensors on board to monitor things like tire pressure or engine temperature. Other sensors provide information on things like speed, distance, etc. By entering all this data into a predictive model, we can reliably identify cars where a problem is likely to occur soon. Remote certification of cars will be technically feasible in four or five years. However, it will take a lot longer to establish such a system in practice. You need lots of long-term tests, not to mention legal changes and a new infrastructure.

? Banks are keen on gaining predictive insights on the credit-worthiness of people and businesses seeking loans. Aren’t you worried about possible privacy issues here?

First of all, it is important to note that both the bank and the customer profit from better solvency predictions. Take for example the emergency loan algorithm, which we developed for a German bank. The emergency loan is a kind of microcredit, usually for amounts of 100 or 200 euros, with a one- or two-month repayment term. In Germany, the most commonly used way to estimate creditworthiness is the score provided by SCHUFA, a credit rating company. In many cases, the target group for emergency credit already has a SCHUFA entry disqualifying them from further loans. Therefore, we need to examine other factors to determine if they are creditworthy and likely to repay. Our algorithm considers all sorts of data, from personal credit histories to recent transactions and even social media activities. That way we can reduce credit losses by over 90 percent even though the granting rate remains the same. As for privacy, Germany has some of the toughest privacy laws in the world. We respect these laws in our projects, of course. As part of our data intelligence phase, which I explained before, we check to make sure that data has been legally obtained and can be used for analysis purposes.

? In what other areas will predictive analysis revolutionize the way companies do business?

As a general rule, companies have to become more data-driven and establish a hypothesis-based decision-making culture. Business is becoming less plannable, and core products and services are being commoditized. The next disruptive business model is probably already knocking on your door. The ability to experiment with new ideas, prototype a viable product or process, and quickly learn from results enables a whole new way of thinking. Predictive analytics, data science and leveraging value from data will be the big differentiator as enterprises head for the digital future. Predictive analytics not only helps them make better decisions; if done holistically, it allows them to really understand their customers in ways never seen before. By analyzing the customer journey, by predicting customer lifetime value, by recommending products for cross- and up-selling, and by preventing customer churn, manufacturers and retailers can dramatically increase customer engagement and satisfaction.

? You claim that you are able to analyze business processes, customer behavior and even mechanical properties of equipment and then translate the problem into tangible IT demands. How far do you go?

We are a data science consultancy, and we are independent from any single technology provider. Within the last four years, we have recommended over 50 different technology solutions, each tailored to fit the project needs of the client. Our expertise and practical experience with technologies like ETL [Extract, Transform, Load; basic data warehouse functions] or visualization tools is an important competitive advantage for us. Selecting and using the right tools is essential for successful data science projects. Most of our customers choose to rely on our expertise in the selection process. Knowing as many tools as we do, we can give them truly objective advice. What we are looking for is the best solution for our customer. There is no such thing as the “best” visualization tool. It always depends on the use case, the present IT infrastructure, the volume of data and many other factors.