A call to all Data Engineers and BI workers

Over the last two years I have had the chance to get my hands on some very exciting Data Analytics projects, and I want to take the opportunity to recap and to reach out to Data Engineers and BI Consultants. Why?

In IT we see lots of trends coming up every year. Some go, some stay, and sometimes we also see a paradigm shift. Such shifts have a tremendous impact on the way we worked before and how we will work in the future: for example the rise of the Internet, the era of Business Intelligence and Data Warehousing, and the whole E-Commerce shift. And now we can see a new shift coming up:

The era of Big Data

The difficult thing about a paradigm shift is that we need to rethink certain ideas: the way we did business before and the way we will do it in the future. If we don’t, others will, and we will not be as successful as we have been in the past. So let me tell that story in more detail.

Big Data vs. Data Analytics

Big Data has been around for a while now, and people already understand that storing large amounts of data is not good enough. There was a big hype about it, and we are now at a point where the words “Big Data” have taken on a negative connotation. It’s very exciting to see the progress in new technologies like Hadoop, where customers can store nearly all their data. But in 90% of the cases it is totally useless to throw all your data at a data analysis problem. Just talking about technologies no longer meets the needs of users and customers.

I also don’t like to talk about Big Data, because the term is misleading. Instead I’d like to talk about Data Analytics, because that’s what it’s all about. The focus is clearly on analyzing data and creating value out of it. This is not big news either, but we were told that only a specific type of person with specific knowledge can do this: Data Scientists.

These people are currently seen as heroes in the analytics market, and everybody is looking for one, with little or no luck. So here’s my point: analyzing data is not a new venture. In Business Intelligence and Data Mining people have been doing this for years. But what has changed, and where does the big shift happen?

We can clearly say that we can’t get around Data Analytics anymore. If you talk with customers and only want to talk about Data Warehouses and BI, you are missing half of the discussion. All the companies I talk to are thinking about Big Data or Data Analytics and how they can combine it with their Data Warehouse and BI solutions. But technology has become secondary in these discussions. Don’t get me wrong, Data Warehouses are still necessary and in use, but the focus has clearly changed. We see new types of data that are interesting to analyze, like streaming, social, log and sensor data, and there are also new ways to analyze data, like pattern recognition, predictions, clustering, recommendations, etc. So the operational data that is typically stored in Data Warehouses is still necessary, but it has to be combined with the other types of data I mentioned before. In today’s discussions with customers, it’s all about use cases and solutions.

To close the loop, let me quickly come back to the Data Scientists. I agree that we need statistical and mathematical skills to solve problems like customer segmentation, next best offers and recommendations, predictions, data correlations, etc. But we need many more skills to provide complete solutions to customers, so a good team mix is much more important.

New skills and approaches

With the era of Big Data and the new analytical possibilities we can also see new solution approaches. Data Analytics projects are much more iterative and evolutionary, because research on your data is a big part of the work. Companies discover new use cases, and sometimes they change their whole business model because they find competitive advantages or new sources of revenue.

A good example of this is the Smart Home. We can see that digitalization is now arriving in our homes. In the near future the devices in our homes will be fully connected and will share data with each other. When I set my wake-up alarm for the next morning, an app will pass this on to my heating system. My heating system then knows when I want to take a shower and need warm water, or when I want to drive my electric car.

Energy providers are highly interested in this information and in my daily energy consumption behavior. Why?

Because when they understand my energy consumption better, they can better predict their energy sales and how much energy is consumed at a certain time. And when they can better predict the energy consumption of their customers, they can better manage their purchases of power on the energy exchange.
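To make the prediction idea a bit more concrete, here is a minimal sketch in Python. It is not taken from any of the projects mentioned here: the function names and the sample readings are hypothetical, and a plain per-hour average stands in for whatever model an energy provider would really use.

```python
from collections import defaultdict
from statistics import mean


def hourly_profile(readings):
    """Average consumption (kWh) per hour of day from (hour, kwh) pairs."""
    by_hour = defaultdict(list)
    for hour, kwh in readings:
        by_hour[hour].append(kwh)
    return {hour: mean(values) for hour, values in by_hour.items()}


def forecast_day(profile, default=0.0):
    """Naive forecast for the next day: the historical average of each hour.

    Hours without any history fall back to `default`.
    """
    return [profile.get(hour, default) for hour in range(24)]


# Hypothetical smart-meter readings: (hour of day, consumption in kWh).
readings = [(6, 1.0), (6, 2.0), (7, 3.0), (18, 4.0), (18, 2.0)]

profile = hourly_profile(readings)
forecast = forecast_day(profile)
```

Summing such forecasts over all customers gives a first rough estimate of the load a provider has to buy for each hour. A real model would also take weekday, weather and signals like the wake-up alarm into account.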

The challenge with these new business models, and there are plenty of others, is that they are new. For energy companies that have offered power supply in a very classical way for decades, this is a big change. That is also why technology providers like Google are entering the market. They know how to handle the data, how to analyze it and how to use it in business models that provide additional services. If you do not accept these changes in business models, even though they take some time to settle in the market, you will wake up when it is too late, because applying changes takes time and companies need experience in order to apply them step by step.

I think this is the most important lesson of the last few years. You can stick with your old business models if they work, but if an industry is changing you need to adapt. Data Analytics is happening in several industries, and the most successful companies are those that start small, gain experience very quickly and are able to adapt to the changes. There are very good examples in Germany, like the Otto Group in retail, Yello Strom in the energy sector and also some new startups.

As I mentioned before, Data Analytics projects need to be very iterative in their approach. A lot of projects start with an idea, a use case or a feeling, and we need to understand quickly whether there is a business case behind it or not. To support those projects we need a different approach, which I call “Laboratory and Factory”.

The Laboratory

The Laboratory is for experiments. Here we can test all our use cases and ideas, or just discover patterns in data. The important thing is that it must be cheap. We don’t want to spend much money on experiments if we don’t know the business case behind them yet. The work can be compared to “panning for gold”: there is plenty of gold to be found, but for every nugget many times its volume in sand needs to be panned. So from a technology perspective I would use whatever fits the use case. In the laboratory we should be flexible about technologies like SQL, Hadoop, Storm, Pig, R or Python, D3 or other tools that help solve our problems.

From a data perspective we can work on a subset of the data. What we probably want to avoid is data sampling, which is often very time-consuming, so we prefer real data first. So the main goal of the laboratory is to…

The Factory

After we have proved our business cases in the laboratory we can then move our data analytics models to the factory. The factory means that we operate these models on a daily basis and couple them to our business processes. Here we typically use the existing enterprise platforms, and we often see mixed solutions of classical data analytics platforms combined with Open Source technologies. A new requirement in recent years is that we want to apply our analytical models to the whole data history. Technologies and servers are now capable of making this possible. So our factory gives us the integration of our analytical business models into our daily business at enterprise scale, on the whole data set and possibly enriched with external data.

New technologies

Some months ago I had the chance to visit the European Hadoop conference in Amsterdam, and it was a great opportunity to get another view on Big Data and Data Analytics from an Open Source perspective. It has become very obvious that Hadoop and NoSQL-based technologies drive the Big Data and Analytics market. There is a whole industry behind companies like Cloudera, Hortonworks, MapR and others that push new innovations. The Stinger initiative, for example, was a team project of around 45 companies with 140 developers that improved the performance of Hive by a factor of 100 within one year. Imagine the power of innovation that companies like Google, Yahoo, Facebook, Hortonworks and also Microsoft can bring to these technologies when they combine their skills. Clearly, when you come from a traditional BI solution like SQL Server, Teradata, Oracle or SAP, you would say that there are still some gaps in usability. But on the other hand these technologies are built specifically for Big Data solutions. They offer fantastic capabilities and some of them are great technologies, even if it is sometimes harder to work with them.

And when you see that all the big players in the market, like IBM, Oracle, Teradata, Microsoft and SAP, have partnerships with Hadoop platform providers, it is very clear that there is no way around these technologies anymore. It is just a question of how best to combine them. Microsoft, for example, has a nice offering with the Analytics Platform System (APS), a scale-out appliance where you can mix Hadoop and SQL Server in a highly parallel and very high-performing way.

Summary

I personally believe in the new paradigm shift of Big Data and Data Analytics. I have already had the chance to enjoy several projects in that area and I’m very happy to start on new ones in the coming weeks. That is also the reason why I wanted to write this article. In order to stay competitive we need to accept changes in the market and start to deal with them. What does that mean?

We have to keep learning new technologies, different approaches, new business models, etc. Traditional BI projects will still be done in the future, but the really interesting and challenging projects will all deal with Data Analytics. There are lots of really fascinating use cases that I can’t talk about in more detail because of non-disclosure agreements. But what I can say is that a little bit of ETL, SQL, a Data Warehouse and some reports is not good enough anymore. Technologies these days can do much more, and customers are starting to understand this. Demand and expectations are increasing, especially for analytical models in combination with business process optimization. So these are very exciting times, and I encourage everybody to get started. And if you don’t know where, let me know…

5 thoughts on “A call to all Data Engineers and BI workers”

Yes, most data science algorithms aren’t new – they were first properly developed and tested on computers in the late 20th century. In some cases new innovation is needed, e.g. porting algorithms to Hadoop isn’t easy.

But the hype seems to come from the fact that many businesses that didn’t use BI/analytics before can now do so because of scalability. The answer to why is that the press and people have a way of ignoring reality, e.g. the reality that cost savings in BI are wiped out by data engineer and data scientist costs.

But participants are also at times confused by IT – I know many server experts who naively think all IT jobs are like theirs, and data scientists who think they can just math their way through their professional life. The overall structure is misunderstood – in my background in banking, even modellers are expected to understand the business, and the business is expected to understand the simple way tech drives business. Why the fantasies? People rarely do more than one of these roles.