What is Big Data?

Most of us who keep up with current technological trends, whether for professional or personal reasons, have of late heard a lot of buzz around Big Data. Every techie worth their salt is talking about this new development in the analytics field. But what exactly is Big Data, and why is it so important?

Let us take the example of a common activity most of us engage in: shopping online.

Flipkart Online Shopping Portal

We visit a site, browse through items, choose what we want, and add it to a cart, after which we proceed to payment. Before this, the site asks us to log in for an easier checkout, and when we do, we provide details such as our age, location, and many other seemingly inconsequential particulars. All of this data is collected somewhere, including what was clicked on, what was added to the cart, and what was eventually bought.

Now, THIS data arrives in huge volumes in real time, and a lot of analysis can be done on it as it accumulates. Which age groups shop frequently, which locations are most active for online shopping, and many other conclusions can be drawn from such analysis, eventually improving customer service, for example by curating items on sale or offering discounts on products you are likely to buy. Because this data is massive and fast-changing, it is called Big Data. The volumes run to hundreds of terabytes; you would be amazed to know that Facebook generates 500 terabytes of data every day!
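As a toy sketch of the kind of analysis described above, here is how purchase events might be grouped by city and by age band. The records and field names are invented for illustration; a real portal would do this over billions of events with distributed tooling, not a Python list.

```python
from collections import Counter

# Hypothetical clickstream records; real portals collect these at massive scale.
events = [
    {"age": 24, "location": "Bangalore", "bought": True},
    {"age": 31, "location": "Delhi",     "bought": False},
    {"age": 27, "location": "Bangalore", "bought": True},
    {"age": 45, "location": "Mumbai",    "bought": True},
    {"age": 22, "location": "Delhi",     "bought": True},
]

# Which locations see the most completed purchases?
purchases_by_city = Counter(e["location"] for e in events if e["bought"])

# Which age bands (20s, 30s, ...) shop most frequently?
purchases_by_age_band = Counter(
    f"{e['age'] // 10 * 10}s" for e in events if e["bought"]
)

print(purchases_by_city.most_common(1))  # most active city for purchases
print(purchases_by_age_band)
```

The same grouping logic, scaled up, is what lets a retailer decide which segment to target with a sale or a discount.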

Data collected from various devices

But this is just one common consumer field we know of. A huge volume of data is also generated in the industrial sector, which can be analysed for better efficiency and throughput. Let us take the case of one of the leaders in Big Data initiatives.

General Electric (GE) is known for a variety of consumer products, electrical and electronic goods, medical equipment, and so on, but it is primarily a leading manufacturer of jet engines, power plants, and locomotives. GE sees huge potential in using Big Data technology to deliver better customer services.

A General Electric Jet Engine

GE made an intelligent move at the beginning of the Big Data trend by bringing in William Ruh from Cisco Systems in late 2011 to lead its Big Data centre in San Ramon, California, and to build GE’s version of an industrial Internet. In 2012, GE CEO Jeffrey Immelt announced that the company would commit $1 billion to this analytics and software centre over a period of four years.

GE receives tons of data from sensors embedded in its jet engines, turbines, trains, and medical equipment. The company believes that analysing all this data can help its customers with better after-sales service: identifying maintenance problems before they occur, improving fuel efficiency, and bettering other operating aspects of its products.

To put things into perspective, Ruh says that a single GE gas turbine generates 500 gigabytes of data. With 12,000 such gas turbines in operation, the amount of data is not just huge, it’s ginormous!

The amount of data collected from a jet engine!
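A quick back-of-the-envelope calculation, using the figures quoted above, shows why "ginormous" is no exaggeration (the source does not specify the time window for the 500 GB figure, so this is the fleet total for one such window):

```python
GB_PER_TURBINE = 500        # data generated by a single GE gas turbine
TURBINES = 12_000           # gas turbines in operation

total_gb = GB_PER_TURBINE * TURBINES
total_pb = total_gb / 1_000_000  # 1 petabyte = 1,000,000 GB (decimal units)

print(f"{total_gb:,} GB = {total_pb} PB")  # 6,000,000 GB = 6.0 PB
```

Six petabytes from the turbine fleet alone, before counting jet engines, locomotives, and medical equipment.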

This paves the way for remarkable changes in how the industrial sector can progress, not just for itself but also for its customers. GE conservatively estimates the savings from such an industrial Internet in just five of its revenue-generating sectors (aviation, power, healthcare, rail, and oil and gas) at approximately $300 billion over the next 15 years.

Equating it to what the consumer Internet did for the consumer market, GE envisions a similar revolution in the industrial sector, a foundational change, with the introduction of an industrial Internet. In anticipation of this future, GE has centralized all its Big Data and analytics work at the San Ramon centre. The centralized staff has risen to 300 from just two people (Ruh and his executive assistant), and their pioneering work will support another 9,000 GE software engineers operating globally in various product businesses.

Benefits of centralizing the Big Data and Analytics centre

Ruh says that the need to centralize the “hard-core data scientists” boils down to three factors:

Severe talent shortage. The number of people with in-depth data science expertise and deep analytics capabilities is very small.

Talent retention. If GE wants to retain its data scientists for the long term, it needs to provide a career path with clear leadership programs based on capabilities.

Technology reusability. Different groups cannot build these high-end capabilities from scratch every time, so it is imperative to reuse these scarce resources.

Thus, GE believes it can bring a fresh portfolio of offerings to the market. To stay at the top of the pyramid, the industrial Internet must help GE’s customers tap the company’s Big Data expertise, generate significant savings, and improve how they manage, operate, and maintain their machines.