Myths and realities of BigData

I often get asked about “BigData”, not just by people with a technology background but increasingly by people outside technology. There is a lot of curiosity about this topic, and often a lot of myths. So I decided to write this for a non-technical audience. If you are a techie, you will most likely find this quite elementary. Here goes…

Let’s do a little thought experiment: you go to a mall on a busy weekend. It’s very crowded. You visit multiple shops, browse through a few racks of clothes, try on a few outfits, put items in your shopping basket and stand in a long checkout queue. After waiting in line for ten minutes, you abandon the checkout. Then you go and have a meal at the food court, bumping into a few friends on the way and exchanging a few pleasantries.

It would be virtually impossible for anyone to keep track of your activities, or those of the hundreds of other people at the mall. In a way, you are mostly invisible. Contrast that with the digital world, where you might take the same shopping journey through a few websites (or apps). It’s quite likely that every one of your actions is recorded and logged, often multiple times by multiple entities (e.g. your ISP may record the web pages you visit, and so may the website owner). You leave a huge digital trail and footprint behind you. People are producing and consuming vast amounts of information, from social media interactions, news and entertainment to shopping.

It’s not just us humans who are producing and consuming vast amounts of information: increasingly, machines of various kinds are connected to the network and collect huge volumes of data. Collectively, these connected devices are called the Internet of Things (IoT). One example of IoT is vehicle sensors, which collect information such as speed, GPS location, engine state and a host of other parameters from the vehicle and periodically send it back over the network for further processing and analysis. Another example is the “smart wearable wristbands” that some amusement parks use to track visitor activity and movement in the park. By one estimate, there are already more IoT devices than human beings on the planet, and the number of such devices is growing rapidly.

All this digital activity produces a mind-boggling amount of data: BigData. Data volume has grown more than 100-fold over the last decade, and the amount of data produced is expected to keep doubling every two years over the next decade. By 2020, the digital universe will contain as many digital bits as there are stars in our physical universe!

There are many reasons for such unprecedented growth in data. Storage costs have dropped exponentially over the last few decades. Computing power has kept pace with Moore’s law and has continued to grow exponentially over the same period. Network speeds have also increased exponentially, and the cost of data transmission has dropped steeply. Another reason is the miniaturization of devices: the smartphones in our pockets are vastly more powerful than computers that filled half a room a few generations ago, and data storage devices have shrunk in a similar way. Finally, innovative software and hardware products have led consumers to adopt them at a rapid pace.

It’s a reality that the world is generating BigData, but it’s a myth that BigData automatically leads to big intelligence. It’s very easy to produce vast amounts of data, but very hard to derive meaningful insights from it. Fifty years ago, given the limited computing power available, data generation, storage and analysis focused mostly on structured data. (Structured data is any data that is pre-coded, such as a sales transaction that lists items, quantities and sales amounts. Typically, structured data is stored in relational databases.) It’s easier to process structured data because the meaning of each data element is known. Today, however, the volume of unstructured data is significantly larger than that of structured data, and unstructured data is also growing at a faster clip. Examples of unstructured data include blogs, chat messages, videos and emails.
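For readers curious what the difference between structured and unstructured data looks like in practice, here is a minimal Python sketch. The sales records, field names and chat message are all made up for illustration. With the structured records, totals are one-liners because the meaning of each field is known; with the free-text message, the program has only a crude keyword check to go on.

```python
from dataclasses import dataclass


# A structured record: every field has a known name, type and meaning.
# (Hypothetical sales transaction line, mirroring the example in the text.)
@dataclass
class SaleLine:
    item: str
    quantity: int
    amount: float  # sales amount in the store's currency


sales = [
    SaleLine(item="t-shirt", quantity=2, amount=30.0),
    SaleLine(item="jeans", quantity=1, amount=45.0),
]

# Because the meaning of each field is known, questions become one-liners.
total_revenue = sum(line.amount for line in sales)
total_units = sum(line.quantity for line in sales)

# An unstructured record: a free-text chat message (made-up example).
# Nothing tells the program which words are items, quantities or feelings.
chat_message = "Loved the jeans but the checkout queue was way too long, gave up after 10 mins"

# A naive keyword check is about all we can do without more sophisticated
# techniques, and it is easily fooled by different phrasing.
text = chat_message.lower()
mentions_checkout_problem = "checkout" in text and "queue" in text

print(total_revenue, total_units, mentions_checkout_problem)
```

The keyword check happens to work on this one message, but it would miss “gave up waiting to pay” and would wrongly flag “the checkout queue was short”. That gap is why unstructured data needs far more sophisticated techniques.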

With advances in computing technologies over the last few decades, we have made tremendous progress in our ability to make sense of unstructured data. Unstructured data processing often requires good knowledge of the problem domain, vast computing power, sophisticated algorithms, complex mathematics, heuristics and machine learning (ML). ML is a field of artificial intelligence in which machines or programs learn from experience and data. ML systems work with data and derive patterns based on the features available in it. As a simple example, say that data is available for a given city on its weather, traffic, accidents and so on. Given a sample of this data, an ML system may figure out that there is a strong correlation between rain and traffic jams, and may then be able to predict the effect on traffic when rain is forecast.

Data intelligence is still a very difficult realm. Often, it is relatively easy to answer “What happened?” kinds of questions. “Why did it happen?” and “What can we predict with the data?” are much harder questions to answer, even with structured data. For example, even the most sophisticated financial models could not see the 2008 financial crisis coming. That’s the reality of BigData analysis, and it represents an opportunity for huge improvements. Not surprisingly, improving “data intelligence” is becoming a major focus and a competitive edge for corporations. Customers and companies increasingly interact only in the digital or virtual world. In such cases, BigData and BigData analysis become a company’s eyes and ears. Any company that uses “data intelligence” to understand the customer’s intent and emotions will be able to serve that customer much better and win their loyalty.

We are at a fascinating juncture in time: our ability to make sense of data is getting better. Advances in computing technologies will only accelerate progress in “data intelligence”, leading to a better and smarter world.