IT Infrastructuur Architectuur Blog

Sjaak Laan's vision on infrastructure architecture

What is Big Data?

According to IBM, 2.5 quintillion bytes of data are created every day, and 90% of the world’s data was created in the past two years. Driven by the popularity of sensors, mobile telephony, surveillance cameras, RFID tags, social networks, digital photographs, video, and similar sources, the amount of generated data grows larger each day.

The world's effective annual capacity to exchange information through telecommunication networks is shown in the next picture, where one exabyte is one billion gigabytes.

Figure: Big Data
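
To put these units in perspective, the short sketch below converts the IBM figure quoted above into exabytes. It assumes decimal (SI) units and the short-scale meaning of quintillion (10^18); the constant and function names are chosen here purely for illustration.

```python
# Decimal (SI) storage units, expressed in bytes (an assumption of this sketch).
GIGABYTE = 10**9
EXABYTE = 10**18


def bytes_to_exabytes(n_bytes: int) -> float:
    """Convert a byte count to exabytes (decimal units)."""
    return n_bytes / EXABYTE


# 2.5 quintillion bytes per day, assuming short-scale quintillion = 10**18.
daily_bytes = 25 * 10**17

gigabytes_per_exabyte = EXABYTE // GIGABYTE
daily_exabytes = bytes_to_exabytes(daily_bytes)

print(gigabytes_per_exabyte)  # 1000000000, i.e. one billion gigabytes
print(daily_exabytes)         # 2.5
```

Under these assumptions, the daily figure works out to 2.5 exabytes.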

Most of this data is unstructured, meaning that it is not stored in databases but in emails, text documents, spreadsheets, and so on. Making efficient use of this data is quite a challenge. In some cases, the amount of data coming into an organization is too large to even store.

Big data is about the search, processing and storage of data that is increasing in volume (the amount of data), velocity (the speed at which data is transported through a system), and variety (the types of data), also known as the three Vs.

One example is the LOFAR telescope, which generates an enormous amount of data each second, too much to store on disk. The only way to process the data is while it is still in transit: the live data stream is processed, and only the result of this processing, which is much smaller in size, is actually stored on disk to be analyzed.
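
The idea of processing data in transit and keeping only a much smaller result can be illustrated with a minimal sketch. This is not how LOFAR itself processes its data; the simulated `sensor_stream`, the `Summary` class, and the choice of statistics are all hypothetical and serve only to show the pattern. Each sample is folded into a running summary and then discarded, so memory use stays constant regardless of how long the stream runs.

```python
import random
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Summary:
    """Small, fixed-size result kept instead of the raw samples."""
    count: int = 0
    total: float = 0.0
    peak: float = float("-inf")

    def update(self, sample: float) -> None:
        # Fold one sample into the running statistics; the sample is not kept.
        self.count += 1
        self.total += sample
        self.peak = max(self.peak, sample)

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else 0.0


def sensor_stream(n_samples: int, seed: int = 0) -> Iterator[float]:
    """Hypothetical stand-in for a live data source; yields one sample at a time."""
    rng = random.Random(seed)
    for _ in range(n_samples):
        yield rng.gauss(0.0, 1.0)


def summarize(stream: Iterable[float]) -> Summary:
    """Consume the stream once, keeping only the summary."""
    summary = Summary()
    for sample in stream:
        summary.update(sample)
    return summary


# Only this small object would be written to disk, not the 100,000 samples.
result = summarize(sensor_stream(100_000))
print(result.count, result.mean, result.peak)
```

Because the stream is a generator, the raw samples never exist in memory all at once; only the three numbers in `Summary` survive the run.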

New infrastructure solutions are needed to cope with big data, including high-speed networks, fast processing nodes, and specialized storage.