What is Big Data?

As with all new terminology, a whole range of definitions has been proposed for Big Data (for a thorough analysis of a number of them, we refer to Gil Press’ blog at Forbes). As early as 2001, the Meta Group characterised Big Data by three V’s: Volume, Variety and Velocity. Today these have been expanded to seven V’s, adding Viscosity, Virality, Veracity and Value:

Volume – relates to the sheer size of data sets and, above all, the ability to process them. Both data generation and data processing have been growing exponentially; by one widely cited estimate, 90% of the world’s data was generated in the preceding two years.

Variety – refers to the wide variety of data being generated today. This includes many ‘new’ forms of data from social media, machine-to-machine communication and mobile devices (the Internet of Things), most of which traditional databases cannot yet process and analyse.

Velocity – relates to the increasing speed at which data is generated (often in real time), as well as to the fact that the value of the data is often short-lived.

Viscosity – refers to the resistance (inertia) encountered when navigating through a data collection, for example due to the variety of data sources, the velocity of data flows and the complexity of the required processing.

Virality – measures the speed at which data can spread through a network.

Veracity – relates to the quality and origin of the data, which determine whether it is trustworthy, conflicting or impure.

Value – refers to the value that can be extracted from data and how Big Data techniques can increase that value.

In short, as Gartner defines it: “Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.”

Data Science

However, when people refer to ‘Big Data’ they do not necessarily mean the data itself. The term is also widely used in the context of ‘data science’, the field that specialises in the techniques needed to process this data. These range from generating and storing the data to sophisticated analytics and visualisation techniques that distil a number, graph or similar piece of information that can easily be interpreted and/or acted upon.

Data science combines elements of three fields:

Mathematics/Statistics

All of the models and algorithms used to extract information from data rest on a mathematical and statistical foundation. Techniques such as predictive analytics, statistical modelling and machine learning require knowledge of advanced mathematical and/or statistical concepts.

Computer Science

Practising the art of data science requires computers. Processing the huge amounts of data available today demands knowledge of both software and hardware design.

Business

A data scientist needs to understand the market the data originates from in order to produce results that are applicable to that market. Beyond the market, an understanding of the company itself is needed to communicate your findings effectively.

Working as a Data Scientist

Your expertise in working with Big Data can help any type of organisation gain new insights and make informed decisions. Demand for data scientists who can help organisations with this is high. To give you a clear picture of the opportunities, here are some examples of data science at work in the real world:

Business: Revenue management systems in the leisure sector are based entirely on algorithms. Drawing on all kinds of data sources (your own browsing behaviour, the weather, the popularity of a flight or hotel room), prices can change from minute to minute.

Law: An AI lawyer can already take over part of the research work of human lawyers. It reads and scans legal contracts, for instance to find deviations in the details.

History: Computers are now capable of interpreting large volumes of newspaper articles, which helps put major events into historical perspective.

Smart Cities: City analytics is developing rapidly, for example in image recognition based on camera footage.

Security: In the Netherlands, datasets are used to analyse large ‘marijuana networks’, yielding findings that would likely never have surfaced without data scientists. For example, it turns out to be more effective for the police to focus on the technology-savvy members of these networks rather than on the big bosses.