3.
Big Data Definition

Big data describes a massive volume of both structured
and unstructured data that is so large it is difficult to
process using traditional database and software
techniques.

In most enterprise scenarios, the data is too big, moves
too fast, or exceeds current processing capacity.

The term big data is believed to have originated with
Web search companies, which had to query very large
distributed aggregations of loosely structured data.

4.
An Example of Big Data

An example of big data might be petabytes (1,024
terabytes) or exabytes (1,024 petabytes) of data
consisting of billions to trillions of records on millions of
people, all from different sources (e.g., Web, sales,
customer contact center, social media, mobile data, and
so on). The data is typically loosely structured and is
often incomplete and inaccessible.
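The unit arithmetic above (1 PB = 1,024 TB, 1 EB = 1,024 PB) can be checked with a few lines of Python; the variable names here are just for illustration:

```python
# Binary byte-scale units from the slide: each step is a factor of 1,024.
TB = 1024 ** 4          # bytes in a terabyte
PB = 1024 * TB          # 1 petabyte  = 1,024 terabytes
EB = 1024 * PB          # 1 exabyte   = 1,024 petabytes

print(f"1 PB = {PB:,} bytes")
print(f"1 EB = {EB:,} bytes")
```

This confirms that a petabyte is 2^50 bytes and an exabyte 2^60 bytes in the binary convention the slide uses.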

When dealing with such large datasets, organizations
face difficulty creating, manipulating, and managing
big data. Big data is a particular problem in business
analytics because standard tools and procedures are
not designed to search and analyze massive datasets.

11.
Some Challenges in Big Data

While big data can yield extremely useful information, it
also presents new challenges with respect to:
 How much data to store?
 How much it will cost?
 Whether the data will be secure? and
 How long it must be maintained?

12.
Implementation of Big Data
Platforms for Large-scale Data Analysis:
 The Apache Software Foundation's Java-based Hadoop
programming framework, which can run applications on
systems with thousands of nodes; and
 The MapReduce software framework, which consists of
a Map function that distributes work to different nodes
and a Reduce function that gathers results and resolves
them into a single value.
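The Map/Reduce pattern described above can be sketched in a few lines of plain Python (this is an illustration of the pattern, not the Hadoop API): a Map step turns each chunk of input into partial results, and a Reduce step gathers those results and resolves them into a single value.

```python
from collections import Counter
from functools import reduce

# Map: each "node" would run this on its own chunk of the input,
# producing per-chunk word counts.
def map_words(chunk):
    return Counter(chunk.split())

# Reduce: merge two partial results into one.
def reduce_counts(a, b):
    a.update(b)
    return a

# In Hadoop the chunks would be distributed across thousands of nodes;
# here we simulate with an in-memory list.
chunks = ["big data big", "data moves fast"]
partials = [map_words(c) for c in chunks]
total = reduce(reduce_counts, partials, Counter())

print(total["big"], total["data"])
```

The same word-count shape is the canonical introductory MapReduce example: the framework handles distribution and fault tolerance, while the programmer supplies only the map and reduce functions.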