3.
What is Big Data?
Big data is the term for a collection of data
sets so large and complex that it becomes
difficult to process using on-hand database
management tools or traditional data
processing applications.
www.easylearning.guru

4.
Types of Big Data
A traditional RDBMS deals
only with structured data.
Semi-Structured Data
We need a technology that deals with
semi-structured and unstructured data
as well as structured data.
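To make the three categories concrete, here is a small illustrative sketch in Python. The sample record (the name, city, and tags) is made up for this example and does not come from the slides; it only shows how the same kind of information looks in each form.

```python
import csv
import io
import json

# Structured: fixed columns, like a row in a relational table.
structured = io.StringIO("id,name,city\n1,Asha,Pune\n")
row = next(csv.DictReader(structured))

# Semi-structured: self-describing JSON; fields can vary from record to record.
semi_structured = json.loads('{"id": 1, "name": "Asha", "tags": ["new", "mobile"]}')

# Unstructured: free text with no schema; any structure has to be inferred.
unstructured = "Asha from Pune called about her new mobile plan."

print(row["city"])
print(semi_structured["tags"])
print(len(unstructured.split()))
```

The structured row can be queried by column name straight away, the JSON record carries its own field names, and the free text has to be parsed or analyzed before anything can be extracted from it.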

6.
Sources of Data
Social Media & Networks
(All of us are generating data)
Mobile Devices
(Tracking all the objects all the time)
Sensor Technology & Networks
(Measuring all kinds of data)
Scientific Instruments
(Collecting all sorts of data)

13.
What is Hadoop?
Hadoop was created by Doug Cutting and Mike Cafarella.
Hadoop provides a reliable shared storage and analysis
system.
It is designed to scale up from a single server to thousands of
machines, with a high degree of fault tolerance.

15.
Hadoop Core Components
Core Hadoop has two main systems:
• Hadoop Distributed File System (HDFS): a distributed file system
that holds large amounts of data across multiple nodes in a
cluster.
• MapReduce: a distributed programming paradigm used to analyze
the data stored in HDFS.
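The MapReduce paradigm can be sketched with the classic word count example. This is a single-process illustration in plain Python, not Hadoop's actual API: the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, and on a real cluster the framework runs the map and reduce steps in parallel across nodes.

```python
from collections import defaultdict

def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Sum the counts collected for one word."""
    return key, sum(values)

def word_count(lines):
    # Map every line, group the pairs by word, then reduce each group.
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["big data is big", "hadoop stores big data"])
print(counts)
```

Each map call looks at only one line and each reduce call at only one key, which is why the work can be spread over many machines.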

16.
Hadoop Distributed File System (HDFS)
A given file is broken down into blocks (default = 64 MB; 128 MB in
Hadoop 2 and later), and the blocks are replicated across the cluster
(default replication factor = 3).
Optimized for throughput.
HDFS allows you to put/get/delete files.
Follows the philosophy
"Write Once and Read Multiple Times"
Block Replication for:
- Durability, High Availability and Throughput.
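The block and replication arithmetic can be sketched as follows. This is a rough illustration that assumes the slide's defaults (64 MB blocks, replication factor 3); hdfs_footprint is a hypothetical helper written for this example, not part of any Hadoop API.

```python
import math

MB = 1024 * 1024

def hdfs_footprint(file_size_bytes, block_size_bytes=64 * MB, replication=3):
    """Return (number of blocks, total block replicas, raw bytes stored)."""
    blocks = math.ceil(file_size_bytes / block_size_bytes)
    replicas = blocks * replication
    # The last block only occupies as much space as the data it holds,
    # so raw storage is the file size times the replication factor.
    raw_bytes = file_size_bytes * replication
    return blocks, replicas, raw_bytes

# A hypothetical 200 MB file with the default settings above.
blocks, replicas, raw_bytes = hdfs_footprint(200 * MB)
print(blocks, replicas, raw_bytes // MB)
```

Because every block exists on several nodes, losing one machine does not lose data, and reads can be served from whichever replica is closest.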