5.
What is Apache Hadoop?
• Apache Hadoop is an open-source software framework, licensed under the Apache v2 license, that supports data-intensive distributed applications. It supports
running applications on large clusters of commodity hardware.
• Hadoop is designed with the fundamental assumption that hardware failures (of
individual machines, or whole racks of machines) are common and should therefore be
handled automatically in software by the framework.
• Apache Hadoop's MapReduce and HDFS components were originally derived from
Google's MapReduce and Google File System (GFS) papers, respectively.
Presenter: Prem Chand Mali, Mindfire Solutions

6.
What is Apache Hadoop?
• The Apache Hadoop framework is composed of the following modules:
– Hadoop Distributed File System (HDFS) - a distributed file system that stores
data on commodity machines, providing very high aggregate bandwidth
across the cluster.
– Hadoop MapReduce - a programming model for large-scale data processing.
– Hadoop Common - libraries and utilities needed by the other Hadoop
modules.
– Hadoop YARN - a resource-management platform responsible for managing
compute resources in clusters and scheduling users' applications onto them.
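To make the MapReduce model above concrete, here is a minimal word-count sketch in plain Java collections. It is not the real Hadoop API (which uses `Mapper`/`Reducer` classes and runs distributed across HDFS blocks); it only illustrates the three conceptual phases: map emits (word, 1) pairs, the shuffle groups pairs by key, and reduce sums each group.

```java
import java.util.*;
import java.util.stream.*;

// Illustrative sketch of the MapReduce model, not the Hadoop API:
// map -> (word, 1) pairs; shuffle groups by word; reduce sums counts.
public class WordCountSketch {

    // Map phase: split one input line into (word, 1) intermediate pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        return Arrays.stream(line.toLowerCase().split("\\s+"))
                .filter(w -> !w.isEmpty())
                .map(w -> Map.entry(w, 1))
                .collect(Collectors.toList());
    }

    // Shuffle + reduce phase: group intermediate pairs by key,
    // then sum the values within each group.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        return pairs.stream().collect(Collectors.groupingBy(
                Map.Entry::getKey,
                Collectors.summingInt(Map.Entry::getValue)));
    }

    public static void main(String[] args) {
        List<String> input = List.of("hadoop stores data", "hadoop processes data");

        // In real Hadoop, each line (or HDFS block) would be mapped on a
        // different node; here we just flatten all intermediate pairs.
        List<Map.Entry<String, Integer>> intermediate = input.stream()
                .flatMap(line -> map(line).stream())
                .collect(Collectors.toList());

        System.out.println(reduce(intermediate));
    }
}
```

In real Hadoop the shuffle step also moves data across the network so that all pairs with the same key land on the same reducer node; that distribution is exactly what the framework handles for you.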