Chapter 9. Setting Up a Hadoop Cluster

This chapter explains how to set up Hadoop to run on a cluster of
machines. Running HDFS and MapReduce on a single machine is great for
learning about these systems, but to do useful work they need to run on
multiple nodes.

There are a few options when it comes to getting a Hadoop cluster,
from building your own, to running on rented hardware or using an offering
that provides Hadoop as a service in the cloud. This chapter and the next
give you enough information to set up and operate your own cluster, but even
if you are using a Hadoop service in which a lot of the routine maintenance
is done for you, these chapters still offer valuable information about how
Hadoop works from an operations point of view.

Cluster Specification

Hadoop is designed to run on commodity hardware. That means
that you are not tied to expensive, proprietary offerings from a single
vendor; rather, you can choose standardized, commonly available hardware
from any of a large range of vendors to build your cluster.

“Commodity” does not mean “low-end.” Low-end machines often have cheap components, which have higher failure rates than more expensive (but still commodity-class) machines. When you are operating tens, hundreds, or thousands of machines, cheap components turn out to be a false economy, as the higher failure rate incurs a greater maintenance cost. On the other hand, large database-class machines are not recommended either, since they don’t score well on the price/performance curve.
