Balancing data across an HDFS cluster

The HDFS Balancer is a tool for balancing data across the storage devices of an
HDFS cluster.

You can also specify source DataNodes to free up space on particular DataNodes.
You can use a block distribution application to pin its block replicas to
particular DataNodes so that the pinned replicas are not moved during cluster
balancing.
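
As a rough illustration of specifying source DataNodes, the following sketch builds a Balancer command line and prints it for review instead of running it. It assumes your Hadoop version supports the Balancer's `-source` option; the host names are hypothetical placeholders, not values from this guide.

```shell
#!/bin/sh
# Hypothetical DataNode host names; replace them with hosts from your cluster.
SOURCES="dn1.example.com,dn2.example.com"

# Assumption: the installed Balancer accepts -source followed by a
# comma-separated host list, which limits the DataNodes that blocks
# are moved away from.
BALANCER_CMD="hdfs balancer -source ${SOURCES}"

# Print the command for review. Run it yourself as the HDFS superuser
# once the host list is correct.
echo "${BALANCER_CMD}"
```

Printing the command first makes it easier to confirm the host list before any blocks are moved.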

Why HDFS data becomes unbalanced

Factors such as the addition of DataNodes, block allocation in HDFS, and the behavior of client applications can cause the data stored in an HDFS cluster to become unbalanced.