Configuring Storage Balancing for DataNodes

You can configure HDFS to distribute writes on each DataNode in a manner that balances out available storage among that DataNode's disk volumes.

By default, a DataNode writes new block replicas to disk volumes solely on a round-robin basis. You can configure a volume-choosing policy that causes the DataNode to take into account how much space is available on each volume when deciding where to place a new replica.

You can configure how much DataNode volumes are allowed to differ in terms of bytes of free disk space before they are considered imbalanced, and what percentage of new block allocations is sent to volumes with more available disk space than others.

Specify the proportion of new block allocations sent to volumes with more available disk space as a fraction. The allowable range is 0.0-1.0, but set it in the range 0.5-1.0 (that is, 50-100%), because there is no reason for volumes with less available disk space to receive more block allocations.
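
If you maintain hdfs-site.xml by hand rather than through a management console, the policy and its two settings might be expressed as in the following minimal sketch. The property names and the example values (a 10 GB threshold and a 0.75 preference fraction) are assumptions taken from Apache Hadoop's hdfs-default.xml, because this section does not name them. Verify them against the Hadoop version you run.

```xml
<!-- hdfs-site.xml fragment (illustrative; property names assumed from Apache Hadoop's hdfs-default.xml) -->
<property>
  <!-- Replace the default round-robin policy with the available-space policy. -->
  <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
  <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
</property>
<property>
  <!-- Bytes of free space by which volumes may differ before they are considered imbalanced (10 GB here). -->
  <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold</name>
  <value>10737418240</value>
</property>
<property>
  <!-- Fraction of new block allocations sent to volumes with more free space; keep it within 0.5 to 1.0. -->
  <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction</name>
  <value>0.75</value>
</property>
```

With these example values, volumes whose free space differs by no more than 10 GB are considered balanced. Once they are imbalanced, about 75% of new block allocations go to the volumes with more available disk space.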
