The Apache Hadoop software library is a framework that enables the distributed processing of large datasets across clusters of computers using a simple programming model. Hadoop scales from a single server to thousands of machines, each offering local computation and storage.

The library is designed to detect and handle failures at the application layer, so it can deliver a highly available service on top of a cluster of computers, each of which may be prone to failure.
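To make the "simple programming model" concrete, the sketch below shows the canonical word-count job written against the Hadoop MapReduce API (org.apache.hadoop.mapreduce). It is an illustration rather than anything taken from this text: the mapper emits a count of 1 for each word it sees, and the reducer sums those counts per word; Hadoop handles the distribution, shuffling, and retry of failed tasks across the cluster.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every word in each input line.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer: sums the counts emitted for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The same code runs unchanged whether the cluster is one machine or thousands; input and output paths are supplied as command-line arguments when the job is submitted with the hadoop jar command.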

As listed on the Apache Hadoop project Web site, Hadoop includes a number of subprojects:

Hadoop Common: The common utilities that support the other Hadoop subprojects.