Big Data Resources

Overview

Apache Hadoop project -- A project containing libraries that allow for the distributed processing of large data sets across clusters of computers using simple programming models. It has several modules, including the Hadoop Distributed File System (HDFS), a distributed file system that provides high-throughput access to application data, and Hadoop MapReduce, a programming model and framework for processing large data sets in parallel across a cluster.
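
To make the MapReduce idea concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps applied to word counting. It is plain Python and does not use Hadoop's API; the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical and chosen only for illustration. On a real cluster the map and reduce calls would run on different machines and the framework would perform the shuffle.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by key.

    In a real cluster the framework does this between map and reduce;
    here it is simulated in memory (an assumption of this sketch).
    """
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three steps in sequence over a list of input records."""
    pairs = []
    for document in documents:
        pairs.extend(map_phase(document))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data big clusters", "data across clusters"])
print(counts)
```

Because each map call depends only on its own record and each reduce call depends only on one key's values, both steps can be spread across many machines, which is what makes the model suitable for cluster processing.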