Apache Hadoop reaches its 1.0 milestone

The developers of Apache Hadoop have released version 1.0 of the MapReduce framework and distributed computing platform. Hadoop has become a popular approach to handling large data sets or complex processing thanks to its ability to split a job into smaller tasks that are distributed across thousands of computing nodes (the "map") and to bring the results from those nodes together into a coherent result (the "reduce"); the data for processing is stored redundantly across the nodes.
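The map and reduce steps described above can be illustrated with a small single-process sketch in Python. It is not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and each input string merely stands in for a block of data that a real cluster would hold on a separate node.

```python
from collections import defaultdict


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group emitted values by key, the step that sits between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    # Each chunk stands in for data held on one node (an assumption of this sketch).
    emitted = []
    for chunk in chunks:
        emitted.extend(map_phase(chunk))
    return dict(reduce_phase(k, v) for k, v in shuffle(emitted).items())


if __name__ == "__main__":
    print(word_count(["big data big clusters", "data nodes store data"]))
```

In a real deployment the map calls would run in parallel on the nodes that hold the data, and the framework would handle the grouping and any failed tasks; the sketch only shows the shape of the computation.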

Hadoop 1.0 is based on the security branch of Hadoop 0.20, specifically Hadoop 0.20.205.0. The new 1.0 release is said to be stable and reliable at scale, with 50,000 node installations already deployed. HBase, the Hadoop "big data" database modelled after Google's Bigtable, is now integrated with the 1.0 release, and the support includes "performance enhanced access" to local files for the database. Security on the nodes is ensured with Kerberos-based authentication. The 1.0 release also supports WebHDFS (webhdfs), a read/write HTTP access layer for the file system. The Hadoop 1.0 release notes detail all the changes, fixes and performance enhancements.
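To give an idea of what HTTP access to the file system looks like, the following Python sketch builds request URLs in the `/webhdfs/v1/<path>?op=...` form used by the webhdfs REST interface. The helper name `webhdfs_url`, the host name and the file paths are hypothetical, and the default port of 50070 is an assumption about a typical Hadoop 1.x NameNode configuration; the sketch only constructs URLs and does not contact a cluster.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, path, op, port=50070, params=None):
    """Build a webhdfs-style URL for one file system operation.

    host, port and path are placeholders for a real cluster's settings;
    50070 is assumed here as the NameNode's HTTP port.
    """
    if not path.startswith("/"):
        raise ValueError("path must be absolute, e.g. /user/demo/file.txt")
    query = {"op": op}
    if params:
        query.update(params)
    # quote() leaves "/" intact, so the HDFS path keeps its structure.
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{urlencode(query)}"


if __name__ == "__main__":
    # Reading a file is an HTTP GET with op=OPEN.
    print(webhdfs_url("namenode.example.com", "/user/demo/input.txt", "OPEN"))
    # Writing a file is an HTTP PUT with op=CREATE.
    print(webhdfs_url("namenode.example.com", "/user/demo/out.txt", "CREATE",
                      params={"overwrite": "true"}))
```

On a cluster secured with Kerberos, requests to these URLs would also need to be authenticated, which is outside the scope of this sketch.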

Hadoop was created by Doug Cutting at Yahoo to support the Nutch search engine and was inspired by Google's MapReduce and Google File System papers. In 2006 it was contributed to the Apache Software Foundation, where it has become one of the major projects within the Foundation's ecosystem, fostering many subprojects of its own to address different aspects of the challenge of distributed computing. A number of companies have also been created around Hadoop, such as Cloudera and the Yahoo spin-off Hortonworks. Among the companies that use and contribute to Hadoop projects are Facebook, StumbleUpon, eBay, LinkedIn, Twitter, IBM and Adobe.

A number of variants of Hadoop are concurrently in development. November's release of Hadoop 0.23, which introduced a new MapReduce implementation named YARN, is the current alpha version. Hadoop 0.22, released at the start of December, is another branch with a range of new features for the MapReduce platform, but it lacks, among other things, security. The current stable version of Hadoop is 0.20.205.0, which was released in October. Information on all these releases, along with downloads for them, is available on the Hadoop Common Releases page.