Hadoop on Windows

This blog post is a preview of the TechNet Wiki article, which contains three to five times more information. The article (and many others about Hadoop) was written by Wesley McSwain, SQL Server technical writer.

Hadoop Overview

Apache Hadoop is an open source software framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model. It consists of two primary components: the Hadoop Distributed File System (HDFS), a reliable, distributed data storage system, and MapReduce, a parallel, distributed processing system. A Hadoop cluster can be made up of a single node or thousands of nodes.
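The MapReduce model described above can be illustrated with a small sketch. This is not the Hadoop API (Hadoop jobs are typically written in Java against Hadoop's own classes); it is a conceptual word-count example showing the three stages the framework runs for you: map, shuffle, and reduce. All function names here are illustrative.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in an input split."""
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework
    does between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine the grouped values; here, sum the counts."""
    return {word: sum(counts) for word, counts in groups.items()}

# Each string stands in for one input split processed by one mapper.
splits = ["hadoop stores data", "hadoop processes data"]
mapped = [pair for split in splits for pair in map_phase(split)]
counts = reduce_phase(shuffle(mapped))
print(counts)
```

Because the map calls are independent and the reduce calls operate on disjoint keys, both phases can run in parallel across the nodes of a cluster; that independence is what makes the model "simple" yet scalable.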

HDFS is the primary distributed storage used by Hadoop applications. As you load data into a Hadoop cluster, HDFS splits the data into blocks, creates multiple replicas of each block, and distributes the replicas across the nodes of the cluster to enable reliable and extremely rapid computation.
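The splitting and replication described above can be sketched in a few lines. This is a conceptual model, not HDFS internals: the block size here is a few bytes for readability (real HDFS blocks default to tens or hundreds of megabytes), the node names are made up, and the round-robin placement stands in for HDFS's actual rack-aware replica placement policy.

```python
# Illustrative values only; HDFS uses much larger blocks and,
# by default, three replicas per block.
BLOCK_SIZE = 4
REPLICATION = 3
NODES = ["node1", "node2", "node3", "node4"]

def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Split a byte string into fixed-size blocks (the last may be short)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(blocks, nodes=NODES, replication=REPLICATION):
    """Assign each block's replicas to distinct nodes, round-robin.
    A stand-in for HDFS's rack-aware placement policy."""
    placement = {}
    for i, _block in enumerate(blocks):
        placement[i] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return placement

blocks = split_into_blocks(b"0123456789")
placement = place_replicas(blocks)
print(blocks)
print(placement)
```

Because every block lives on several nodes, the loss of any single node costs no data, and a computation can be scheduled on whichever replica's node is least busy or closest to the data.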

Getting Started with Hadoop-based Services for Windows

The links in this section provide information on deploying Apache Hadoop to the Microsoft Windows platform. All of these articles are on TechNet Wiki: