Month: March 2018

Build and Install Hadoop 2.x or newer on Windows

1. Introduction

Starting with version 2.2, Hadoop includes native support for Windows. The official Apache Hadoop releases do not include Windows binaries (yet, as of January 2014). However, building a Windows package from the sources is fairly straightforward.

Hadoop is a complex system with many components. Some high-level familiarity with it is helpful before you build or install it for the first time. Familiarity with Java is necessary in case you need to troubleshoot.

2. Building Hadoop Core for Windows

2.1. Choose target OS version

The Hadoop developers have used Windows Server 2008 and Windows Server 2008 R2 during development and testing. Windows Vista and Windows 7 are also likely to work because of the Win32 API similarities with the respective server SKUs. We have not tested on Windows XP or any earlier versions of Windows and these are not likely to work. Any issues reported on Windows XP or earlier will be closed as Invalid.

Do not attempt to run the installation from within Cygwin. Cygwin is neither required nor supported.

2.2. Choose Java Version and set JAVA_HOME

Oracle JDK versions 1.7 and 1.6 have been tested by the Hadoop developers and are known to work.
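As a quick pre-build check, the hypothetical Python helper below classifies a Java version string against the two JDK lines named above. The helper, is_tested_jdk, is not part of Hadoop. It assumes the old 1.x numbering those JDKs report (e.g. 1.7.0_51). Extracting that string from the output of java -version is left out.

```python
import re


def is_tested_jdk(version: str) -> bool:
    """Return True if a Java version string belongs to a JDK line that this
    guide lists as tested (1.6 or 1.7).

    Expects the "1.x.y_zz" format used by those JDKs, e.g. "1.7.0_51".
    Anything that does not parse is treated as untested.
    """
    match = re.match(r"^1\.(\d+)\b", version.strip())
    if match is None:
        return False
    return int(match.group(1)) in (6, 7)


if __name__ == "__main__":
    for v in ("1.7.0_51", "1.6.0_45", "1.8.0_20"):
        print(v, is_tested_jdk(v))
```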

Make sure that JAVA_HOME is set in your environment and does not contain any spaces. If your Java installation directory contains spaces, use its Windows 8.3 short path name instead, e.g. c:\Progra~1\Java\… instead of c:\Program Files\Java\….
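The JAVA_HOME requirement above can be checked with a minimal Python sketch. The function name java_home_problems is hypothetical. It only flags an unset value or one that contains spaces. It does not compute the 8.3 short name for you.

```python
import os
from typing import List, Optional


def java_home_problems(java_home: Optional[str]) -> List[str]:
    """List the JAVA_HOME issues this guide warns about (empty list = OK)."""
    if not java_home:
        return ["JAVA_HOME is not set"]
    problems = []
    if " " in java_home:
        problems.append(
            "JAVA_HOME contains spaces; use the 8.3 short path name "
            "(e.g. c:\\Progra~1\\Java\\...) instead"
        )
    return problems


if __name__ == "__main__":
    # Report problems with the JAVA_HOME of the current environment.
    for issue in java_home_problems(os.environ.get("JAVA_HOME")):
        print(issue)
```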

2.3. Getting Hadoop sources

The current stable release as of August 2014 is 2.5. The source distribution can be retrieved from the ASF download server or via Subversion or Git.

Git repository URL: git://git.apache.org/hadoop-common.git. After cloning the sources with Git, switch to the stable 2.5 branch with git checkout branch-2.5, or use the appropriate branch name if you are targeting a newer version.
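The checkout steps above can be expressed as a small Python sketch that builds the git commands without running them. Two names here are hypothetical: release_branch and checkout_commands. The sketch also assumes that newer release lines follow the same branch-X.Y naming as branch-2.5. The repository URL is the one given in this guide, so confirm the project's current repository location before relying on it.

```python
from typing import List

# Repository URL as given in this guide.
HADOOP_GIT_URL = "git://git.apache.org/hadoop-common.git"


def release_branch(version: str) -> str:
    """Map "2.5" or "2.5.0" to "branch-2.5" (assumed naming pattern)."""
    parts = version.split(".")
    if len(parts) < 2 or not all(p.isdigit() for p in parts[:2]):
        raise ValueError(f"expected a version like 2.5, got {version!r}")
    return f"branch-{parts[0]}.{parts[1]}"


def checkout_commands(version: str, url: str = HADOOP_GIT_URL) -> List[List[str]]:
    """Build (but do not run) the clone and checkout commands.

    git clones hadoop-common.git into a directory named hadoop-common by default.
    """
    return [
        ["git", "clone", url],
        ["git", "-C", "hadoop-common", "checkout", release_branch(version)],
    ]


if __name__ == "__main__":
    for cmd in checkout_commands("2.5"):
        print(" ".join(cmd))
```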

2.4. Installing Dependencies and Setting up Environment for Building

The BUILDING.txt file in the root of the source tree has detailed information on the list of requirements and how to install them. It also includes information on setting up the environment and a few quirks that are specific to Windows. It is strongly recommended that you read and understand it before proceeding.

2.5. A few words on Native IO support

Hadoop on Linux includes optional Native IO support. On Windows, however, Native IO is mandatory; without it your installation will not work. You must follow all the instructions in BUILDING.txt to ensure that Native IO support is built correctly.
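One way to sanity-check the Native IO build is to look for the native Windows artifacts in the installation's bin directory. The sketch below assumes those artifacts are winutils.exe and hadoop.dll, which Windows builds of Hadoop 2.x typically produce. Confirm the exact file list against BUILDING.txt for your version. The helper name missing_native_files is hypothetical. The default c:\deploy\bin follows the example installation directory used in this guide.

```python
import sys
from pathlib import Path
from typing import List

# Assumed names of the Windows native artifacts; verify against BUILDING.txt.
NATIVE_FILES = ("winutils.exe", "hadoop.dll")


def missing_native_files(hadoop_bin: str) -> List[str]:
    """Return the expected native files that are absent from hadoop_bin."""
    bin_dir = Path(hadoop_bin)
    return [name for name in NATIVE_FILES if not (bin_dir / name).is_file()]


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else r"c:\deploy\bin"
    missing = missing_native_files(target)
    print("native files present" if not missing else "missing: " + ", ".join(missing))
```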

2.6. Build and Copy the Package files

To build a binary distribution, run the following command from the root of the source tree.

mvn package -Pdist,native-win -DskipTests -Dtar

Note that this command must be run from a Windows SDK command prompt as documented in BUILDING.txt. A successful build generates a binary hadoop.tar.gz package in hadoop-dist\target\.
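The build itself must run from the Windows SDK command prompt, but locating its output afterwards is easy to script. The hypothetical Python helper below, find_dist_packages, lists hadoop*.tar.gz files in hadoop-dist\target under a given source root, newest first. The glob pattern is an assumption based on the package names used in this guide.

```python
from pathlib import Path
from typing import List


def find_dist_packages(source_root: str) -> List[Path]:
    """Return hadoop*.tar.gz files in hadoop-dist/target, newest first."""
    target = Path(source_root) / "hadoop-dist" / "target"
    if not target.is_dir():
        return []  # build has not run (or failed before packaging)
    packages = [p for p in target.glob("hadoop*.tar.gz") if p.is_file()]
    return sorted(packages, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    for pkg in find_dist_packages("."):
        print(pkg)
```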

The Hadoop version is present in the package file name. If you are targeting a different version then the package name will be different.
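Because the version is embedded in the package file name, it can be recovered with a regular expression. The pattern below is an assumption modeled on names like hadoop-2.5.0.tar.gz, optionally with a suffix such as -SNAPSHOT. The helper package_version is hypothetical.

```python
import re
from typing import Optional

# Assumed naming: hadoop-<dotted version>[-<suffix>].tar.gz
_PACKAGE_RE = re.compile(r"^hadoop-(\d+(?:\.\d+)*(?:-[A-Za-z0-9]+)?)\.tar\.gz$")


def package_version(filename: str) -> Optional[str]:
    """Return the version embedded in a package file name, or None."""
    match = _PACKAGE_RE.match(filename)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(package_version("hadoop-2.5.0.tar.gz"))
```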

2.7. Installation

Pick a target directory for installing the package. We use c:\deploy as an example. Extract the tar.gz file (e.g. hadoop-2.5.0.tar.gz) under c:\deploy. This yields the Hadoop directory structure, including the bin and etc\hadoop directories used in the steps below. If you are installing a multi-node cluster, repeat this step on every node.
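The extraction step can also be scripted with Python's standard tarfile module. The helper extract_package is hypothetical. It defaults to the c:\deploy example directory and rejects archive entries that would land outside it. If the archive contains a single top-level folder (for example hadoop-2.5.0), the files end up one level below c:\deploy. In that case, adjust the paths in the following sections or move the contents up.

```python
import tarfile
from pathlib import Path


def extract_package(package: str, deploy_dir: str = r"c:\deploy") -> Path:
    """Extract a Hadoop .tar.gz package into deploy_dir and return that dir."""
    dest = Path(deploy_dir)
    dest.mkdir(parents=True, exist_ok=True)
    root = dest.resolve()
    with tarfile.open(package, "r:gz") as tar:
        # Refuse members whose paths would escape the destination directory.
        for member in tar.getmembers():
            target = (root / member.name).resolve()
            if target != root and root not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(dest)
    return dest


if __name__ == "__main__":
    print(extract_package("hadoop-2.5.0.tar.gz"))
```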

3. Starting a Single Node (pseudo-distributed) Cluster

This section describes the absolute minimum configuration required to start a Single Node (pseudo-distributed) cluster and also run an example MapReduce job.

3.1. Example HDFS Configuration

Before you can start the Hadoop daemons, you will need to make a few edits to configuration files. The configuration file templates will all be found in c:\deploy\etc\hadoop, assuming your installation directory is c:\deploy.

First, edit the file hadoop-env.cmd to add the following lines near the end of the file.

set HADOOP_PREFIX=c:\deploy
set HADOOP_CONF_DIR=%HADOOP_PREFIX%\etc\hadoop
set YARN_CONF_DIR=%HADOOP_CONF_DIR%
set PATH=%PATH%;%HADOOP_PREFIX%\bin
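Generating these lines avoids typos when you configure several nodes. The sketch below reproduces the four set commands for a chosen installation prefix and appends them with Windows CRLF line endings. The helpers hadoop_env_lines and append_to_hadoop_env are hypothetical. The sketch appends at the very end of hadoop-env.cmd rather than near the end, and running it twice adds the lines twice.

```python
from typing import List


def hadoop_env_lines(prefix: str = r"c:\deploy") -> List[str]:
    """Return the set commands for hadoop-env.cmd with the given prefix."""
    return [
        f"set HADOOP_PREFIX={prefix}",
        r"set HADOOP_CONF_DIR=%HADOOP_PREFIX%\etc\hadoop",
        r"set YARN_CONF_DIR=%HADOOP_CONF_DIR%",
        r"set PATH=%PATH%;%HADOOP_PREFIX%\bin",
    ]


def append_to_hadoop_env(env_cmd_path: str, prefix: str = r"c:\deploy") -> None:
    """Append the set commands to hadoop-env.cmd; '\n' is written as CRLF."""
    with open(env_cmd_path, "a", newline="\r\n") as f:
        f.write("\n".join(hadoop_env_lines(prefix)) + "\n")


if __name__ == "__main__":
    print("\n".join(hadoop_env_lines()))
```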

Edit or create the file core-site.xml and make sure it has the following configuration key:
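Hadoop's *-site.xml files, core-site.xml included, share one layout: a configuration root element containing property elements, each with a name and a value. The Python sketch below renders that layout from a dict using the standard xml.etree.ElementTree module. The helper hadoop_config_xml is hypothetical. The key in the usage line is a placeholder, not the setting this guide calls for.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def hadoop_config_xml(properties: Dict[str, str]) -> str:
    """Render properties in the <configuration>/<property> layout."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode") + "\n"


if __name__ == "__main__":
    # Placeholder key and value for illustration only.
    print(hadoop_config_xml({"example.placeholder.key": "some-value"}))
```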

4. Multi-Node cluster

TODO: Document this

5. Conclusion

5.1. Caveats

The following features are yet to be implemented for Windows.

Hadoop Security

Short-circuit reads

5.2. Questions?

If you have any questions, you can request help from the Hadoop mailing lists. For help with building Hadoop on Windows, send mail to common-dev@hadoop.apache.org. For all other questions, send email to user@hadoop.apache.org. Subscribe/unsubscribe information is available on the Hadoop mailing lists page. Please note that the mailing lists are monitored by volunteers.