Hortonworks Sandbox for HDP and HDF is your chance to get started on learning, developing, testing and trying out new features. Each download comes preconfigured with interactive tutorials, sample data and developments from the Apache community.

Introduction

Hadoop has supported Windows natively since version 2.2, but the official Apache Hadoop releases do not include native Windows binaries, so we need to build them ourselves. This tutorial is a step-by-step guide to building a Hadoop binary distribution from the Hadoop source code on Windows. It also covers setting up Java, Maven, and the other required components. Apache Hadoop is an open source Java project, mainly used for distributed storage and processing of large data sets. It is designed to scale horizontally and to support distributed processing across multiple machines. You can find more about Hadoop at http://hadoop.apache.org/.

Solution for Spark Errors

Many of you may have tried running Spark on Windows and hit an error in the console (shown below). This happens because your Hadoop distribution does not contain native binaries for Windows, as they are not included in the official Hadoop distribution. So you need to build Hadoop from its source code on your Windows machine.

NOTE: We can use Windows 7 or later for building Hadoop. In my case I have used Windows Server 2008 R2.

NOTE: I used a freshly installed OS and removed every .NET Framework and C++ Redistributable from the machine, because they are installed along with Windows SDK 7.1. We are also going to install .NET Framework 4 in this tutorial. If you have any version of Visual Studio installed on your machine, it is likely to cause problems in the build process because of version mismatches in some .NET components, and in some cases it will prevent Windows SDK 7.1 from installing.

Installation

A. JDK

2. Again leave installation path as default for JRE if it asks, click next to install, and then click close to finish the installation.

3. If you didn't change the path during installation, your Java installation path will be something like "C:\Program Files\Java\jdk1.8.0_65".

4. Now right click on My Computer and select Properties then click on Advanced or go to Control Panel > System > Advanced System Settings.

5. Click the Environment Variables button.

6. Click the New… button in the System Variables section, then type JAVA_HOME in the Variable name field and your JDK installation path in the Variable value field.

a. If the path contains spaces, use the shortened path name, for example “C:\Progra~1\Java\jdk1.8.0_74” for Windows 64-bit systems

i. Progra~1 for 'Program Files'

ii. Progra~2 for 'Program Files (x86)'

7. It should look like:

8. Now click OK.

9. Find the Path variable in the “System Variables” section of the “Environment Variables” dialogue box you just opened.

10. Edit the path and type “;%JAVA_HOME%\bin” at the end of the text already written there just like the image below:

11. To confirm your Java installation, open cmd and type “java -version”. You should see the version of Java you just installed.

If your command prompt looks something like the image above, you are good to go. Otherwise, recheck that the installer you used matches your OS architecture (x86 or x64) and that the environment variable paths are correct.
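If you would rather check this with a script than by eye, the following is a minimal Python sketch of the same checks. It is not part of the original setup: it assumes Python 3 is available on the machine, and the function name java_home_problems is made up for this example. It only looks at whether JAVA_HOME is set, whether it contains spaces, and whether it has a bin folder.

```python
import os


def java_home_problems(env):
    """Return a list of problems with JAVA_HOME in the mapping `env`.

    An empty list means none of the checks below failed.
    """
    java_home = env.get("JAVA_HOME", "")
    if not java_home:
        return ["JAVA_HOME is not set"]

    problems = []
    # The tutorial asks for the short name (Progra~1) when the path has spaces.
    if " " in java_home:
        problems.append("JAVA_HOME contains a space; use the short name (e.g. Progra~1)")
    # A JDK folder is expected to have a bin subfolder (added to Path in step 10).
    if not os.path.isdir(os.path.join(java_home, "bin")):
        problems.append("no bin folder under JAVA_HOME")
    return problems


if __name__ == "__main__":
    for line in java_home_problems(os.environ) or ["JAVA_HOME looks fine"]:
        print(line)
```

Run it from a newly opened cmd window, since windows that were already open do not see changes made to the system variables.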

B. .NET Framework 4

2. When prompted, accept the license terms and click the install button.

3. At the end, just click finish and it’s done.

C. Windows SDK 7.1

1. Now open the Programs and Features (Uninstall a program) window from My Computer or Control Panel.

2. Uninstall all Microsoft Visual C++ Redistributables that were installed with the OS. They may be newer than the version Windows SDK 7.1 requires, in which case they will cause errors during the SDK installation.

3. Now open your downloaded Windows 7 SDK ISO file using 7zip and extract it to C:\WinSDK7 folder. You can also mount it as a virtual CD drive if you have that feature.

You will have the following files in your SDK folder:

5. Now open your Windows SDK folder and run setup.

6. Follow the instructions and install the SDK.

7. At the end when you get a window saying Set Help Library Manager, click cancel.

D. Maven

1. Now extract the downloaded Maven zip file to C drive.

2. For this tutorial we are using Maven-3.3.3.

3. Now open the Environment Variables panel just like we did during JDK installation to set M2_HOME.

4. Create a new entry in System Variables named M2_HOME and set its value to your Maven folder (the one that contains the bin folder), e.g. C:\Maven-3.3.3, just like the image below:

5. Now click OK.

6. Search for the Path variable in the “System Variable” section, click the edit button, and type “;%M2_HOME%\bin” at the end of the text already written there just like the image below:

7. To confirm your Maven installation, open cmd and type “mvn -v”. You should see the version of Maven you just installed.

If your command prompt looks like the image above, you are done with Maven.
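A common reason for “mvn -v” failing is that the bin folder never made it into Path. The Python sketch below shows one way to test for that. It is an illustration only: it assumes Python 3 is installed, and path_entries and on_path are hypothetical helper names, not part of Maven or Windows. The comparison ignores letter case and trailing backslashes, which is how Windows treats folder names.

```python
import ntpath  # Windows path rules, usable on any OS
import os


def path_entries(path_value):
    """Split a Windows Path value on ';', dropping blanks and surrounding quotes."""
    return [e.strip().strip('"') for e in path_value.split(";") if e.strip()]


def on_path(folder, path_value):
    """True if `folder` is one of the entries in `path_value`."""
    def norm(p):
        # normpath removes a trailing backslash, normcase lowercases the name
        return ntpath.normcase(ntpath.normpath(p))

    return norm(folder) in {norm(e) for e in path_entries(path_value)}


if __name__ == "__main__":
    m2_home = os.environ.get("M2_HOME")
    if m2_home is None:
        print("M2_HOME is not set")
    else:
        maven_bin = ntpath.join(m2_home, "bin")
        print(maven_bin, "on Path:", on_path(maven_bin, os.environ.get("PATH", "")))
```

The same on_path helper can be pointed at %JAVA_HOME%\bin or any other folder this tutorial adds to Path.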

E. Protocol Buffer 2.5.0

1. Extract Protocol Buffer zip to C:\protoc-2.5.0-win32.

2. Now add the Protocol Buffer folder (C:\protoc-2.5.0-win32) to the “Path” variable in the System Variables section of the Environment Variables dialogue box, just like the image below:

3. To check that the Protocol Buffer installation works, open cmd and type the command “protoc --version”

Your command prompt should look like this.
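Since this tutorial depends on Protocol Buffer 2.5.0 specifically, it can help to compare the reported version against that number rather than only checking that the command runs. The following Python sketch does that, under a few stated assumptions: Python 3.7 or later is installed, protoc prints a line of the form “libprotoc 2.5.0”, and the helper names are invented for this example.

```python
import shutil
import subprocess

REQUIRED_PROTOC = "2.5.0"  # the version this tutorial installs


def parse_protoc_version(output):
    """Extract the version from text such as 'libprotoc 2.5.0'; None if it does not match."""
    parts = output.strip().split()
    if len(parts) == 2 and parts[0] == "libprotoc":
        return parts[1]
    return None


def installed_protoc_version():
    """Run `protoc --version` if protoc is on Path; return its version or None."""
    exe = shutil.which("protoc")
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    return parse_protoc_version(result.stdout)


if __name__ == "__main__":
    version = installed_protoc_version()
    if version == REQUIRED_PROTOC:
        print("protoc", version, "found")
    else:
        print("expected protoc", REQUIRED_PROTOC, "but found", version)
```

A result of None means either that protoc is not on Path or that its output did not have the assumed form.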

F. Cygwin

1. Download Cygwin according to your OS architecture

a. 64 bit (setup-x86_64.exe)

b. 32 bit (setup-x86.exe)

2. Start Cygwin installation and choose "Install from Internet" when it asks to choose a download source, then click next.

3. Follow the instructions further and choose any Download site when prompted. If it fails, try any other site from the list and click next:

Building Hadoop

2. Inside that you will find "hadoop-2.7.2-src.tar", double click on that file.

3. Now you will be able to see the hadoop-2.7.2-src folder. Open that folder and you will be able to see the source code as shown here:

4. Now click Extract and give a short path like C:\hdp and click ok. If you give a long path you may get a runtime error due to Windows' maximum path length limitation.

5. Once the extraction is finished we need to add a new “Platform” System Variable. The values for the platform will be:

a. x64 (for 64-bit OS)

b. Win32 (for 32-bit OS)

Please note that the variable name Platform is case sensitive, so do not change its letter case.

6. To add Platform in System variable, just open the Environment variables dialogue box, click on the “New…” button in the System variable section, and fill the Name and Value text boxes as shown below:

7. Before we proceed with the build, have a look at the programs installed on my machine:

10. Change the directory to your extracted Hadoop source folder. For this tutorial it is C:\hdp, so type the command cd C:\hdp

11. Now type command mvn package -Pdist,native-win -DskipTests -Dtar

NOTE: You need a working internet connection as Maven will try to download all required dependencies from online repositories.

12. If everything goes smoothly it will take around 30 minutes. It depends upon your Internet connection and CPU speed.

13. If everything goes well you will see a success message like the image below. Your native Hadoop distribution will be created at C:\hdp\hadoop-dist\target\hadoop-2.7.2

14. Now open C:\hdp\hadoop-dist\target. You will find “hadoop-2.7.2.tar.gz”. This Hadoop distribution contains native Windows binaries and can be used on a Windows OS for Hadoop clusters.

15. To run a Hadoop instance you need to change some configuration files, such as hadoop-env.cmd, core-site.xml, hdfs-site.xml, and slaves. For those changes, please follow this official link to set up and run Hadoop on Windows: https://wiki.apache.org/hadoop/Hadoop2OnWindows.
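The build steps above fail in a few predictable ways: a missing or mistyped Platform value, a source path that is already close to the Windows path length limit, or mvn not being on Path. The Python sketch below gathers those checks in one place and wraps the Maven command from step 11. Treat it as an illustration under assumptions, not as part of the Hadoop build: it assumes Python 3 is installed, the function names are made up, and 260 characters is the classic Windows limit. The path check only measures the source tree, so paths created during the build can still be longer.

```python
import os
import shutil
import subprocess

MAX_PATH = 260                         # classic Windows path length limit
ALLOWED_PLATFORMS = {"x64", "Win32"}   # the two values given in step 5


def platform_problem(env):
    """Return a message if Platform is missing or has an unexpected value, else None."""
    value = env.get("Platform")
    if value is None:
        return "Platform is not set"
    if value not in ALLOWED_PLATFORMS:
        return "Platform is %r; expected x64 or Win32" % value
    return None


def longest_path(root):
    """Return the length of the longest file path under `root` (0 if there are no files)."""
    longest = 0
    for folder, _dirs, files in os.walk(root):
        for name in files:
            longest = max(longest, len(os.path.join(folder, name)))
    return longest


def build_command(mvn="mvn"):
    """The Maven invocation from step 11, as an argument list."""
    return [mvn, "package", "-Pdist,native-win", "-DskipTests", "-Dtar"]


def run_build(source_dir):
    """Run the build inside `source_dir` and return Maven's exit code."""
    mvn = shutil.which("mvn")  # on Windows this resolves the mvn launcher script
    if mvn is None:
        raise FileNotFoundError("mvn not found on Path")
    return subprocess.run(build_command(mvn), cwd=source_dir).returncode


if __name__ == "__main__":
    source_dir = r"C:\hdp"  # extraction folder used in this tutorial
    print(platform_problem(os.environ) or "Platform looks fine")
    if os.path.isdir(source_dir):
        print("longest source path:", longest_path(source_dir), "of", MAX_PATH)
    print("build command:", " ".join(build_command()))
```

Running the script only prints the checks and the command. Call run_build(r"C:\hdp") yourself once the checks pass, from the same kind of command prompt you would use for the manual build.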
