
Friday Jan 29, 2010

Problem: I have a laptop with OpenSolaris installed, and a portable hard disk formatted with FAT32, which I use to store media so that I can use it across Solaris, Linux, and Windows. I want to store OS images on it (to run as VMs using VirtualBox), which are generally >4 GB, and FAT32 cannot hold a single file exceeding that size.

One way to get around this was to zip the images, which reduces their size drastically to <=4 GB so they can be stored on the USB disk. But then I would have to unzip an image, save it somewhere to use it, zip it back when done, and store the updated copy. This was cumbersome and time consuming.

Recently, while reading through the help output of VBoxManage, I noticed that VBoxManage's createhd takes a --variant option which accepts Split2G!
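With that variant, the disk image is created in the split VMDK format, where VirtualBox stores the virtual disk as a series of files of at most 2 GB each, which FAT32 handles fine. A minimal sketch (the filename and size here are just examples):

VBoxManage createhd --filename solaris.vmdk --size 20480 --format VMDK --variant Split2G

This creates a ~20 GB virtual disk split into 2 GB pieces, which can then be attached to a VM as usual and stored on the FAT32 disk directly.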

Wednesday Jan 16, 2008

Sun Grid Engine (SGE) is distributed resource management (DRM) software with flexible support for various parallel computing frameworks (e.g. MPI/PVM).

We could configure Grid Engine to provide a Hadoop cluster environment where map-reduce jobs can run. (Hadoop is a framework for running applications on large clusters built of commodity hardware.)

Here we will try to set up a Hadoop parallel environment (PE) in SGE to run jobs using hadoop-streaming.

Prerequisite: refer to the Hadoop wiki and bring up a Hadoop cluster manually first, just to make sure the setup works.

We create a hadoop PE for Sun Grid Engine in which:

DFS (HDFS) runs on at least 2 nodes (dfs.replication)

the master and slaves are chosen by Grid Engine

the master runs the NameNode and SecondaryNameNode (DFS) and the JobTracker (MapRed)

each slave runs a DataNode (DFS) and a TaskTracker (MapRed)

each slave's TaskTracker runs (N / no. of slaves) tasks simultaneously, which across the cluster make up the N mapper tasks and N reducer tasks (N is the number of slots the user requests via qsub -pe)
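To wire this into SGE, the PE itself would be defined along these lines. This is a plausible qconf -sp style definition, not a verified one; the script paths are placeholders, and control_slaves FALSE matches the loose integration described later:

pe_name            hadoop
slots              999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /path/to/hadoop-pe-start.sh
stop_proc_args     /path/to/hadoop-pe-stop.sh
allocation_rule    $round_robin
control_slaves     FALSE
job_is_first_task  TRUE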

The only place we need to set these is $HADOOP_HOME/conf/hadoop-site.xml:

<property> <name>fs.default.name</name> <value>hdfs://master:54310/</value> </property>

<property> <name>mapred.job.tracker</name> <value>master:54311</value> </property>

<property> <name>mapred.map.tasks</name> <value>40</value> </property>

<property> <name>mapred.tasktracker.tasks.maximum</name> <value>2</value> </property>

<property> <name>hadoop.tmp.dir</name> <value>$TMP</value> </property>

<property> <name>dfs.replication</name> <value>2</value> </property>

Apart from this, we need to list the master and slaves in conf/masters and conf/slaves respectively, and SGE helps us here, since it hands the PE a file describing the allotted hosts (see the sketch below).
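SGE exposes the allotted hosts to the PE startup script through the $PE_HOSTFILE environment variable, one line per host, with the hostname in the first column. A minimal sketch of deriving the two files from it, taking the first host as the master and the rest as slaves:

# Build conf/masters and conf/slaves from SGE's $PE_HOSTFILE.
cut -d' ' -f1 "$PE_HOSTFILE" | head -1     > "$HADOOP_HOME/conf/masters"
cut -d' ' -f1 "$PE_HOSTFILE" | tail -n +2  > "$HADOOP_HOME/conf/slaves"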

The PE startup and shutdown scripts, hadoop-pe-start.sh and hadoop-pe-stop.sh, are responsible for starting and stopping the HDFS and map-reduce daemons. They would look something like:

hadoop-pe-start.sh

NOTE: We depend on jps (supplied by the JDK) to check whether all the daemons have started; there could be a better way to achieve this. The check is required so that pe-start exits only after the daemons are up and running; we don't want jobs to start before the daemons do and then complain about their unavailability.

#!/bin/bash

JPS="$JAVA_HOME/bin/jps"
## we get HADOOP_HOME from the job's env
cd $HADOOP_HOME

### Create hadoop-site.xml
### The following variables are placeholders in conf/hadoop-site-template.xml
### so that the config can be customized:
# HADOOP_MASTER_HOST:HDFSPORT : HDFS host:port
# HADOOP_MASTER_HOST:HMPRPORT : MapRed host:port
# HMTASKS : no. of map tasks
# HRTASKS : no. of reduce tasks
# HTPN    : simultaneous tasks per slave
# HTMPDIR : Hadoop temporary dir; we let Hadoop use the SGE tmp dir
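The rest of the start script is not reproduced above. A minimal sketch of how it could continue, assuming conf/masters and conf/slaves were generated from $PE_HOSTFILE as sketched earlier, the template placeholders listed in the comments, and the stock Hadoop start scripts (bin/start-dfs.sh, bin/start-mapred.sh):

# Sketch of the remainder (not the original script): pick the master,
# fill in the template, start the daemons, and wait for them via jps.
MASTER=$(head -1 conf/masters)
NSLAVES=$(wc -l < conf/slaves)
# $NSLOTS: total slots for the job; SGE exports it to the job environment
# (if unset here, it can be derived by summing column 2 of $PE_HOSTFILE).

sed -e "s|HADOOP_MASTER_HOST:HDFSPORT|$MASTER:54310|" \
    -e "s|HADOOP_MASTER_HOST:HMPRPORT|$MASTER:54311|" \
    -e "s|HMTASKS|$NSLOTS|" \
    -e "s|HRTASKS|$NSLOTS|" \
    -e "s|HTPN|$((NSLOTS / NSLAVES))|" \
    -e "s|HTMPDIR|$TMP|" \
    conf/hadoop-site-template.xml > conf/hadoop-site.xml

bin/hadoop namenode -format   # fresh DFS in this job's tmp dir
bin/start-dfs.sh              # NameNode + DataNodes (via conf/slaves)
bin/start-mapred.sh           # JobTracker + TaskTrackers

# Block until the master daemons show up in jps.
until $JPS | grep -q NameNode && $JPS | grep -q JobTracker; do
    sleep 5
done

hadoop-pe-stop.sh would essentially be the reverse: bin/stop-mapred.sh followed by bin/stop-dfs.sh.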

The error/output streams from the job can be seen in the job's error and output files (*.[oe]JID), and the output/error streams from the PE (the Hadoop daemons) can be seen in the job's PE error and output files (*.p[oe]JID), typically in the user's home directory. (It would be better to send the logs to the job's tmp dir and view them later!)

With this, the map-reduce job runs with Sun Grid Engine setting up the cluster. Further Hadoop options can be tuned by having Grid Engine pass through the required arguments.
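For example, submission could look something like this; the PE name "hadoop", the wrapper script name, and the streaming jar path (which varies across Hadoop versions) are all assumptions:

qsub -pe hadoop 8 run-streaming.sh

where run-streaming.sh runs the actual hadoop-streaming job on the cluster the PE start script just brought up, e.g.:

#!/bin/bash
# Hypothetical wrapper: trivial streaming job (cat as mapper, wc as reducer).
# Assumes the input dir was copied into HDFS first, e.g.
#   $HADOOP_HOME/bin/hadoop dfs -put local-input input
$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/contrib/hadoop-streaming.jar \
    -input input -output output \
    -mapper /bin/cat -reducer /usr/bin/wc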

It is better to use HDFS for the job's input, as we then needn't share all the Hadoop-related files across nodes; distribution is handled at the master.

NOTES: Here we rely on the master node to start the other daemons ([rs]sh into each machine and start them) and to distribute tasks, and we do not have control over the TaskTracker threads. This way of setting up a PE in Grid Engine is called loose integration.

With some more effort one could also achieve tighter integration, wherein starting the daemons and tasks on the other slaves is done by SGE itself. But this would require further understanding of Hadoop internals.

As far as my understanding goes, Hadoop's TaskTracker spawns N threads, where N is mapred.tasktracker.tasks.maximum (set in hadoop-site.xml), though there might be more tasks than that assigned to this slave node. Hence I am not sure how one could map Grid Engine's concept of a 'slot' to a task in the Hadoop environment.

If the user requests n slots on the hadoop PE, the slots are allocated as they are available on the exec hosts. The number of PE slots allotted per exec host is therefore not the same everywhere, and the job is only entitled to run as many tasks on each host as the slots allotted there.

In the above example I used N (the total PE slots for the job) divided by the number of slaves, which might not match the slots SGE actually allotted on each exec host.
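One could instead read the per-host slot counts straight from $PE_HOSTFILE (column 2) and generate a per-slave mapred.tasktracker.tasks.maximum. A sketch of the idea; actually pushing a per-host config out to each slave is the part left open:

# Read per-host allotments from SGE instead of dividing N uniformly.
while read host slots queue rest; do
    echo "$host: allotted $slots slots"
    # here one would generate and push a per-host hadoop-site.xml
    # with mapred.tasktracker.tasks.maximum = $slots
done < "$PE_HOSTFILE"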

I assume there is some way of getting around this, along with many other Hadoop tunables that SGE could provide.

I look forward to seeing a Hadoop PE on Sun Grid wherein map-reduce jobs could be scheduled directly to make use of the available Hadoop setup.