Towards building a robust Hadoop cluster

We have been using AWS to build our Hadoop infrastructure. AWS is a pretty interesting cloud environment, and I do feel it is a great tool for any startup. However, we were overwhelmed by numerous issues while getting our cluster to run in a robust manner. A colleague (Swatz; check out her blog) and I have identified some simple parameters that magically help produce an uber cluster:

1) Check your ulimits: Hadoop keeps a lot of files open at once. The default Linux per-process limit of 1024 open files is too low; we increased it to 65536 (the nofile parameter in /etc/security/limits.conf). Check your current limits and open-file usage with, for example:
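    ulimit -Sn                  # current soft limit on open files
    ulimit -Hn                  # current hard limit
    cat /proc/sys/fs/file-nr    # file handles currently allocated system-wide

To raise the limit, add entries along these lines to /etc/security/limits.conf (a sketch assuming the daemons run as a "hadoop" user; substitute whichever user actually runs your TaskTracker/DataNode):

    hadoop soft nofile 65536
    hadoop hard nofile 65536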

2) Use the ephemeral (instance-store) disks for storage, Hadoop temporary files and HDFS. They are pretty fast and don't have the network lag of EBS.
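A minimal sketch of the relevant settings, assuming Hadoop 1.x property names and ephemeral disks mounted at /mnt and /mnt2 (the paths are placeholders; use your AMI's actual mount points):

    <!-- hdfs-site.xml: keep HDFS block data on the ephemeral disks -->
    <property>
      <name>dfs.data.dir</name>
      <value>/mnt/hdfs/data,/mnt2/hdfs/data</value>
    </property>

    <!-- core-site.xml: Hadoop temporary files -->
    <property>
      <name>hadoop.tmp.dir</name>
      <value>/mnt/hadoop/tmp</value>
    </property>

    <!-- mapred-site.xml: intermediate map output on the ephemeral disks too -->
    <property>
      <name>mapred.local.dir</name>
      <value>/mnt/hadoop/mapred,/mnt2/hadoop/mapred</value>
    </property>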
3) Add swap space. We generally add 2-4 swap files of 1GB each; a little swap is cheap insurance against transient memory spikes and is never a bad idea.
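For example, one 1GB swap file can be added like this (the /mnt path is just an assumption; repeat with different file names for more swap):

    dd if=/dev/zero of=/mnt/swapfile1 bs=1M count=1024   # create a 1GB file
    chmod 600 /mnt/swapfile1
    mkswap /mnt/swapfile1                                # format it as swap
    swapon /mnt/swapfile1                                # enable it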
4) Don’t be scared if a couple of EC2 machines go down once in a while. Hardware failures are pretty common and occur on Amazon AWS as well. If it happens consistently, you need to check your configuration.
5) Use an instance-store AMI. EBS is slow and has a network overhead that is, in my opinion, not worth it. The link works if you follow the instructions :).
6) Logs are written for a reason. Check the Hadoop TaskTracker, DataNode, NameNode and JobTracker logs when stuff blows up. Also check your instance log files to catch configuration issues like incorrectly mounted partitions.
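With a stock tarball install the daemon logs usually live under $HADOOP_HOME/logs (or wherever HADOOP_LOG_DIR points), one file per daemon; something along these lines is a good first stop when a node misbehaves:

    tail -n 100 $HADOOP_HOME/logs/hadoop-*-datanode-*.log
    tail -n 100 $HADOOP_HOME/logs/hadoop-*-tasktracker-*.log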
7) Use the right instance type on Amazon. Remember Hadoop needs high I/O bandwidth (network and disk), plenty of RAM and decent CPU. Rule of thumb: “First IOPS, then FLOPS!”.
8) Don’t put too many mappers and reducers on a compute node. Try to find the sweet spot for your startup. We found that 6 mappers and 4 reducers per node, with each mapper having 1200M of RAM and each reducer 1600M, works best for our needs.
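A sketch of how those numbers map onto mapred-site.xml, assuming Hadoop 1.x property names (on some versions the per-task heap can only be set through the single mapred.child.java.opts property):

    <!-- slots per TaskTracker -->
    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>6</value>
    </property>
    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>4</value>
    </property>
    <!-- per-task JVM heap -->
    <property>
      <name>mapred.map.child.java.opts</name>
      <value>-Xmx1200m</value>
    </property>
    <property>
      <name>mapred.reduce.child.java.opts</name>
      <value>-Xmx1600m</value>
    </property>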

A good rule of thumb is a 4:3 ratio of mappers to reducers per core.

Using more reducers than that is not recommended. In the general case we

– choose the number of reducers to be at most the number of reduce slots available in the cluster, and usually a little less, so there is slack to catch up when nodes fail,
– pick a count that creates the fewest output files possible, and
– aim for each reduce task to run for between 5 and 15 minutes.

So, overloading the number of reducers will
– give poor performance, because of the network transfers involved, whatever InputFormat you use,
– cause shuffle errors,
– affect the workflow system if you have dependent jobs in the next queue (such as Oozie), and
– amount to a DDoS (distributed denial of service) of the shuffle phase, which becomes a problem for the next read in the pipeline.
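The reducer count itself is set per job; for example, any job submitted through ToolRunner/GenericOptionsParser accepts it on the command line (the jar, class, paths and the count 24 below are placeholders):

    hadoop jar my-analytics-job.jar com.example.MyJob -D mapred.reduce.tasks=24 /data/input /data/output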

We ran the terasort benchmark to stress-test our AWS cluster. First, generate the input data for terasort (one terabyte).
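Something along these lines works, assuming the stock examples jar (its exact name and location vary between Hadoop versions); teragen takes the number of 100-byte rows to write, so ten billion rows is roughly one terabyte, and the HDFS output path is just a placeholder:

    hadoop jar $HADOOP_HOME/hadoop-examples-*.jar teragen 10000000000 /terasort/input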