For some applications, it is undesirable to abort the job when a few tasks fail, as it may still be possible to use the job's results despite some failures.

In this case, the maximum percentage of tasks that are allowed to fail without triggering job failure can be set for the job.

Map tasks are controlled using the mapred.max.map.failures.percent property. If we set this value to 50, the job will still complete successfully as long as no more than 50% of the map tasks fail.

Reduce tasks are controlled using the mapred.max.reduce.failures.percent property. If we set this value to 30, the job will still complete successfully as long as no more than 30% of the reduce tasks fail.
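As a sketch, these thresholds can be set in the job's configuration file (the property names below are from the old mapred API; the values 50 and 30 are just the examples used above):

```xml
<!-- Allow up to 50% of map tasks to fail without failing the job -->
<property>
  <name>mapred.max.map.failures.percent</name>
  <value>50</value>
</property>

<!-- Allow up to 30% of reduce tasks to fail without failing the job -->
<property>
  <name>mapred.max.reduce.failures.percent</name>
  <value>30</value>
</property>
```

The same properties can also be set programmatically on the job's JobConf before submission.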

What Is Apache Hadoop?

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, thus delivering a highly available service on top of a cluster of computers, each of which may be prone to failures.

NoClassDefFoundError in Hadoop while running a MapReduce job:

Hadoop is used to process big data sets.

To process big data sets, we need to write MapReduce classes using the Hadoop API.

To process data, Hadoop needs a job file that contains the MapReduce classes and their dependent libraries, and a Hadoop cluster to run on.

Even though we include all the libraries and can successfully run our class from the command line without a Hadoop cluster, Hadoop may throw a NoClassDefFoundError when we run the job on the cluster.

Some tips to avoid this problem:

Never package your MapReduce class files as a jar inside the lib directory of the job file if your MapReduce classes depend on other libraries in lib. Instead of putting your MapReduce classes in a library jar, keep them as-is in their package directory structure inside the job file.

If we make a jar of the MapReduce class files and put it in the lib directory of the job file, Hadoop won't load the other dependent library jars from the job. The reason is that the job invokes only the required MapReduce class, so Hadoop loads only the jar containing that MapReduce code. The other jars are not loaded while the MapReduce job runs on the cluster, which gives us a NoClassDefFoundError even though we included all the required jar files in the lib directory of our job file.

Structure of a Hadoop job file and creating a job file:

For example, let's consider a job file named Test.job.

Copy Test.job into a temporary directory and extract its contents there using the jar command.

$ mkdir temporary

$ cp Test.job temporary/

$ cd temporary

$ jar xf Test.job

$ ls

Output:

com lib

where com is the package directory structure containing all the required MapReduce classes, and lib contains all the library files the MapReduce classes need.

After making any changes, update the job file again with the jar command as shown below.