org.apache.hadoop.mapreduce.lib.aggregate
Class ValueAggregatorJob

This is the main class for creating a map/reduce job using the Aggregate
framework. Aggregate is a specialization of the map/reduce framework for
performing various simple aggregations.
Generally speaking, to implement an application using the Map/Reduce
model, the developer implements Map and Reduce functions (and possibly a
combine function). However, many applications related to counting and
statistics computing have very similar characteristics. Aggregate abstracts
out the general patterns of these functions and implements those patterns.
In particular, the package provides generic mapper/reducer/combiner
classes, a set of built-in value aggregators, and a generic utility
class that helps the user create map/reduce jobs using the generic classes.
The built-in aggregators include:
  - sum over numeric values
  - count the number of distinct values
  - compute the histogram of values
  - compute the minimum, maximum, median, average, and standard deviation
    of numeric values
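To make the list above concrete, the following is a minimal standalone sketch of what each aggregation computes over a small list of values. It uses only the Java standard library and is not the package's own implementation: the class name AggregationSketch and its method names are hypothetical, and the use of the population (rather than sample) standard deviation is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; not part of the Aggregate package.
class AggregationSketch {
    // Sum over numeric values.
    static long sum(List<Long> values) {
        long total = 0;
        for (long v : values) {
            total += v;
        }
        return total;
    }

    // Number of distinct values.
    static int distinctCount(List<Long> values) {
        return new HashSet<>(values).size();
    }

    // Histogram: each distinct value mapped to its number of occurrences.
    static Map<Long, Integer> histogram(List<Long> values) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (Long v : values) {
            counts.merge(v, 1, Integer::sum);
        }
        return counts;
    }

    // Median: middle value, or the mean of the two middle values.
    static double median(List<Long> values) {
        List<Long> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        int n = sorted.size();
        if (n % 2 == 1) {
            return sorted.get(n / 2);
        }
        return (sorted.get(n / 2 - 1) + sorted.get(n / 2)) / 2.0;
    }

    static double average(List<Long> values) {
        return sum(values) / (double) values.size();
    }

    // Population standard deviation (an assumption made for this sketch).
    static double stdDev(List<Long> values) {
        double mean = average(values);
        double squared = 0.0;
        for (long v : values) {
            squared += (v - mean) * (v - mean);
        }
        return Math.sqrt(squared / values.size());
    }

    public static void main(String[] args) {
        List<Long> values = Arrays.asList(2L, 4L, 4L, 4L, 5L, 5L, 7L, 9L);
        System.out.println("sum       = " + sum(values));
        System.out.println("distinct  = " + distinctCount(values));
        System.out.println("histogram = " + histogram(values));
        System.out.println("min/max   = " + Collections.min(values) + "/" + Collections.max(values));
        System.out.println("median    = " + median(values));
        System.out.println("average   = " + average(values));
        System.out.println("stddev    = " + stdDev(values));
    }
}
```

In the framework these computations are spread across mappers, combiners and reducers; the sketch only shows the end result of each aggregation on a single in-memory list.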
The developer using Aggregate will need only to provide a plugin class
conforming to the following interface:
  public interface ValueAggregatorDescriptor {
    public ArrayList<Entry<Text, Text>> generateKeyValPairs(Object key, Object value);
    public void configure(Configuration conf);
  }
The package also provides a base class, ValueAggregatorBaseDescriptor,
implementing the above interface. The user can extend the base class and
implement generateKeyValPairs accordingly.
The primary work of generateKeyValPairs is to emit one or more key/value
pairs based on the input key/value pair. The key in an output key/value pair
encodes two pieces of information: aggregation type and aggregation id. The
value will be aggregated onto the aggregation id according to the aggregation
type.
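The key encoding described above can be illustrated with a small standalone model. This is a sketch under stated assumptions, not the package's code: it uses plain Java strings instead of Hadoop types, assumes the aggregation type and aggregation id are joined with a ":" separator, and uses the type names "LongValueSum" and "LongValueMax" purely as illustrative labels. The class KeyEncodingSketch and its aggregate method are hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.TreeMap;

// Hypothetical standalone model of the plugin idea; it does not use Hadoop classes.
class KeyEncodingSketch {
    // Assumed separator between aggregation type and aggregation id.
    static final String TYPE_SEPARATOR = ":";
    // Illustrative aggregation type names (assumptions for this sketch).
    static final String LONG_VALUE_SUM = "LongValueSum";
    static final String LONG_VALUE_MAX = "LongValueMax";

    // Emits one or more pairs per input record. Each output key is
    // "<aggregation type>:<aggregation id>".
    static List<Entry<String, String>> generateKeyValPairs(Object key, Object value) {
        String line = value.toString();
        List<Entry<String, String>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            // Count occurrences of each word by summing ones.
            out.add(new SimpleEntry<>(LONG_VALUE_SUM + TYPE_SEPARATOR + word, "1"));
        }
        // Track the longest line seen under a single aggregation id.
        out.add(new SimpleEntry<>(LONG_VALUE_MAX + TYPE_SEPARATOR + "maxLineLength",
                String.valueOf(line.length())));
        return out;
    }

    // Plays the role of the generic reducer: decodes the type from each key
    // and folds the value into the running result for that key.
    static Map<String, Long> aggregate(List<Entry<String, String>> pairs) {
        Map<String, Long> result = new TreeMap<>();
        for (Entry<String, String> pair : pairs) {
            String encoded = pair.getKey();
            int pos = encoded.indexOf(TYPE_SEPARATOR);
            if (pos < 0) {
                throw new IllegalArgumentException("missing aggregation type: " + encoded);
            }
            String type = encoded.substring(0, pos);
            long v = Long.parseLong(pair.getValue());
            if (type.equals(LONG_VALUE_SUM)) {
                result.merge(encoded, v, Long::sum);
            } else if (type.equals(LONG_VALUE_MAX)) {
                result.merge(encoded, v, Long::max);
            } else {
                throw new IllegalArgumentException("unknown aggregation type: " + type);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Entry<String, String>> pairs = new ArrayList<>();
        pairs.addAll(generateKeyValPairs(0L, "a b a"));
        pairs.addAll(generateKeyValPairs(1L, "b c"));
        System.out.println(aggregate(pairs));
    }
}
```

The point of the design is that the plugin only decides which type and id each value belongs to; the generic reducer side chooses the aggregation from the type part of the key, so no per-application reduce code is needed.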
This class offers a function to generate a map/reduce job using the Aggregate
framework. The function takes the following parameters:
  - input directory spec
  - input format (text or sequence file)
  - output directory
  - a file specifying the user plugin class