org.apache.pig.builtin
Class TOP

The TOP UDF accepts a bag of tuples and returns the top n tuples, ranked by
the value of one tuple field of type long. Both n and the field number must
be provided to the UDF. The UDF iterates through the input bag and retains
only the top n tuples by storing them in a priority queue of size n+1, where
the priority is the long field. This is efficient because the priority queue
finds its least element in constant time, O(1), and restores the heap in
O(log n) time after each insertion or removal. The UDF is especially helpful
for turning a nested grouping operation inside out and retaining the top n
tuples in a nested group.
Assumes all tuples in the bag contain an element of the same type in the compared column.
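The bounded priority queue described above can be sketched in plain Java. This is a minimal illustration of the technique, not the UDF's actual source: the class name TopNSketch, the method topN, and the use of long[] rows in place of Pig tuples are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-alone sketch; it does not use Pig's Tuple or DataBag types.
class TopNSketch {

    // Returns the n rows with the largest value in column fieldIndex,
    // ordered from largest to smallest.
    static List<long[]> topN(int n, int fieldIndex, List<long[]> bag) {
        Comparator<long[]> byField =
                Comparator.comparingLong((long[] row) -> row[fieldIndex]);
        // Min-heap keyed on the compared field: the head is always the
        // smallest row currently retained.
        PriorityQueue<long[]> heap = new PriorityQueue<>(n + 1, byField);
        for (long[] row : bag) {
            heap.add(row);          // insertion restores the heap in O(log n)
            if (heap.size() > n) {
                heap.poll();        // evict the current minimum
            }
        }
        List<long[]> result = new ArrayList<>(heap);
        result.sort(byField.reversed());
        return result;
    }

    public static void main(String[] args) {
        List<long[]> bag = new ArrayList<>();
        bag.add(new long[] {1, 5});
        bag.add(new long[] {2, 9});
        bag.add(new long[] {3, 7});
        for (long[] row : topN(2, 1, bag)) {
            System.out.println(row[0] + "\t" + row[1]);
        }
    }
}
```

Because the queue never holds more than n+1 rows, memory use depends on n rather than on the size of the bag, and each input row costs at most one insertion and one removal.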
Sample usage:
A = LOAD 'test.tsv' AS (first: chararray, second: chararray);
B = GROUP A BY (first, second);
C = FOREACH B GENERATE FLATTEN(group), COUNT(A) AS count;
D = GROUP C BY first; -- group again by first
topResults = FOREACH D {
    result = TOP(10, 2, C); -- retain the top 10 occurrences of 'second' in first, ranked by field 2 (count)
    GENERATE FLATTEN(result);
}

TOP

exec

This callback method must be implemented by all subclasses. It is
invoked on every Tuple of a given dataset. Because the dataset may be
divided up in a variety of ways, the programmer should not make
assumptions about state maintained between invocations of this method.

getArgToFuncMapping

Allows a UDF to specify type-specific implementations of itself. For example,
an implementation of arithmetic sum might have int and float implementations,
since integer arithmetic performs much better than floating-point arithmetic.
Pig's typechecker calls this method and uses the returned list, together with
the schema of the function's input data, to decide which implementation of the
UDF to use.
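The idea of choosing an implementation from a list of candidates by input schema can be modelled in a few lines of plain Java. This is a simplified sketch under stated assumptions: the Candidate class, the select method, and the class names IntSum and FloatSum are hypothetical and are not Pig's FuncSpec or Schema types, and the exact-match rule shown here is not claimed to be how Pig's typechecker resolves a match.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical model of implementation selection; none of these types are Pig's.
class DispatchSketch {

    // One candidate: an implementation class name plus the input types it accepts.
    static final class Candidate {
        final String className;
        final List<String> inputTypes;

        Candidate(String className, String... inputTypes) {
            this.className = className;
            this.inputTypes = Arrays.asList(inputTypes);
        }
    }

    // Returns the first candidate whose declared input types equal the actual
    // input types, or empty if none matches (exact matching is an assumption).
    static Optional<String> select(List<Candidate> candidates, List<String> actualTypes) {
        for (Candidate c : candidates) {
            if (c.inputTypes.equals(actualTypes)) {
                return Optional.of(c.className);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Candidate> sumImpls = Arrays.asList(
                new Candidate("IntSum", "int"),
                new Candidate("FloatSum", "float"));
        System.out.println(select(sumImpls, Arrays.asList("float")).orElse("no match"));
    }
}
```

The UDF itself only supplies the list of candidates; the decision is made by the caller that knows the input schema.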

A List of FuncSpec objects, each representing an EvalFunc class that can
handle inputs matching the schema in that object. Each FuncSpec should be
constructed with a schema that describes the input for that implementation.
For example, the sum function above would return two elements in its
list: