hadoop-common-user mailing list archives

Hi,
I am wondering whether there is a built-in way to automatically add a
self-incrementing line number to the reducer output (like an
auto-increment key in a relational database).
I ask because with the 0.19.2 API I kept a linecount variable in the
reducer and incremented it on every call, like this:
    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        private long linecount = 0;

        public void reduce(Text key, Iterator<IntWritable> values,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            // ... some code here
            linecount++;
            output.collect(new Text(Long.toString(linecount)), var);
        }
    }
However, I found that this does not work with the 0.20.2 API when I
write the code like this:
    public static class Reduce extends
            org.apache.hadoop.mapreduce.Reducer<Text, IntWritable, Text, IntWritable> {
        private long linecount = 0;

        public void reduce(Text key, Iterator<IntWritable> values,
                org.apache.hadoop.mapreduce.Reducer.Context context)
                throws IOException, InterruptedException {
            // some code here
            linecount++;
            context.write(new Text(Long.toString(linecount)), var);
        }
    }
This version no longer seems to work.
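One likely cause, offered as an assumption rather than something confirmed in this thread: in the new API, org.apache.hadoop.mapreduce.Reducer declares reduce(KEYIN key, Iterable<VALUEIN> values, Context context), and its default implementation writes every value through unchanged. A method that takes Iterator instead of Iterable overloads reduce rather than overriding it, so the framework never calls it. The sketch below reproduces the effect without Hadoop; BaseReducer, WrongReducer, FixedReducer and drive are hypothetical stand-ins, not Hadoop classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class OverrideDemo {
    // Hypothetical stand-in for org.apache.hadoop.mapreduce.Reducer: the
    // "framework" calls reduce(key, Iterable), whose default is a pass-through.
    static class BaseReducer {
        final List<String> out = new ArrayList<>();

        protected void reduce(String key, Iterable<Integer> values) {
            for (Integer v : values) {
                out.add(key + "\t" + v);
            }
        }

        // Mimics the framework driving the reducer, one call per key.
        final void run(String key, List<Integer> values) {
            reduce(key, values);
        }
    }

    // Same shape as the 0.20.2 snippet: Iterator instead of Iterable.
    // This overloads reduce() instead of overriding it, so run() never calls it.
    static class WrongReducer extends BaseReducer {
        long linecount = 0;

        protected void reduce(String key, Iterator<Integer> values) {
            linecount++;
            out.add(linecount + "\t" + values.next());
        }
    }

    // Iterable matches the base signature; @Override makes the compiler check it.
    static class FixedReducer extends BaseReducer {
        long linecount = 0;

        @Override
        protected void reduce(String key, Iterable<Integer> values) {
            int sum = 0;
            for (Integer v : values) {
                sum += v;
            }
            linecount++;
            out.add(linecount + "\t" + sum);
        }
    }

    // Feeds two keys through a reducer and returns the collected output lines.
    static List<String> drive(BaseReducer r) {
        r.run("apple", Arrays.asList(1, 2));
        r.run("pear", Arrays.asList(3));
        return r.out;
    }

    public static void main(String[] args) {
        System.out.println(drive(new WrongReducer()));
        System.out.println(drive(new FixedReducer()));
    }
}
```

With WrongReducer the output still carries the original keys, because only the pass-through default ran; with FixedReducer the keys are replaced by line numbers. Adding @Override to the method in the 0.20.2 snippet would turn this silent mismatch into a compile error.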
I would also like to know, when both a combiner and a reducer are
configured, how to avoid the line number being written twice (I only
want it added in the reducer, not in the combiner). Thanks!
Shi
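
On the combiner question, one possible approach is to register two different classes through Job.setCombinerClass and Job.setReducerClass: a combiner that only pre-aggregates and keeps the original key, and a reducer that is the only place where the counter is incremented. The plain-Java sketch below illustrates that split; combine, reduce and CombinerDemo are hypothetical names that simulate the two phases and are not Hadoop APIs. It assumes a word-count style job in which the values are summed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CombinerDemo {
    // Hypothetical combiner stand-in: sums counts per key on one map task's
    // output and keeps the original key, so no line number is introduced here.
    static Map<String, Integer> combine(List<String> words) {
        Map<String, Integer> partial = new LinkedHashMap<>();
        for (String w : words) {
            partial.merge(w, 1, Integer::sum);
        }
        return partial;
    }

    // Hypothetical reducer stand-in: merges the partial sums in key order and
    // replaces each key with a running line number. Only this step numbers lines.
    static List<String> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map<String, Integer> p : partials) {
            p.forEach((k, v) -> totals.merge(k, v, Integer::sum));
        }
        List<String> out = new ArrayList<>();
        long linecount = 0;
        for (int total : totals.values()) {
            linecount++;
            out.add(linecount + "\t" + total);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> partials = new ArrayList<>();
        partials.add(combine(Arrays.asList("a", "b", "a")));
        partials.add(combine(Arrays.asList("b", "c")));
        System.out.println(reduce(partials));
    }
}
```

Keeping the key intact in the combiner matters because a combiner may run zero or more times, so its output has to remain valid reducer input. Note also that a counter held in a reducer field is local to one reduce task: with more than one reduce task, each task starts again at 1, so the numbers are unique across the whole output only when a single reducer is used.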