Is there a way to group keys a second time before sending results to the Reducer in the same job? I thought maybe a combiner would do this for me, but it just acts like a reducer, so I need an intermediate step that acts like another mapper instead.

To try to visualize this, how I want it to work:

Map output:

<1, [{2, "John",""},{1, "",""},{1, "", "Doe"}]>

Combiner Output:

<1, [{1, "John",""},{1, "",""},{1, "", "Doe"}]>

Reduce Output:

<1, "John","Doe">

How it currently works:

Map output:

<1, [{2, "John",""},{1, "",""},{1, "", "Doe"}]>

Combiner Output:

<1, {1, "John",""}>

<1, {1, "",""}>

<1, {1, "", "Doe"}>

Reduce Output:

<1, "John","Doe">

<1, "John","Doe">

<1, "John","Doe">

So, basically the issue is that even though the 2 in the first map record should really be a one, I still need to extract the value of "John" and have it included in the output for key 1.

The first rule is to be wary of your use of the combiner. The combiner *might* be run, it *might not* be run, and it *might be run multiple times*. The combiner is only for reducing the amount of data going to the reducer, and it will only be run *if and when* Hadoop deems it likely to be useful. Don't use it for logic.
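To make that rule concrete, here is a minimal in-memory sketch in plain Java (no Hadoop classes; `CombinerSketch`, `sum`, and `combine` are hypothetical names for illustration, and the chunking is only an assumed stand-in for however the framework might batch values). It shows the property a combiner has to have: the reducer's result is the same whether the combiner ran zero, one, or two times.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CombinerSketch {
    // Shared combiner/reducer logic: sum all values seen for one key.
    static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Stand-in for one combiner pass: pre-aggregate the values in chunks.
    // The chunk size is an assumption; the real framework decides batching.
    static List<Integer> combine(List<Integer> values, int chunkSize) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < values.size(); i += chunkSize) {
            int end = Math.min(i + chunkSize, values.size());
            out.add(sum(values.subList(i, end)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> mapOutput = Arrays.asList(1, 1, 1, 1, 1);
        int never = sum(mapOutput);                          // combiner skipped
        int once = sum(combine(mapOutput, 2));               // combiner ran once
        int twice = sum(combine(combine(mapOutput, 2), 2));  // combiner ran twice
        System.out.println(never + " " + once + " " + twice); // prints 5 5 5
    }
}
```

Summing works here because it is associative and commutative. Logic that rewrites keys or has to see every record for a key does not have that property, so it belongs in a mapper or reducer.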

Although I didn't quite follow your example (it's not clear what your keys and values are), I think what you need to do is just run 2 map/reduce phases here. The first map/reduce phase groups by the first set of keys you need and reduces, writing its output to disk (HDFS, probably); a 2nd map/reduce phase then reads that output as its input and does the mapping you need. Most applications that are even modestly complex go through multiple map/reduce phases to accomplish their task. If you need 2 map phases, then the first reduce phase might just be the identity reducer (org.apache.hadoop.mapreduce.Reducer), which just writes the results of the first map phase straight out.
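As a rough illustration of the two-phase idea, here is an in-memory sketch in plain Java rather than real Hadoop code. Everything in it is an assumption made for the example: the `Rec` shape is borrowed from the records in the question, `group` stands in for the shuffle, and I am guessing that the intent is to re-key every record by its group key in the second phase and then merge the non-empty fields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TwoPhaseSketch {
    // Hypothetical record shape borrowed from the question: {id, first, last}.
    static final class Rec {
        final int id;
        final String first;
        final String last;

        Rec(int id, String first, String last) {
            this.id = id;
            this.first = first;
            this.last = last;
        }
    }

    // Stand-in for the shuffle: collect all values that share a key.
    static Map<Integer, List<Rec>> group(List<Integer> keys, List<Rec> values) {
        Map<Integer, List<Rec>> grouped = new TreeMap<>();
        for (int i = 0; i < keys.size(); i++) {
            grouped.computeIfAbsent(keys.get(i), k -> new ArrayList<>()).add(values.get(i));
        }
        return grouped;
    }

    // Phase 1: group by the key the first mapper emitted; the reduce is the identity.
    static Map<Integer, List<Rec>> phaseOne(List<Integer> mapKeys, List<Rec> records) {
        return group(mapKeys, records);
    }

    // Phase 2: the map step re-keys each record by its group key (overriding the
    // inner id), then the reduce step merges non-empty fields into one row per key.
    static Map<Integer, String> phaseTwo(Map<Integer, List<Rec>> phaseOneOutput) {
        List<Integer> keys = new ArrayList<>();
        List<Rec> values = new ArrayList<>();
        for (Map.Entry<Integer, List<Rec>> e : phaseOneOutput.entrySet()) {
            for (Rec r : e.getValue()) {
                keys.add(e.getKey());
                values.add(new Rec(e.getKey(), r.first, r.last));
            }
        }
        Map<Integer, String> result = new TreeMap<>();
        for (Map.Entry<Integer, List<Rec>> e : group(keys, values).entrySet()) {
            String first = "";
            String last = "";
            for (Rec r : e.getValue()) {
                if (!r.first.isEmpty()) first = r.first;
                if (!r.last.isEmpty()) last = r.last;
            }
            result.put(e.getKey(), first + " " + last);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> mapKeys = Arrays.asList(1, 1, 1);
        List<Rec> records = Arrays.asList(
            new Rec(2, "John", ""), new Rec(1, "", ""), new Rec(1, "", "Doe"));
        System.out.println(phaseTwo(phaseOne(mapKeys, records))); // prints {1=John Doe}
    }
}
```

In an actual Hadoop setup each phase would be its own job, with the first job's output directory used as the second job's input; the sketch only shows how the data flows between the two phases.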

Is there a way to group keys a second time before sending results to the Reducer in the same job? I thought maybe a combiner would do this for me, but it just acts like a reducer, so I need an intermediate step that acts like another mapper instead.

To try to visualize this, how I want it to work:

Map output:

<1, [{2, "John",""},{1, "",""},{1, "", "Doe"}]>

Combiner Output:

<1, [{1, "John",""},{1, "",""},{1, "", "Doe"}]>

Reduce Output:

<1, "John","Doe">

How it currently works:

Map output:

<1, [{2, "John",""},{1, "",""},{1, "", "Doe"}]>

Combiner Output:

<1, {1, "John",""}>

<1, {1, "",""}>

<1, {1, "", "Doe"}>

Reduce Output:

<1, "John","Doe">

<1, "John","Doe">

<1, "John","Doe">

So, basically the issue is that even though the 2 in the first map record should really be a one, I still need to extract the value of "John" and have it included in the output for key 1.

Hope this makes sense.

Thanks in advance,

/* Joey */
