accumulo-user mailing list archives

Awesome! Wrote an Iterator that does the trick.
Thanks for the help.
Ed
From: Eric Newton <eric.newton@gmail.com>
Reply-To: accumulo-user@incubator.apache.org
Date: Fri, 9 Mar 2012 16:55:10 -0800
To: accumulo-user@incubator.apache.org
Subject: Re: mutations in a combiner?
I have not memorized the Combiner interface, but you can do it with an Iterator, which is
just a bit more general than a Combiner.
-Eric
On Fri, Mar 9, 2012 at 3:48 PM, Seidl, Ed <seidl2@llnl.gov> wrote:
Thanks, I was afraid of that.
Sorry to be dense, but when you mention adding the A:needsReprocessing key…is that done
within a combiner, or by a separate task?
Thanks,
Ed
From: Eric Newton <eric.newton@gmail.com>
Reply-To: accumulo-user@incubator.apache.org
Date: Fri, 9 Mar 2012 12:01:46 -0800
To: accumulo-user@incubator.apache.org
Subject: Re: mutations in a combiner?
In a word? No.
In the future, this sort of thing will be handled by coprocessors.
I've seen people do this by marking the fields and then using a periodic map/reduce job to
reprocess the marked rows:
rowA A:needsReprocessing = "CF:CQ:VIS"
rowA CF:CQ:VIS:T3 = "start end"
-Eric
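The marker pattern Eric describes can be sketched roughly as follows. This is a minimal illustration only, not Accumulo code: the table is a hypothetical in-memory TreeMap keyed by "row column" with timestamps left out, and the class and method names (ReprocessJob, drainMarked) are made up for the example. A real job would scan the table and write delete mutations instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class ReprocessJob {
    // Marker column name taken from Eric's example.
    static final String MARKER = "A:needsReprocessing";

    // Finds every marker entry, looks up the column it names in the same row,
    // hands that value off, and clears the marker. Returns the handed-off values.
    static List<String> drainMarked(TreeMap<String, String> table) {
        List<String> reprocessed = new ArrayList<>();
        // Copy the key set so markers can be removed while looping.
        for (String key : new ArrayList<>(table.keySet())) {
            if (!key.endsWith(" " + MARKER)) {
                continue;
            }
            String row = key.substring(0, key.length() - MARKER.length() - 1);
            String column = table.remove(key); // the marker's value names the column
            String value = table.get(row + " " + column);
            if (value != null) {
                reprocessed.add(value); // real reprocessing would happen here
            }
        }
        return reprocessed;
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        table.put("rowA A:needsReprocessing", "CF:CQ:VIS");
        table.put("rowA CF:CQ:VIS", "start end");
        System.out.println(drainMarked(table)); // prints: [start end]
        System.out.println(table.keySet());     // prints: [rowA CF:CQ:VIS]
    }
}
```

Clearing the marker after each hand-off means a later periodic run does not pick up the same row again.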
On Fri, Mar 9, 2012 at 1:58 PM, Seidl, Ed <seidl2@llnl.gov> wrote:
I have a wacky question…is there any way to add data to a table from within a Combiner running
at compaction time? Here's what I'm trying to achieve…
Let's say I have a table that stores some type of data that needs to be processed in some
way (binary, xml, it doesn't matter). I may or may not receive all the data in one shot,
so as I populate the table, I do the processing (at least to the extent possible), and insert
a row with timestamp T1. Some time later, I get another chunk of data for a given row and
insert it. So now the row looks like
rowA CF:CQ:VIS:T1 = "start "
rowA CF:CQ:VIS:T2 = "end"
I can set up a combiner that will emit the value "start end", but now I want to re-process
that row. The easiest way I can think of to do this is to have the combiner create an entry
in a second table with the row id I just merged, then a separate process can consume rows
from the indicator table and do the necessary processing. Is this at all possible? Or should
I just move all the combining logic to an external process?
Thanks,
Ed Seidl
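The merge Ed describes, where the T1 and T2 values collapse into "start end", can be sketched in plain Java as below. This is only an illustration of the combining logic under stated assumptions: Chunk and ChunkMerger are hypothetical names, not Accumulo classes, and a real Combiner would work on Accumulo's Key and Value types. Accumulo sorts versions of the same key newest-first, so the sketch orders the chunks by timestamp explicitly before concatenating.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class ChunkMerger {
    // Hypothetical stand-in for one timestamped version of a cell.
    static final class Chunk {
        final long timestamp;
        final String value;

        Chunk(long timestamp, String value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Concatenates the values oldest-first, whatever order they arrive in.
    static String merge(List<Chunk> versions) {
        List<Chunk> sorted = new ArrayList<>(versions);
        sorted.sort(Comparator.comparingLong((Chunk c) -> c.timestamp));
        StringBuilder merged = new StringBuilder();
        for (Chunk c : sorted) {
            merged.append(c.value);
        }
        return merged.toString();
    }

    public static void main(String[] args) {
        // Newest version first, mirroring the order an iterator would see.
        List<Chunk> rowA = List.of(new Chunk(2L, "end"), new Chunk(1L, "start "));
        System.out.println(merge(rowA)); // prints: start end
    }
}
```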